How to evaluate a classifier in scikit-learn

151,277 views

Data School

8 years ago

In this video, you'll learn how to properly evaluate a classification model using a variety of common tools and metrics, as well as how to adjust the performance of a classifier to best match your business objectives. I'll start by demonstrating the weaknesses of classification accuracy as an evaluation metric. I'll then discuss the confusion matrix, the ROC curve and AUC, and metrics such as sensitivity, specificity, and precision. By the end of the video, you will have a solid foundation for intelligently evaluating your own classification model.
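As a small illustration of why classification accuracy alone can mislead, the sketch below compares a model's accuracy with the "null accuracy" obtained by always predicting the most frequent class. The labels are made up for this example and the helper names `accuracy` and `null_accuracy` are hypothetical; nothing here is taken from the video's notebook.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def null_accuracy(y_true):
    """Accuracy obtained by always predicting the most frequent class."""
    most_common_count = Counter(y_true).most_common(1)[0][1]
    return most_common_count / len(y_true)


# Made-up labels: 8 negatives and 2 positives.
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
# A model that never predicts the positive class.
y_pred = [0] * 10

print(accuracy(y_true, y_pred))  # 0.8
print(null_accuracy(y_true))     # 0.8
```

Here the model looks 80% accurate even though it never finds a single positive case, which is exactly what the null accuracy reveals.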
Download the notebook: github.com/justmarkham/scikit...
== CONFUSION MATRIX RESOURCES ==
Simple guide to confusion matrix terminology: www.dataschool.io/simple-guid...
Intuitive sensitivity and specificity: • EBM 01: Intuitive Sens...
The tradeoff between sensitivity and specificity: • The tradeoff between s...
How to calculate "expected value" from a confusion matrix: github.com/podopie/DAT18NYC/b...
Classification threshold graphic: media.amazonwebservices.com/b...
== ROC/AUC RESOURCES ==
ROC Curves and Area Under the Curve: • ROC Curves and Area Un...
ROC visualization: www.navan.name/roc/
ROC Curves: • ROC Curves
An introduction to ROC analysis: people.inf.elte.hu/kiss/13dwhd...
Comparing different feature sets: research.microsoft.com/pubs/20...
Comparing different classifiers: www.cse.ust.hk/nevinZhangGroup...
== OTHER RESOURCES ==
scikit-learn documentation on model evaluation: scikit-learn.org/stable/module...
Comparing model evaluation procedures and metrics: github.com/justmarkham/DAT8/b...
Counterfactual evaluation of machine learning models: • Michael Manapat: Count...
WANT TO GET BETTER AT MACHINE LEARNING? HERE ARE YOUR NEXT STEPS:
1) WATCH my scikit-learn video series:
• Machine learning in Py...
2) SUBSCRIBE for more videos:
kzbin.info?su...
3) JOIN "Data School Insiders" to access bonus content:
/ dataschool
4) ENROLL in my Machine Learning course:
www.dataschool.io/learn/
5) LET'S CONNECT!
- Newsletter: www.dataschool.io/subscribe/
- Twitter: / justmarkham
- Facebook: / datascienceschool
- LinkedIn: / justmarkham

Comments: 410
@dataschool 3 years ago
Having problems with the code? I just finished updating the notebooks to use *scikit-learn 0.23* and *Python 3.9* 🎉! You can download the updated notebooks here: github.com/justmarkham/scikit-learn-videos
@Jan-wg3kn 4 years ago
Thank you so much!! Extremely clearly explained. For me, it has been among the most useful tutorials that I ever went through on KZbin.
@charlescoult 1 year ago
I love how you frame a question and then go about answering it, step by step, along with the viewer. So many instructors simply venture forth with explaining something, assuming along the way that the student knows exactly where they are heading. Also, it helps to compartmentalize the learning for the student; minimizing confusion.
@dataschool 1 year ago
Thank you so much!
@carlomision9546 5 years ago
You explain the concepts really well and the examples you've provided really cement everything together. Thank you!
@dataschool 5 years ago
You're very welcome! Thanks for your kind words :)
@sandeepkrishnan8582 6 years ago
It has been more than 2 years since you made this video but is still one of the best out there and incredibly helpful. The delivery style is something that makes it easier to follow. Thank you for a great tutorial Kevin..!
@dataschool 6 years ago
Thanks for your kind words! I really appreciate you taking the time to comment!
@dataschool 6 years ago
*Note:* This video was recorded using Python 2.7 and scikit-learn 0.16. Recently, I updated the code to use Python 3.6 and scikit-learn 0.19.1. You can download the updated code here: github.com/justmarkham/scikit-learn-videos
@callmeness 5 years ago
I am your loyal follower! It's really helpful
@feliciafryer3271 5 years ago
God Bless You! Thank-you!
@tahaanwar5224 6 years ago
This is hands down One of the best series I've ever watched. Your execution is flawless
@dataschool 6 years ago
Thank you so much for your kind comment! I really appreciate it :)
@glowish1993 4 years ago
This is like the best corner of KZbin for practical machine learning code + explanation. Thank you so much!
@dataschool 4 years ago
Thanks very much for your kind words!
@adamyatripathi2743 8 years ago
Did I tell you that you are simply awesome?! Your concepts and their delivery is simply DIVINE!! Thank you very much sir, but yeah you made me fall in love with python and data science! " Become a data scientist " - My new year resolution for 2016 !!
@dataschool 8 years ago
+Adamya Tripathi HA! Thanks for your kind comments! :)
@kareemjeiroudi1964 5 years ago
Your 15-minute video that explains the ROC/AUC is the best.
@dataschool 5 years ago
Thanks very much for your kind words!
@kmillanr 5 years ago
I have to say that I am very grateful for the way you are giving these lessons. You are the master, thank you!
@dataschool 5 years ago
Thanks very much for your kind words!
@orkuntahiraran 3 years ago
Hi there. I just found your channel yesterday and subscribed immediately. Thanks for the information you provided here with a great accent and diction.
@dataschool 2 years ago
Thank you!
@parthbhardwaj8435 5 years ago
Your teaching is god level. Everything became crystal clear in just one go!
@dataschool 5 years ago
You are so kind, thank you so much! 😊
@spoown007 5 years ago
Thanks for all the videos, they helped me rediscover ML through another language (Python), and I really enjoy it! I still remember the time when I worked on ML with Java implementation tools, with a lot of the computation done in R... It seems to be way easier now. I think I will give Kaggle a try now with all these new videos and the scikit-learn library in Python! Great job Kevin!
@dataschool 5 years ago
Awesome! Great to hear!
@menon5t 5 years ago
Most relevant topics taught in very simple words and code snippets. Explanations are to the point and examples are perfect. Thanks for these videos.
@dataschool 5 years ago
You're very welcome!
@yuriycas1719 7 years ago
one of the best tutorial series ever, thanks! would be great to see some deep learning / tensorflow / theano stuff if you're planning to do more tutorials=)
@dataschool 7 years ago
Thanks so much for your comment! I will definitely keep your suggestions in mind for future tutorials :)
@ittybittyinnovations5983 5 years ago
Thank you very much on behalf of all the viewers of your video series.....stay blessed man
@dataschool 5 years ago
You're very welcome!
@terryliu3635 5 years ago
Thanks again! Now I have a much better understanding of the confusion matrix, ROC and AUC!
@dataschool 5 years ago
You're very welcome!
@elvykamunyokomanunebo1441 1 year ago
This course is a master class: well-delivered content, at a digestible pace. Thank you very much. Regards Elvy
@dataschool 1 year ago
Thank you so much!
@shanxW16 3 years ago
You're a legend buddy! I've sifted through video after video looking for a good explanation of the confusion matrix. None came close to yours. Lots of people might have a PhD in the topic, but not all are good tutors.
@dataschool 3 years ago
Thanks very much for your kind words!
@rohitsaxena22 8 years ago
Stumbled upon your video while searching for evaluation metrics. Well explained! Kudos!!
@dataschool 8 years ago
Great, thanks!
@danielphill86 4 years ago
Fantastic video, thank you. So clear and concise 👍
@nriezedichisom1676 2 years ago
This video is just perfect and cleared most of the confusion that I have always had concerning this topic
@dataschool 2 years ago
Great to hear!
@sofuno863 7 years ago
teaching_level = 'Master'
print('thank you ', teaching_level)
@dataschool 7 years ago
HA! Thank you so much for your kind comment :)
@5Doum 6 years ago
I am slightly triggered by the space after the words "thank you". Since you used a comma instead of a '+', it will show up as "thank you  Master" (two spaces) instead of "thank you Master" (one space).
@sofuno863 6 years ago
Thanks! it is confusing me a little since I've been using Java so much
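The spacing point raised in this thread can be checked directly. The sketch below assumes Python 3, where `print` is a function that joins its arguments with a single space by default; it captures the printed text so that the difference is visible with `repr`.

```python
import io
from contextlib import redirect_stdout

teaching_level = 'Master'

# Capture what print() writes so the exact spacing can be inspected.
buffer = io.StringIO()
with redirect_stdout(buffer):
    print('thank you ', teaching_level)   # comma: print inserts its own separator
    print('thank you ' + teaching_level)  # plus: plain string concatenation
with_comma, with_plus = buffer.getvalue().splitlines()

print(repr(with_comma))  # 'thank you  Master' (two spaces)
print(repr(with_plus))   # 'thank you Master' (one space)
```

Dropping the trailing space inside the string literal, or passing `sep=''` to `print`, would also give a single space.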
@debasishhazra3222 7 years ago
Hi Kevin, Today only I have started looking into your videos on Machine Learning. So far, I have gone through three videos and it's awesome. Your teaching style is so very nice that it's really outstanding for beginners even. Carry ON...it's wonderful.
@dataschool 7 years ago
Excellent! Thanks so much for your comment and I'm so glad the videos are helpful to you!
@sauvikz 7 years ago
Big fan of this particular video in the series. Thank you so much for doing this :) I know this is too much to ask but do you have plans to upload similar videos for SVMs and Decision Trees?
@dataschool 7 years ago
Thanks so much! I spent a lot of time on this video :) Thanks also for your suggestion. I'll consider it for future videos! But no, I don't currently have plans to create videos on SVMs and decision trees.
@NaraAIApp 4 years ago
Gonna share it with all of my friends, great video!
@dataschool 4 years ago
Thanks!
@maxzhong9394 8 years ago
I went through all your great videos, and they really introduced me to ML using Python and sklearn. Thank you so much! Hoping you could post more videos on feature extraction etc.
@dataschool 8 years ago
+Max Zhong You're very welcome, thanks for your comment! Regarding feature extraction, that tends to be domain-specific. The scikit-learn documentation has some good information: scikit-learn.org/stable/modules/feature_extraction.html As well, my course covers feature extraction from text: www.dataschool.io/learn/
@nferraz 7 years ago
Thank you so much for your effort on this channel, and mainly on this video, which I appreciate so much!
@dataschool 7 years ago
You're very welcome!
@mohammadsalehi7056 5 years ago
The best ML videos that I've ever seen. Thank you
@dataschool 4 years ago
Thank you so much! 🙏
@massivamiss5244 4 years ago
Please, how can we use this for our own model, for example Faster R-CNN? Can we use these steps?
@chidanandamurthyp3824 3 years ago
Wonderful teaching, granularity of explanation is exceptional. Thank you.
@dataschool 2 years ago
You're welcome!
@ackrite55 7 years ago
Thank you!! I have enjoyed this series very much.
@dataschool 7 years ago
Great to hear! You're very welcome :)
@maternentihemuka682 6 years ago
Thank you very much Kevin, this helped me to understand how to build a predictive model
@dataschool 6 years ago
That's great to hear - congratulations!
@gtpone 7 years ago
Great series of videos. Thank you very much for all the effort! Would you consider demonstrating a complete, specific machine learning project step by step, to emphasize all these techniques?
@dataschool 7 years ago
Glad you like the videos! I do cover end-to-end machine learning projects in my online course: www.dataschool.io/learn/ If that course is not a good fit for you, feel free to subscribe to my email newsletter to hear about future courses: www.dataschool.io/subscribe/
@alexandrekolisnyk 4 years ago
My problem is similar to your fraud example. I've got low sensitivity to positive instances (low recall). So, as you said, I have to optimize for sensitivity when selecting between different models. You helped guide my next steps...
@mohdarshad3948 7 years ago
Excellent, I was struggling to clear my doubts, and now it's crystal clear.
@dataschool 7 years ago
Great to hear!
@Nico.75 4 years ago
This is an awesome video and the way you explain all the topics super clearly helped me a lot, big thumbs up!🙏 May I ask at which step you vary thresholds (or other hyperparameters) in order to select the best threshold? Do you first do feature selection with a constant default threshold, and once you have found a final feature set, vary the thresholds and evaluate the performance using ROC/AUC? I wonder whether using different thresholds first might lead to different feature selections. Thanks a lot in advance!!😊
@yashchoudhary5006 8 years ago
That was a great lecture series . Thanks a lot for sharing your knowledge. I would love if you cover some examples(datasets) which have missing values in it and use it to compare different models.
@dataschool 8 years ago
Glad the scikit-learn series was helpful to you! Regarding missing values, scikit-learn doesn't accept data with missing values. Instead, you need to figure out how to deal with the missing values before training your model. Here's a video about how to do this using pandas: kzbin.info/www/bejne/nHSwo4KVi9-Ygpo
@Gaarv1911 7 years ago
I searched quite a lot and only here found out how to change the classifier threshold, thanks!
@dataschool 7 years ago
Great to hear! I'm glad I was able to help.
@syedhasan773 5 years ago
Bruh... You are the man, YOU ARE THE MAN!!! False Positive = FALSELY PREDICTED POSITIVE. That's just fucking genius. I've been trying to understand these for the past 2 days man. And you did it in 10 secs. Damn good job ma man
@dataschool 5 years ago
HA! So glad to hear that I was helpful to you.... really appreciate your comment!
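The naming rule praised in this thread ("false positive" means "falsely predicted positive") can be written out as a short counting routine. This is a minimal sketch with made-up labels coded as 1 (positive) and 0 (negative); the function name `confusion_counts` is hypothetical and is not scikit-learn's API.

```python
def confusion_counts(y_true, y_pred):
    """Return (TP, FP, TN, FN) for binary labels coded as 1 and 0."""
    tp = fp = tn = fn = 0
    for actual, predicted in zip(y_true, y_pred):
        if predicted == 1 and actual == 1:
            tp += 1  # correctly predicted positive
        elif predicted == 1 and actual == 0:
            fp += 1  # falsely predicted positive
        elif predicted == 0 and actual == 0:
            tn += 1  # correctly predicted negative
        else:
            fn += 1  # falsely predicted negative
    return tp, fp, tn, fn


# Made-up labels: 4 actual positives, 6 actual negatives.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]

tp, fp, tn, fn = confusion_counts(y_true, y_pred)
sensitivity = tp / (tp + fn)  # share of actual positives that were found
specificity = tn / (tn + fp)  # share of actual negatives that were found
precision = tp / (tp + fp)    # share of positive predictions that were right

print(tp, fp, tn, fn)  # 3 1 5 1
print(sensitivity, precision)  # 0.75 0.75
```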
@petercourt 4 years ago
This was a seriously helpful series, thank you so much for going to the effort of making it :)
@Acemarcelo12 3 years ago
True. I am going through the comments and I see that yours is from about 4 months ago. I am getting an error in part of the code because of the version of scikit-learn I am using. I really hope you can enlighten me. Thank you.
Code:
from sklearn.preprocessing import binarize
y_pred_class = binarize(y_pred_prob, 0.3)[0]
Warning:
C:\Users\THINK\.conda\envs\tens\lib\site-packages\sklearn\utils\validation.py:70: FutureWarning: Pass threshold=0.3 as keyword args. From version 0.25 passing these as positional arguments will result in an error
  FutureWarning)
Error:
conda\envs\tens\lib\site-packages\sklearn\utils\validation.py in check_array(array, accept_sparse, accept_large_sparse, dtype, order, copy, force_all_finite, ensure_2d, allow_nd, ensure_min_samples, ensure_min_features, estimator)
    621 "Reshape your data either using array.reshape(-1, 1) if "
    622 "your data has a single feature or array.reshape(1, -1) "
--> 623 "if it contains a single sample.".format(array))
    624
    625 # in the future np.flexible dtypes will be handled like object dtypes
ValueError: Expected 2D array, got 1D array instead:
Reshape your data either using array.reshape(-1, 1) if your data has a single feature or array.reshape(1, -1) if it contains a single sample.
@Acemarcelo12 3 years ago
the time is 40:20
@petercourt 3 years ago
@@Acemarcelo12 I'm not sure on the details but basically it's as it says - you are giving it an array that is 1D and it should be 2D. Try the .reshape command.
@Acemarcelo12 3 years ago
@@petercourt thank you. it worked
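The thread does not show the exact call that ended up working. The error itself says that `binarize` wants a 2D array and a keyword `threshold`, so a call along the lines of `binarize(y_pred_prob.reshape(-1, 1), threshold=0.3)` is one plausible fix, but that is an assumption rather than something confirmed above. The thresholding step can also be sketched without scikit-learn at all, as below; the probabilities are made up and `classify_with_threshold` is a hypothetical helper.

```python
def classify_with_threshold(probabilities, threshold=0.5):
    """Label an observation 1 when its predicted probability exceeds the threshold."""
    return [1 if p > threshold else 0 for p in probabilities]


# Made-up predicted probabilities of the positive class.
y_pred_prob = [0.10, 0.25, 0.35, 0.48, 0.52, 0.90]

default_labels = classify_with_threshold(y_pred_prob)
lowered_labels = classify_with_threshold(y_pred_prob, threshold=0.3)

print(default_labels)  # [0, 0, 0, 0, 1, 1]
print(lowered_labels)  # [0, 0, 1, 1, 1, 1]
```

Lowering the threshold from 0.5 to 0.3 turns two more observations into positive predictions, which is the sensitivity-for-specificity trade the video describes.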
@jaimegonzalezsuarez1566 7 years ago
You have the ability to explain complex concepts in such a simple way. Really nice explanation of this topic.
@dataschool 6 years ago
I'm glad the video was helpful to you!
@puneethb6175 6 years ago
First of all, I can't thank you enough for what I have learned throughout these series. These series were instrumental in enabling me to get comfortable with approaching a machine learning problem. My question is, what would you suggest I do after this series? I already finished Andrew Ngs online course so I have a basic understanding of what goes on under the hood but lack practical experience as his entire course was done in Octave. If you have any tips that you could give me I would be grateful. Thank you for everything you have done and you have earned yourself a loyal subscriber :D
@dataschool 6 years ago
Awesome! That is so nice to hear, and I'm glad I could be of help to you! Regarding your question, steps 4 and 5 on this page might be helpful to you: www.dataschool.io/launch-your-data-science-career-with-python/ Good luck, and keep in touch!
@eqkang1 5 years ago
Great series, and very good job!
@dataschool 5 years ago
Thanks!
@rishabhgautam7387 6 years ago
Awesome Tutorial videos. Your teaching style and jupyter notes are excellent. One small request, Can you please add a playlist on deep learning.
@dataschool 6 years ago
Thanks for your kind words, and also for your suggestion!
@xKratareJrx 3 years ago
Thank you so much. I found this video because I have an imbalanced dataset for a binary classification. Now i understand the whole picture !
@dataschool 3 years ago
Great to hear! 🙌
5 years ago
Brilliant work as always, thanks a lot for making it!
@dataschool 5 years ago
You're very welcome! Thanks for your kind words!
@nastarankianersi104 3 years ago
Well structured, well explained. Absolutely organized my mind, although I think different metrics derived from confusion matrix are kinda confusing. Thank you so much!
@dataschool 3 years ago
You're very welcome!
@manoocgegr1364 5 years ago
Great video Kevin. With your amazing teaching style I am sure you could easily make a 5-year-old kid fully understand Einstein's theories. Two thumbs up! I am wondering if you could do the same video for a multi-class classification problem. I might be wrong, but it looks like evaluation and scoring are completely different. Any advice is highly appreciated. Could you please share an example of a multi-class problem?
@dataschool 5 years ago
Thanks so much for your very kind words! Regarding multi-class problems, I discuss how to evaluate them in this video: kzbin.info/www/bejne/boDSmGqKja2pfLs Hope that helps!
@nehalele5372 6 years ago
Kevin, Thank you once again for the super awesome tutorials and for being such a wonderful teacher. Getting a little greedy here. Are you planning to come up with tutorials on more models- reinforcement learning, deep learning, etc.?
@dataschool 6 years ago
You're very welcome! Thanks for the suggestions, I'll consider them for the future!
@prosenjitbiswas3743 6 years ago
Excellent explanation!! Heartfelt Thanks
@dataschool 6 years ago
You're very welcome!
@dhoomketu731 5 years ago
THIS IS ONE HECK OF A GOOD TUTORIAL.
@dataschool 5 years ago
Thanks!
@nureyna629 5 years ago
You are my best teacher ever!
@dataschool 5 years ago
Thanks :)
@christianaguertin3335 6 years ago
you just saved my life with your videos
@dataschool 6 years ago
Wow!! Great to hear!
@genaugenaugenau 6 years ago
Superb series, superbly concluded :)
@dataschool 6 years ago
Thanks! :)
@mahfoudhabdo3760 5 years ago
You Are the best
@dataschool 5 years ago
The next step is to take my machine learning course: www.dataschool.io/learn/
@mohdanaskhan7203 7 years ago
Hey Kevin, You have done a great job, your tutorials are very useful.Could you please make some videos with Tensorflow ? It would be very helpful.
@dataschool 7 years ago
Thanks for the suggestion - I'll consider it for the future!
@brendachirata2283 5 years ago
Hey, you are great. Thank you, this was too helpful.
@dataschool 5 years ago
Thanks! You're great! :)
@ragasetty2427 4 years ago
Excellent teaching bro!! Thank you so much...
@dataschool 4 years ago
Thanks!
@gezahagnnegash9740 2 years ago
Thanks for sharing, it's helpful!
@dataschool 2 years ago
Great to hear!
@taotaotan5671 4 years ago
This guy is awesome, greatest tutorial!
@dataschool 4 years ago
Thank you!
@warrock-5489 7 years ago
First of all, thank you Kevin for this awesome scikit learn tutorial series. :D I'd like to ask if there are materials that you could recommend for multi-label classification problem on text? :)
@dataschool 7 years ago
You're very welcome! Glad the series is helpful to you :) I don't think I have any resources on multi-label classification... sorry! Please let me know if you come across one, however!
@brendensong8000 3 years ago
My brain is fried after this video... it was such a great class!!!
@dataschool 3 years ago
Great to hear!
@Nico.75 3 years ago
What a great video, no, what a great series!! For me as an ML newbie it's such a great resource for doing my first project! May I ask you a basic question about the hold-out set? I think I used my hold-out set (test set) too early, after initially building a decision tree that I then wanted to tune... But I read that you should NEVER use your test set until the final evaluation, after all the optimization is done. But I used it already after the first train/test split, so my algorithm has already seen all of the data. And it's a fixed data set from Kaggle. Can I just start over by setting up newly split data (train, validation, and test sets)? Can I make a model lose its memory? Thanks so much for any short feedback, I would appreciate it!!
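The three-way split mentioned in this question can be sketched as follows. This is only an illustration of the mechanics, using a hypothetical helper `three_way_split` and row indices in place of real data; it does not settle whether earlier peeking at the test set has already biased the commenter's results. A newly created model object carries nothing over from earlier fits, so refitting on the new training portion starts from scratch.

```python
import random


def three_way_split(rows, val_fraction=0.2, test_fraction=0.2, seed=0):
    """Shuffle rows once, then cut them into train, validation and test parts."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is repeatable
    n_test = int(len(shuffled) * test_fraction)
    n_val = int(len(shuffled) * val_fraction)
    test = shuffled[:n_test]
    validation = shuffled[n_test:n_test + n_val]
    train = shuffled[n_test + n_val:]
    return train, validation, test


# Stand-in for a data set: 100 row indices.
rows = list(range(100))
train, validation, test = three_way_split(rows)

print(len(train), len(validation), len(test))  # 60 20 20
```

Under this scheme the validation part is used for tuning, and the test part is touched only once, for the final evaluation.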
@tomasemilio 7 years ago
Hey man, best material out there, awesome, if you could expand a bit more on machine learning, probably deep learning in the future. I would be tremendously thankful.
@dataschool 7 years ago
Thanks so much for your very kind comment! If you want to learn about my future courses, please subscribe to the Data School newsletter: www.dataschool.io/subscribe/
@ianben1538 7 years ago
Your video tutorial is very helpful. It would be great if you could do some videos about ensemble learning and deep learning with TensorFlow. Anyway, it's so kind of you to share your knowledge with us~
@dataschool 7 years ago
Thanks for the suggestion! I will consider it for the future. Though I don't have videos about TensorFlow, I do cover ensemble learning in my online course, Machine Learning with Text in Python: www.dataschool.io/learn/
@MadhurjyaBora 5 years ago
Your explanation is so good. Thank you so much. Also I was wondering if you could tell or show how to plot a figure of target and predicted values
@dataschool 5 years ago
I think you just need to put the target and predicted values in two columns in a DataFrame, and make a scatterplot of one column versus the other.
@abdelrhmanshokr7546 4 years ago
again you're one of the best thanks a lot
@venkateshshunmugham7048 8 years ago
Please upload more videos soon.. Thank you for sharing your knowledge!!! If possible please upload few tutorials on feature engineering techniques using data sets..
@dataschool 8 years ago
+Venkatesh Shunmugham You're very welcome! And thanks for the topic suggestion, I will take that into consideration.
@dataschool 5 years ago
You might enjoy my recent blog post about feature engineering: www.dataschool.io/introduction-to-feature-engineering/
@LonglongGuitar 7 years ago
Finally I finish this playlist!! In Kaggle competition, I found 'feature engineering' is much more important than 'model selection'. Could you make a series of videos for feature selection?
@dataschool 7 years ago
Thanks for the suggestion! Yes, feature engineering is very important, and I'll consider teaching it in the future.
@dataschool 5 years ago
You might enjoy my recent blog post about feature engineering: www.dataschool.io/introduction-to-feature-engineering/
@MyLotem 8 years ago
Thanks for all the lessons, they were very helpful, teaching both the tools and a way of thinking once coming to analyse data. I wanted to ask do you know any lectures about using the Sklearn for spatial statistics?
@dataschool 8 years ago
+Lotem Robins You're welcome! I am not familiar, however, with spatial statistics.
@ishwartolamatti4718 4 years ago
Thank you so much sir! I was searching for a video for model evaluation got this one. Please make model evaluation video for Regression and clustering as well, if you have already made those videos as well then please share me the link. Thanks again and """You are my Hero!""".
@dataschool 4 years ago
Glad you like the video, and thanks for your video suggestions!
@williamaiken4568 8 years ago
Your videos are all great! Thank you for all the thought and effort you have put into them. I tried working through all the examples in this video using a multiclass data set. It appears that you can't use 'roc_auc' as a scoring metric with multiclass data. Do you have any advice?
@dataschool 8 years ago
Great, thank you! Regarding your question, this page lists the available classification metrics that can be used for different cases: scikit-learn.org/stable/modules/model_evaluation.html#classification-metrics
@sanjaykrish8719 6 years ago
Kevin, you are amazing. Thanks
@dataschool 6 years ago
Thanks for your kind words!
@raghavendras5331 5 years ago
One word to say about the video: simply superb. Could you please elaborate on the output values from sklearn.cross_validation.cross_val_score() explained in the video? What do they tell us?
@dataschool 5 years ago
Thanks very much for your kind words! This video should help you: kzbin.info/www/bejne/bJXFo4VjjN6goKs
@earlworth 6 years ago
Excellent, excellent series, Kevin, stylistically and content-wise. The video you linked to is very good as well, which made me wonder in the case of disease detection how one goes about deciding what threshold to set - is this not an ethical issue to some extent or is there an industry standard? I recently saw a talk by someone using ML for the prediction of malaria, and it got me thinking. Do you know of any place where this issue of threshold setting is discussed?
@dataschool 6 years ago
Glad you like the series! Your question is a great one, and ethics does play a role in setting that threshold. I haven't seen this issue discussed, but let me know if you find a good article!
@neilvanasselt7107 6 years ago
Hi Kevin, I have been through the video series and I think it is brilliant. The content, pace and delivery are excellent, and with so much garbage on the Internet, it is not easy to find excellent resources that you can trust. I had a quick question regarding model evaluation in a binary classification example. Say you split your data into X_train, y_train, X_test, y_test. You build 3 models (KNearestNeighbors, Logistic Regression, Random Forest), and they all give you a pretty similar accuracy. As you run the X_test data through each model, you save the results as knn_res, logreg_res, ran_forest_res. You then combine the 3 sets of results (knn_res, logreg_res, ran_forest_res) into one dataset, and you take the most prevalent prediction for each row. Is this overfitting or a worthwhile exercise? Will each model make similar misclassifications? Again, many thanks for a brilliant series. Neil.
@dataschool 6 years ago
You just invented ensembling! I think this lesson will answer your questions: github.com/justmarkham/DAT8/blob/master/notebooks/18_ensembling.ipynb And, thanks for your kind comments!
@neilvanasselt7107 6 years ago
Thanks for the link
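The "most prevalent prediction for each row" idea from this thread is usually called majority (hard) voting. Below is a minimal sketch that reuses the commenter's variable names with made-up predictions; `majority_vote` is a hypothetical helper, and scikit-learn's own `VotingClassifier` is the library route for doing this with fitted models.

```python
from collections import Counter


def majority_vote(*model_predictions):
    """Combine per-model prediction lists by taking the most common label per row."""
    combined = []
    for votes in zip(*model_predictions):
        combined.append(Counter(votes).most_common(1)[0][0])
    return combined


# Made-up binary predictions from three models on the same five test rows.
knn_res = [1, 0, 1, 1, 0]
logreg_res = [1, 1, 0, 1, 0]
ran_forest_res = [0, 0, 1, 1, 0]

ensemble_res = majority_vote(knn_res, logreg_res, ran_forest_res)
print(ensemble_res)  # [1, 0, 1, 1, 0]
```

With three models and two classes a tie cannot occur; with an even number of models or more classes, a tie-breaking rule would be needed.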
@flamboyantperson5936 6 years ago
Thank you so much for the entire series. I loved it. What is next on your to do list? I'm waiting for new videos. Thanks a lot.
@dataschool 6 years ago
Ha! My to-do list is endless :) Right now I'm working on a new video-based course. Also, I have a pandas video or two coming out soon, hopefully. Thanks for your interest!
@flamboyantperson5936 6 years ago
I'm waiting, please upload soon. Thank you so much.
@dataschool 6 years ago
It will be this week :)
@flamboyantperson5936 6 years ago
Wow wow I'm waiting for it :-)
@msr_manav 6 years ago
All your web lectures are simply awesome. I want to explore and learn more algorithms. Do you have a plan to upload more teaching videos for other algorithms?
@dataschool 6 years ago
Thanks for your suggestion! I don't have any planned at this time.
@sowmithgantla8674 7 years ago
Thanks for the tutorial.
@dataschool 7 years ago
You're welcome!
@LucianoPerezzini 4 years ago
Hi Kevin! Thanks for the video! One question: given that the threshold is a hyperparameter, adjusting it using test observations wouldn't cause overfitting? Congrats on the channel!
@dataschool 4 years ago
Glad you like the video! No, it wouldn't cause overfitting because it's not a hyperparameter in the same way as other hyperparameters.
@kiranachanta9741 5 years ago
Hello Kevin, I have benefited a lot from your videos. Thanks for making these videos. Small request: can you make a video on finding multicollinearity using "VIF/any other sklearn methods" in Python?
@dataschool 5 years ago
Thanks for your suggestion!
@emptychannel88 7 years ago
Great series binge watched it over two days on my commute. Any good resources for image classification ?
@dataschool 7 years ago
Thanks! Glad it was helpful to you! Regarding image classification, I don't have any resources, but you should take a look at scikit-image.
@debanitadasgupta790 4 years ago
THE BEST on youtube .. thanks a ton..
@dataschool 4 years ago
Thank you!
@martinrussell1404 5 years ago
You're awesome dude! Thank you.
@dataschool 5 years ago
Thanks!
@prasadkamath1205 4 years ago
excellent video! thanks v much
@dataschool 4 years ago
You're welcome!
@jiechen4015 8 years ago
Really helpful, thank you for sharing! Will you teach more about how to use matplotlib ?
@dataschool 8 years ago
+jie chen You're welcome! And thanks for the suggestion, I'll consider it for the future.
@tb7220 7 years ago
Nice deep-level explanation... helps a lot
@dataschool 7 years ago
You're welcome!
@shivamagrawal441 5 years ago
Sir, You are simply best
@dataschool 5 years ago
Thanks!
@bandhammanikanta1664 4 years ago
Hi Kevin, Thanks a lot for all your video series on Scikit learn and pandas. Could you please make a video on 'List of ML (Data science) frameworks need to learn for a person who is looking for transition to data scientist.?' You r doing great. Thank you.
@dataschool 4 years ago
Thanks for your suggestion, and for your kind words!
@bijayamanandhar3890 3 years ago
Thank you so much for explaining every aspect clearly. A quick question, can this classification accuracy model be used on a dataset having more than two classes such as 'iris.data' ? if not, what is the solution?
@dataschool 2 years ago
Yes, you can use accuracy with multi-class problems.
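As a rough illustration of the reply above, here is a minimal sketch of accuracy on the three-class iris dataset. The choice of logistic regression, the `max_iter=1000` setting, and `random_state=0` are assumptions made for this example, not something taken from the video.

```python
from sklearn import metrics
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Load the three-class iris data and hold out a test set
# (random_state=0 is an arbitrary choice for reproducibility).
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Fit a classifier; logistic regression is just an assumed example model.
logreg = LogisticRegression(max_iter=1000)
logreg.fit(X_train, y_train)
y_pred_class = logreg.predict(X_test)

# Accuracy is the fraction of correct predictions, so it works for any number of classes.
accuracy = metrics.accuracy_score(y_test, y_pred_class)

# With three classes the confusion matrix is 3x3 rather than 2x2.
confusion = metrics.confusion_matrix(y_test, y_pred_class)

print(accuracy)
print(confusion)
```

The diagonal of the confusion matrix holds the correct predictions for each class, so its sum divided by the total count equals the accuracy.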
@akshayakn95 5 years ago
Awesome, man!!! You are the best.
@dataschool 4 years ago
Thank you!
@Ruobingw 8 years ago
How often will this course be updated? Can't wait to get the next lecture! Is it possible to cover how to ensemble different models?
@dataschool 8 years ago
+Ruobing Wang I am not adding any more videos to this series for the time being. However, if I restart the series, I will definitely consider your suggestion!
@David-tr4bn 3 years ago
Hi Data School teacher, when adjusting the threshold, what do you do when the predicted probabilities do not follow a normal distribution like the one in the figure? What happens when, say, the majority of the predicted probability values center around 0.1, so that the histogram looks more like a single cluster or bar? What do you do in this situation? Do you also lower the sensitivity to 0.1, the maximum value?
@tatavares1985 7 years ago
Best teacher ever!
@dataschool 7 years ago
Thanks! :)
@akshatpunjabi9120 4 years ago
Firstly, thank you for providing such great content for ML. Also, I didn't quite understand the definition of AUC that you gave at 50:10. What exactly do you mean by predicted probability?
@dataschool 4 years ago
Maybe this video will help: kzbin.info/www/bejne/hXLPZ5h3rrVgr9E
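One common reading of AUC, which may be what the question above is about, is this: if you pick one positive and one negative observation at random, AUC is the chance that the classifier gave the positive one the higher predicted probability. The sketch below checks that reading against `roc_auc_score` on a small made-up set of labels and probabilities (the numbers are invented for illustration).

```python
from itertools import product

from sklearn import metrics

# Made-up true labels and predicted probabilities of class 1 (illustration only).
y_true = [0, 0, 1, 1, 0, 1]
y_pred_prob = [0.1, 0.4, 0.35, 0.8, 0.2, 0.9]

# Split the predicted probabilities by true class.
positives = [p for p, label in zip(y_pred_prob, y_true) if label == 1]
negatives = [p for p, label in zip(y_pred_prob, y_true) if label == 0]

# For every (positive, negative) pair, score 1 if the positive is ranked higher,
# 0.5 for a tie, and 0 otherwise.
wins = sum(
    1.0 if p > n else 0.5 if p == n else 0.0
    for p, n in product(positives, negatives)
)
pairwise_auc = wins / (len(positives) * len(negatives))

# scikit-learn computes the same quantity from the ROC curve.
auc = metrics.roc_auc_score(y_true, y_pred_prob)

print(pairwise_auc, auc)
```

The "predicted probability" here is the model's estimated probability of the positive class for each observation, which is what `predict_proba` returns in its second column for a binary classifier.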
@subodhmantri8365 6 years ago
Hi Kevin, nice video as usual. One question though: when the responses are "Y" and "N" instead of 1 and 0, the recall function does not work, as I believe it requires 1 and 0. Should we create dummy columns in such cases, using the map function?
@dataschool 6 years ago
That sounds like it should work.
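A minimal sketch of the mapping approach discussed above, using made-up "Y"/"N" labels. It also shows an alternative that is not mentioned in the thread: `recall_score` accepts a `pos_label` argument, so the string labels can be passed directly as long as you say which one counts as the positive class.

```python
import pandas as pd
from sklearn import metrics

# Made-up true and predicted labels (illustration only).
y_true = pd.Series(["Y", "N", "Y", "Y", "N", "N"])
y_pred = pd.Series(["Y", "N", "N", "Y", "Y", "N"])

# Option 1: map the string labels to 0/1, then use the default pos_label=1.
label_map = {"N": 0, "Y": 1}
recall_mapped = metrics.recall_score(y_true.map(label_map), y_pred.map(label_map))

# Option 2: keep the strings and tell scikit-learn which label is the positive class.
recall_pos_label = metrics.recall_score(y_true, y_pred, pos_label="Y")

print(recall_mapped, recall_pos_label)
```

Both options give the same recall. Mapping to 0/1 has the side benefit that the same numeric column also works with functions that expect numbers, such as taking the mean to get the positive rate.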
@hjtechguy 6 years ago
Thank you for this amazing video. Is there any way we can integrate y_pred_class (from the new threshold) into our model's prediction accuracy score, including the roc_auc_score? Otherwise, I'm not sure there is a point in creating the new threshold if our model is not going to use it.
@dataschool 6 years ago
If you want to check the accuracy of the new predictions, just use metrics.accuracy_score(y_test, y_pred_class). Changing the classification threshold has no impact on roc_auc_score, because the AUC calculation does not require a classification threshold. Hope that helps!
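To make the reply above concrete, here is a small sketch with made-up labels and predicted probabilities. The `classify` helper is a hypothetical name introduced for this example. It turns probabilities into class predictions at a chosen threshold, after which accuracy and recall can be recomputed, while AUC is computed from the probabilities themselves and never sees a threshold.

```python
import numpy as np
from sklearn import metrics

# Made-up test labels and predicted probabilities of class 1 (illustration only).
y_test = np.array([0, 0, 0, 0, 1, 1, 1, 0, 1, 0])
y_pred_prob = np.array([0.1, 0.2, 0.35, 0.4, 0.45, 0.32, 0.8, 0.6, 0.7, 0.05])


def classify(probabilities, threshold):
    """Hypothetical helper: predict class 1 when the probability exceeds the threshold."""
    return (probabilities > threshold).astype(int)


# Class predictions at the default threshold and at a lowered one.
y_pred_default = classify(y_pred_prob, 0.5)
y_pred_class = classify(y_pred_prob, 0.3)

# Threshold-dependent metrics are recomputed from the new class predictions.
acc_default = metrics.accuracy_score(y_test, y_pred_default)
acc_lowered = metrics.accuracy_score(y_test, y_pred_class)
recall_default = metrics.recall_score(y_test, y_pred_default)
recall_lowered = metrics.recall_score(y_test, y_pred_class)

# AUC takes the probabilities, not the class predictions, so it has one value
# regardless of which threshold is chosen.
auc = metrics.roc_auc_score(y_test, y_pred_prob)

print(acc_default, acc_lowered)
print(recall_default, recall_lowered)
print(auc)
```

In this toy data, lowering the threshold raises recall (sensitivity) while accuracy happens to stay the same, which is one reason to look at more than accuracy when choosing a threshold.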
@hjtechguy 6 years ago
Kevin, wonderful videos, thank you greatly. I'm wondering how high the null accuracy must be before the regular classification accuracy rate becomes obsolete. In extreme situations like the fraudulent credit card case, where the null accuracy rate is 99%, it's obvious that AUC is a much better indicator. What if the null accuracy rate was 50% or 60% (when there isn't such a high class imbalance)? What is the null accuracy's own baseline for determining when accuracy is useful and when it is not?
@dataschool 6 years ago
Great question! There's no set point, and even when there is high class imbalance, accuracy can still be at least slightly useful. It's just much less informative than other metrics at that point. Hope that helps!
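For reference, null accuracy is the accuracy you would get by always predicting the most frequent class. The sketch below computes it two ways on made-up labels with a 99-to-1 imbalance, loosely mirroring the fraud example in the question. The all-zeros feature matrix is a placeholder, since the most-frequent baseline ignores the features.

```python
import numpy as np
from sklearn import metrics
from sklearn.dummy import DummyClassifier

# Made-up binary labels: 99 negatives and 1 positive (illustration only).
y_test = np.array([0] * 99 + [1])

# For 0/1 labels, the mean is the share of class 1, so null accuracy is the larger share.
null_accuracy = max(y_test.mean(), 1 - y_test.mean())

# The same baseline via DummyClassifier, which always predicts the most frequent class.
X_placeholder = np.zeros((len(y_test), 1))  # features are ignored by this strategy
baseline = DummyClassifier(strategy="most_frequent").fit(X_placeholder, y_test)
baseline_pred = baseline.predict(X_placeholder)
baseline_accuracy = metrics.accuracy_score(y_test, baseline_pred)

# The baseline never predicts the positive class, so its recall is zero.
baseline_recall = metrics.recall_score(y_test, baseline_pred)

print(null_accuracy, baseline_accuracy, baseline_recall)
```

Comparing a model's accuracy against this number shows how much of the accuracy comes from class imbalance alone. The closer the null accuracy is to 100%, the less room accuracy has to separate a good model from a useless one.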
@vijayprabhakaran453 5 years ago
Excellent presentation
@dataschool 5 years ago
Thanks!