To include categorical features in your machine learning model, you have to encode them numerically using "dummy" (or "one-hot") encoding. But how do you do this correctly using scikit-learn?
In this video, you'll learn how to use OneHotEncoder and ColumnTransformer to encode your categorical features and prepare your feature matrix in a single step. You'll also learn how to include this step within a Pipeline so that you can cross-validate your model and preprocessing steps simultaneously. Finally, you'll learn why you should use scikit-learn (rather than pandas) for preprocessing your dataset.
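As a rough sketch of the workflow described above (not the code from the video itself), the example below one-hot encodes two categorical columns with OneHotEncoder inside a ColumnTransformer, chains that with a model in a Pipeline, cross-validates the whole thing, and then predicts on new data. The toy DataFrame, its column names, and the choice of LogisticRegression are assumptions made purely for illustration.

```python
import pandas as pd
from sklearn.compose import make_column_transformer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder

# Hypothetical toy data: two categorical features, one numeric feature, binary target.
df = pd.DataFrame({
    "color": ["red", "blue", "green", "red", "blue",
              "green", "red", "blue", "green", "red"],
    "size": ["S", "M", "L", "L", "S", "M", "M", "L", "S", "S"],
    "weight": [1.0, 2.5, 3.1, 2.9, 1.2, 2.4, 2.2, 3.3, 0.9, 1.1],
    "label": [0, 1, 1, 1, 0, 1, 0, 1, 0, 0],
})
X = df.drop(columns="label")
y = df["label"]

# One-hot encode the categorical columns; pass the numeric column through unchanged.
# handle_unknown="ignore" encodes unseen categories as all zeros instead of raising.
column_trans = make_column_transformer(
    (OneHotEncoder(handle_unknown="ignore"), ["color", "size"]),
    remainder="passthrough",
)

# Two-step Pipeline: preprocessing, then the model.
pipe = make_pipeline(column_trans, LogisticRegression())

# Cross-validate preprocessing and model together, so the encoder is
# re-fit on the training portion of each fold.
scores = cross_val_score(pipe, X, y, cv=2, scoring="accuracy")

# Fit on all of the data, then predict for a new (hypothetical) sample.
pipe.fit(X, y)
X_new = pd.DataFrame({"color": ["blue"], "size": ["M"], "weight": [2.0]})
pred = pipe.predict(X_new)
```

Because the encoder lives inside the Pipeline, the new sample is passed in as raw, unencoded columns and the same fitted encoding is applied automatically.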
AGENDA:
0:00 Introduction
0:22 Why should you use a Pipeline?
2:30 Preview of the lesson
3:35 Loading and preparing a dataset
6:11 Cross-validating a simple model
10:00 Encoding categorical features with OneHotEncoder
15:01 Selecting columns for preprocessing with ColumnTransformer
19:00 Creating a two-step Pipeline
19:54 Cross-validating a Pipeline
21:44 Making predictions on new data
23:43 Recap of the lesson
24:50 Why should you use scikit-learn (rather than pandas) for preprocessing?
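The description does not spell out the argument behind the last agenda item, so the following is only one commonly cited reason, offered as an assumption: pandas' get_dummies derives its columns from whatever data it is given, so new data can produce a different set of columns, while a fitted OneHotEncoder keeps the columns it learned from the training data. The tiny DataFrames here are hypothetical.

```python
import pandas as pd
from sklearn.preprocessing import OneHotEncoder

# Hypothetical training data and new data; "purple" never appears in training.
train = pd.DataFrame({"color": ["red", "blue", "green"]})
new = pd.DataFrame({"color": ["blue", "purple"]})

# pandas: each call builds columns from the data it sees,
# so the two results do not share the same columns.
train_dummies = pd.get_dummies(train)
new_dummies = pd.get_dummies(new)

# scikit-learn: the encoder remembers the categories learned in fit()
# and applies that same layout to any later data.
ohe = OneHotEncoder(handle_unknown="ignore")
ohe.fit(train)
encoded_new = ohe.transform(new).toarray()

print(list(train_dummies.columns))  # ['color_blue', 'color_green', 'color_red']
print(list(new_dummies.columns))    # ['color_blue', 'color_purple']
print(encoded_new.shape)            # (2, 3): same three columns as in training
```

With handle_unknown="ignore", the unseen "purple" row is encoded as all zeros rather than raising an error, so the feature matrix keeps the shape the model was trained on.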
CODE FROM THIS VIDEO: github.com/justmarkham/scikit...
WANT TO JOIN MY NEXT LIVE WEBCAST? Become a member ($5/month):
/ dataschool
=== RELATED RESOURCES ===
OneHotEncoder documentation: scikit-learn.org/stable/modul...
ColumnTransformer documentation: scikit-learn.org/stable/modul...
Pipeline documentation: scikit-learn.org/stable/modul...
My video on cross-validation: • Selecting the best mod...
My video on grid search: • How to find the best m...
My lesson notebook on StandardScaler: nbviewer.jupyter.org/github/j...
=== WANT TO GET BETTER AT MACHINE LEARNING? ===
1) WATCH my scikit-learn video series: • Machine learning in Py...
2) SUBSCRIBE for more videos: kzbin.info?su...
3) ENROLL in my Machine Learning course: www.dataschool.io/learn/
4) LET'S CONNECT!
- Newsletter: www.dataschool.io/subscribe/
- Twitter: / justmarkham
- Facebook: / datascienceschool
- LinkedIn: / justmarkham