Professional Preprocessing with Pipelines in Python

56,089 views

NeuralNine

2 years ago

In this video, we learn about preprocessing pipelines and how to professionally prepare data for machine learning.
◾◾◾◾◾◾◾◾◾◾◾◾◾◾◾◾◾
📚 Programming Books & Merch 📚
🐍 The Python Bible Book: www.neuralnine.com/books/
💻 The Algorithm Bible Book: www.neuralnine.com/books/
👕 Programming Merch: www.neuralnine.com/shop
🌐 Social Media & Contact 🌐
📱 Website: www.neuralnine.com/
📷 Instagram: / neuralnine
🐦 Twitter: / neuralnine
🤵 LinkedIn: / neuralnine
📁 GitHub: github.com/NeuralNine
🎙 Discord: / discord
🎵 Outro Music From: www.bensound.com/

Comments: 38
@vzinko 5 months ago
Rather than creating a class for each step, another much easier approach is to make use of sklearn's FunctionTransformer. This basically allows you to write a custom function and turn it into a transformer object, which can then be fed through a pipeline as per normal
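A minimal sketch of the FunctionTransformer approach, assuming scikit-learn 0.22 or later (where validate=False is the default, so pandas DataFrames pass through unchanged). The DataFrame, its columns, and the helper functions drop_name and fill_age are hypothetical examples, not the video's code.

```python
import pandas as pd
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import FunctionTransformer

# Hypothetical example data; column names are illustrative only.
df = pd.DataFrame({
    "Name": ["Ann", "Bob", "Cid"],
    "Age": [25.0, None, 40.0],
    "Job": ["writer", "programmer", "teacher"],
})


def drop_name(X):
    # Stateless step: remove an identifier column.
    return X.drop(columns=["Name"])


def fill_age(X):
    # Stateless step: fill missing ages with a constant (nothing is learned).
    X = X.copy()
    X["Age"] = X["Age"].fillna(0.0)
    return X


# Each plain function is wrapped into a transformer object for the pipeline.
pipe = Pipeline([
    ("drop_name", FunctionTransformer(drop_name)),
    ("fill_age", FunctionTransformer(fill_age)),
])

result = pipe.fit_transform(df)
print(result)
```

One design caveat: FunctionTransformer suits stateless steps. A step that learns something from the data, such as an imputer's mean, still needs a real fit/transform pair so that the statistic is learned on the training data only.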
@Pwnstara 2 years ago
For those who noticed that the encoder seems to sort the values alphabetically and messes up the Job column names: instead of typing the column names manually, you can do:
matrix = encoder.fit_transform(X[['Job']]).toarray()
column_names = sorted([i for i in df['Job'].unique()])
This also works if more or new jobs are added, and it creates a column for each unique value while keeping the order. Good tutorial in any case!
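A sketch of the same idea that avoids re-sorting by hand, assuming scikit-learn's OneHotEncoder with its default categories="auto": the fitted encoder exposes the learned categories in encoder.categories_, in the same order as the output columns. The Job values below are illustrative.

```python
import pandas as pd
from sklearn.preprocessing import OneHotEncoder

# Illustrative data; the Job values are examples only.
X = pd.DataFrame({"Job": ["writer", "programmer", "programmer", "programmer", "teacher"]})

encoder = OneHotEncoder()  # categories="auto" learns the categories in sorted order
matrix = encoder.fit_transform(X[["Job"]]).toarray()

# categories_[0] lists the Job categories in the same order as the columns of matrix,
# so the names never need to be typed or re-sorted by hand.
column_names = list(encoder.categories_[0])
encoded = pd.DataFrame(matrix, columns=column_names, index=X.index)
print(encoded)
```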
@jacksummers3918 1 year ago
Use pd.get_dummies(X.Job, prefix="Job"). Much neater.
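A short sketch of the pd.get_dummies suggestion with hypothetical train/test frames. One caveat: get_dummies only creates columns for the values present in the frame it is given, so a test set can end up with different columns than the training set. Reindexing against the training columns is one way to align them.

```python
import pandas as pd

# Hypothetical train/test data.
train = pd.DataFrame({"Job": ["writer", "programmer", "teacher"]})
test = pd.DataFrame({"Job": ["programmer", "programmer"]})

train_dummies = pd.get_dummies(train["Job"], prefix="Job")
# The test frame only contains "programmer", so only Job_programmer is created.
test_dummies = pd.get_dummies(test["Job"], prefix="Job")

# Align test columns to the training columns; absent categories become 0.
test_aligned = test_dummies.reindex(columns=train_dummies.columns, fill_value=0)
print(test_aligned)
```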
@nathanhaynes2856 3 months ago
Nice. For this example I might use the ColumnTransformer class; it's perfect for dropping columns and applying imputers and scalers to selected features.
@dmitriidavs4181 2 years ago
Fantastic video, I always wondered about the reasoning behind using classes in ML. Thank you!!!
@onecarry1532 2 years ago
Hey man, great channel! Love the topic-based tutorials ❤️ Video suggestion: could you make a video on using Python and a tree data structure to build an autocomplete CLI program? I haven't seen this anywhere, and I guess it's a great way to understand why a tree might be the best solution for an autocomplete program. Thanks! I'm sure we all appreciate what you do for the community ♥️ 🌻
@isaacandrewdixon 1 year ago
This was awesome and very informative. Many thanks from a machine learning novice!
@Deacc 2 years ago
This video is pure gold. Thank you so much!
@niv_syt6315 2 years ago
I remember taking ML courses on Udemy that took more time than this video to cover the same thing. Please keep creating more videos on this subject.
@vlplbl85 2 years ago
I find using FunctionTransformer much easier. It turns each of your custom functions into a transformer and you don't need to write a class, but just a function.
@manyes7577 1 year ago
Wow, this technique is amazing. Thanks for sharing such brilliant knowledge with us.
@apheironnn 10 months ago
That was really helpful, thanks!
@tharakawickramasinghe3762 1 year ago
Thank you. This is very helpful.
@MrTactics26 3 months ago
Sick video bro! 😎
@Juzz_RSA 1 year ago
Thank you, this was informative 😁
@juandiegoorozco5531 7 months ago
Really useful, thank you very much!
@sviteribuben7245 2 years ago
Very useful! Thx!
@pakaponwiwat2405 5 months ago
Thank you, sir!
@jelcroospockt 6 months ago
I would really like to find a tutorial on how to pass arguments to a pipeline step you created yourself, like the namedropper, so I can use grid search to try dropping different features.
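A sketch of one way to do this, assuming a hypothetical ColumnDropper class rather than the video's own dropper: expose the columns to drop as an __init__ argument and store it unchanged, which is what scikit-learn's get_params/set_params (and therefore GridSearchCV) rely on. Parameters of a pipeline step are addressed as <step name>__<parameter name>. The data is synthetic.

```python
import numpy as np
import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline


class ColumnDropper(BaseEstimator, TransformerMixin):
    """Hypothetical dropper whose columns can be tuned by grid search."""

    def __init__(self, columns=None):
        # Store the argument unchanged so get_params/set_params/clone work.
        self.columns = columns

    def fit(self, X, y=None):
        # Nothing to learn; dropping columns is stateless.
        return self

    def transform(self, X):
        if not self.columns:
            return X
        return X.drop(columns=list(self.columns))


# Synthetic data: only column "a" is related to the target.
rng = np.random.default_rng(0)
X = pd.DataFrame(rng.normal(size=(60, 3)), columns=["a", "b", "c"])
y = 2 * X["a"] + rng.normal(scale=0.1, size=60)

pipe = Pipeline([
    ("dropper", ColumnDropper()),
    ("model", LinearRegression()),
])

# "dropper__columns" targets the columns parameter of the "dropper" step.
param_grid = {"dropper__columns": [None, ["b"], ["c"], ["b", "c"]]}
search = GridSearchCV(pipe, param_grid, cv=3)
search.fit(X, y)
print(search.best_params_)
```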
@juanbetancourt5106 2 years ago
Great!
@thomasgoodwin2648 2 years ago
With an eye towards the love that programming has gotten from the ML community lately, it occurs to me that ML could perhaps also be used more in the data preprocessing role. For example, choosing encoding types, handling missing values, flattening, etc. could all be automated. Just a thought.
Second random thought: I know random noise has been added to features in an attempt to get models to generalize better, but it did not fare well. However, I have not seen anyone try simply using noise generators (normal/Gaussian, etc.) as individual features and letting the model itself choose when and where noise might be effective.
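Part of the first idea already exists in a simple form: preprocessing choices can be treated as hyperparameters and selected by cross-validation. The sketch below (synthetic data, arbitrary grid) searches over the imputation strategy and over which scaler to use, or none via "passthrough". It is a grid search over a fixed set of options, not a learned preprocessing model.

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler, StandardScaler

# Synthetic regression data with roughly 10% of the feature values missing.
rng = np.random.default_rng(42)
X = rng.normal(size=(80, 4))
y = X @ np.array([1.0, -2.0, 0.5, 0.0]) + rng.normal(scale=0.1, size=80)
X[rng.random(X.shape) < 0.1] = np.nan

pipe = Pipeline([
    ("impute", SimpleImputer()),
    ("scale", StandardScaler()),
    ("model", Ridge()),
])

param_grid = {
    # Tune a parameter of a step...
    "impute__strategy": ["mean", "median"],
    # ...or swap the whole step out, or skip it with "passthrough".
    "scale": [StandardScaler(), MinMaxScaler(), "passthrough"],
}
search = GridSearchCV(pipe, param_grid, cv=4)
search.fit(X, y)
print(search.best_params_)
```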
@allanmachado2011 1 month ago
Thank you!
@nikulnayi3271 1 year ago
Thank you so much, nicely explained! Following what you showed, I created a pipeline and dumped it as a pickle file, but when I try to load that model and use it, I get an error: AttributeError: Can't get attribute 'NullEncoder' on
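A likely cause, hedged since the rest of the error message is cut off: pickle stores only a reference to a custom class (its module and name), not the class code, so the class must be importable under the same name in the script that loads the file. If NullEncoder was defined only in the training script or notebook, the loading script cannot find it. The sketch below uses a hypothetical FillNulls step as a stand-in; the usual remedy is to put such classes in a module that both scripts import.

```python
import pickle

import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.pipeline import Pipeline


# Hypothetical stand-in for a custom step such as NullEncoder. In a real project,
# define it in a module (e.g. a hypothetical preprocessing.py) that both the
# training script and the loading script import.
class FillNulls(BaseEstimator, TransformerMixin):
    def fit(self, X, y=None):
        # Learn per-column means from the training data only.
        self.means_ = X.mean(numeric_only=True)
        return self

    def transform(self, X):
        # Fill missing values with the means learned in fit().
        return X.fillna(self.means_)


pipe = Pipeline([("fill", FillNulls())])
pipe.fit(pd.DataFrame({"Age": [20.0, float("nan"), 40.0]}))

# pickle records only a reference to FillNulls (module + class name), not its code,
# so loading works only where that class can be found under the same name.
payload = pickle.dumps(pipe)
restored = pickle.loads(payload)
print(restored.transform(pd.DataFrame({"Age": [float("nan")]})))
```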
@736939 1 year ago
16:42 I think it's wrong to use fit_transform in the transform method, because it causes data leakage: after you split the data into train and test sets, calling transform on the test set will refit the imputer on the test data.
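A minimal sketch of the fit/transform split the comment describes, using a hypothetical AgeImputer step (not the video's class): the imputer is fitted once in fit() and only applied in transform(), so test data is filled with statistics learned from the training data.

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.impute import SimpleImputer


class AgeImputer(BaseEstimator, TransformerMixin):
    """Hypothetical custom step that wraps SimpleImputer correctly."""

    def fit(self, X, y=None):
        # Learn the fill value from the training data only.
        self.imputer_ = SimpleImputer(strategy="mean").fit(X)
        return self

    def transform(self, X):
        # Reuse the statistics learned in fit(); never refit here.
        return self.imputer_.transform(X)


X_train = np.array([[10.0], [np.nan], [30.0]])
X_test = np.array([[np.nan], [100.0], [100.0]])

step = AgeImputer().fit(X_train)
print(step.transform(X_test))  # the missing test value is filled with the training mean, 20.0
```

Had transform() called fit_transform instead, the missing test value would be filled with the test mean (100.0 here).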
@falkstankat6511 1 year ago
Yeah, I thought the same.
@__wouks__ 2 years ago
I think your feature encoder has some faulty logic for the "Job" column. For example, df2 shows 1 × writer, 3 × programmer and 1 × teacher, but afterwards there isn't even a "teacher" column. And if you tried to reconstruct the original column from the 1/0 features you created, you wouldn't get the same DataFrame.
@aayushpatel2904 1 year ago
Thanks Sir
@o1techacademy 8 months ago
Awesome
@nachoeigu 1 year ago
I have one big question: what is the difference between building a machine learning application with a Pipeline and building one with an OOP technique? They look the same to me.
@adriandiaz5688 1 year ago
Yeah, this is a great video but that's something I'm curious about as well.
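One way to frame the answer, as a sketch rather than the video's explanation: a Pipeline does not replace OOP; it is itself an object composed of other objects. What it adds is a shared fit/transform/predict protocol, so the whole chain can be used as a single estimator. For example, cross-validation refits every step on each training fold. The data below is synthetic.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic binary classification data.
rng = np.random.default_rng(1)
X = rng.normal(size=(100, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

pipe = Pipeline([
    ("scale", StandardScaler()),
    ("model", LogisticRegression()),
])

# The whole chain behaves like one estimator: in each fold, the scaler is
# fitted on that fold's training part only, then applied to its test part.
scores = cross_val_score(pipe, X, y, cv=5)
print(scores.mean())

pipe.fit(X, y)
print(pipe.named_steps["scale"].mean_)  # statistics learned during fit
```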
@MalcombBrown 2 years ago
Could you use the get_dummies pandas method for the One Hot Encoding?
@lexcheshir6416 2 years ago
yep
@rohscx 2 years ago
What's the name of the opening song in this video?
@slothner943 7 months ago
Are you Swedish? 😮
@dilshodfayzullayev924 4 months ago
Where do you work? #admin
@bellabella-tv8zg 2 years ago
1st