Rather than creating a class for each step, a much easier approach is to use sklearn's FunctionTransformer. It lets you write a custom function and turn it into a transformer object, which can then be fed through a pipeline as normal.
@HamzaShahid-s8t3 months ago
Yeah, sklearn's transformers are good, but creating a class gives you the upper hand of fitting on the data, i.e. learning from the data.

Advantages of custom transformer classes:
- Stateful transformations: custom transformers can maintain state (e.g., learned parameters) through the fit and transform methods. This is particularly useful for transformations that require learning from the data, such as scaling or encoding based on the training data.
- Integration with pipelines: custom transformers integrate seamlessly into scikit-learn pipelines, so you get all the benefits of pipelines, including cross-validation and hyperparameter tuning.
- More control: creating a class allows for more complex logic and functionality, such as handling edge cases, logging, and error handling.
- Reusability: once defined, a custom transformer can be reused across different projects or datasets without modification.
@twentytwentyeight3 months ago
@HamzaShahid-s8t great breakdown of when and why to use custom transformers
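The "stateful transformations" point in the thread above can be sketched as follows. This is a minimal illustration, not the code from the video: the class name MeanCenterer and the "Age" column are hypothetical. The transformer learns column means in fit() and reuses them in transform(), which a plain stateless function cannot do.

```python
import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin


class MeanCenterer(BaseEstimator, TransformerMixin):
    """Hypothetical transformer: subtracts per-column means learned in fit()."""

    def __init__(self, columns):
        self.columns = columns

    def fit(self, X, y=None):
        # State learned from the training data only
        self.means_ = X[self.columns].mean()
        return self

    def transform(self, X):
        # Reuse the stored means; nothing is re-learned here
        X = X.copy()
        X[self.columns] = X[self.columns] - self.means_
        return X


train = pd.DataFrame({"Age": [20.0, 40.0]})
test = pd.DataFrame({"Age": [50.0]})

centerer = MeanCenterer(columns=["Age"]).fit(train)
centered_test = centerer.transform(test)  # uses the training mean (30.0)
```

The test row is shifted by the mean of the training rows, not by its own mean, which is the behaviour the reply describes as learning from the data.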
@randomfinn4042 years ago
For those who noticed that the encoder seems to sort the values alphabetically and messes up the job column names: instead of manually typing the column names, you can do:

matrix = encoder.fit_transform(X[['Job']]).toarray()
column_names = sorted(df['Job'].unique())

This also works if new jobs and values are added: it makes a column for each unique value while keeping the order. Good tutorial in any case!
@jacksummers39182 years ago
Use pd.get_dummies(X.Job, prefix="Job"). Much neater.
@ShortsSmith6 months ago
G... thank you... I was hoping that someone had noticed it. I'm glad I got the better version ❤
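For the column-order issue raised at the top of this thread, one option is to read the names back from the fitted encoder instead of sorting the unique values by hand. The sketch below assumes a small made-up DataFrame with a "Job" column; it is not the data from the video.

```python
import pandas as pd
from sklearn.preprocessing import OneHotEncoder

# Made-up example data
df = pd.DataFrame({"Job": ["writer", "programmer", "teacher", "programmer"]})

encoder = OneHotEncoder()
matrix = encoder.fit_transform(df[["Job"]]).toarray()

# categories_ lists the values in the same order as the output columns
column_names = list(encoder.categories_[0])
encoded = pd.DataFrame(matrix, columns=column_names, index=df.index)
```

Because the names come from the encoder itself, they stay aligned with the matrix columns even when new job values appear in the training data.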
@na_haynes11 months ago
Nice. For this example I might use the ColumnTransformer class; it's perfect for dropping columns and integrating imputers and scalers on select features.
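A rough sketch of the ColumnTransformer approach, assuming made-up "Name", "Age" and "Job" columns (the video's actual columns may differ). Columns not listed in any transformer are dropped by remainder="drop".

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Made-up example data
df = pd.DataFrame({
    "Name": ["Ann", "Bob", "Cid"],
    "Age": [20.0, np.nan, 40.0],
    "Job": ["writer", "programmer", "teacher"],
})

# Impute, then scale, the numeric column
numeric = Pipeline([
    ("impute", SimpleImputer(strategy="mean")),
    ("scale", StandardScaler()),
])

preprocess = ColumnTransformer(
    transformers=[
        ("num", numeric, ["Age"]),
        ("job", OneHotEncoder(handle_unknown="ignore"), ["Job"]),
    ],
    remainder="drop",    # "Name" is not listed, so it is dropped
    sparse_threshold=0,  # always return a dense array
)

features = preprocess.fit_transform(df)  # 1 scaled column + 3 one-hot columns
```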
@isaacandrewdixon2 years ago
This was awesome and very informative. Many thanks from a machine learning novice!
@onecarry15322 years ago
Hey man, great channel! Love the topic based tutorials ❤️ Video Suggestion: Can I suggest you attempt making a video on: Using Python and the Tree Algorithm to make an autocomplete Python CLI program. Haven’t seen this anywhere and I guess it’s a great way to understand why the Tree algorithm might be the best solution for an autocomplete program. Thanks! Sure we all appreciate what you do for the community ♥️ 🌻
@niv_syt63152 years ago
I remember taking ML courses on Udemy that took more time than this video. Please keep creating more videos on this subject.
@736939 1 year ago
16:42 I think it's wrong to use fit_transform in the transform method, because it causes data leakage: after you split the data into train and test sets, calling transform on the test set will refit the imputer on the test data.
@falkstankat6511 1 year ago
Yeah, thought the same.
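The fix suggested in this thread could look roughly like the sketch below: fit the imputer in fit() and only call transform() inside transform(). The class name AgeImputer and the "Age" column are assumptions for illustration, not necessarily what the video uses.

```python
import numpy as np
import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.impute import SimpleImputer


class AgeImputer(BaseEstimator, TransformerMixin):
    """Hypothetical wrapper that fills missing "Age" values."""

    def __init__(self, strategy="mean"):
        self.strategy = strategy

    def fit(self, X, y=None):
        # Learn the fill value from the data passed to fit() only
        self.imputer_ = SimpleImputer(strategy=self.strategy)
        self.imputer_.fit(X[["Age"]])
        return self

    def transform(self, X):
        # transform(), not fit_transform(): reuse the stored statistic
        X = X.copy()
        X["Age"] = self.imputer_.transform(X[["Age"]]).ravel()
        return X


train = pd.DataFrame({"Age": [20.0, 40.0, np.nan]})
test = pd.DataFrame({"Age": [np.nan, 100.0]})

imputer = AgeImputer().fit(train)
test_filled = imputer.transform(test)  # missing value filled with the train mean
```

With fit_transform inside transform, the missing test value would instead be filled from the test set's own mean, which is the leakage the comment describes.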
@Deacc2 years ago
This video is pure gold. Thank you so much!
@vlplbl852 years ago
I find using FunctionTransformer much easier. It turns each of your custom functions into a transformer and you don't need to write a class, but just a function.
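A minimal sketch of the FunctionTransformer approach. The two functions and the "Name" and "Age" columns are made up for illustration. Note that these steps are stateless: they learn nothing in fit.

```python
import pandas as pd
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import FunctionTransformer


def drop_name(df):
    # Hypothetical step: remove a column that is not used as a feature
    return df.drop(columns=["Name"])


def add_age_squared(df):
    # Hypothetical step: add a derived feature
    df = df.copy()
    df["AgeSquared"] = df["Age"] ** 2
    return df


pipe = Pipeline([
    ("drop_name", FunctionTransformer(drop_name)),
    ("age_squared", FunctionTransformer(add_age_squared)),
])

df = pd.DataFrame({"Name": ["Ann", "Bob"], "Age": [30, 40]})
result = pipe.fit_transform(df)
```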
@dmitriidavs41812 years ago
Fantastic video, always wondered about the reasoning behind using classes in ML, thank you!!!
@MrTactics2611 months ago
Sick video bro! 😎
@josipgregoric538016 days ago
How would you preprocess a single (test) example using the Data Pipeline if the Pipeline has estimators that apply Power Transformations, Outlier Removal, Standard Scaling, etc.? I wouldn't want to manually introduce another StandardScaler() only to standardize my new example on which I want a prediction to be made; this would require me to fit this Scaler on the whole dataset beforehand, and then transform my single example, at which point the point of using a Pipeline in the first place is lost...
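One way to read this question: a fitted Pipeline already stores the parameters each step learned on the training data, so a single new row can be passed straight to predict() without a second scaler. The sketch below uses made-up data and column names, and leaves out outlier removal, since steps that drop rows generally belong to training-time cleaning rather than to a scikit-learn Pipeline.

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PowerTransformer, StandardScaler

# Made-up training data
X_train = pd.DataFrame({
    "Age": [20.0, 25.0, 30.0, 45.0, 50.0, 60.0],
    "Income": [20.0, 22.0, 30.0, 60.0, 80.0, 90.0],
})
y_train = [0, 0, 0, 1, 1, 1]

pipe = Pipeline([
    ("power", PowerTransformer()),
    ("scale", StandardScaler()),
    ("model", LogisticRegression()),
])
pipe.fit(X_train, y_train)  # every step is fitted on the training data once

# A single new example, as a one-row DataFrame with the same columns
new_example = pd.DataFrame({"Age": [55.0], "Income": [85.0]})
prediction = pipe.predict(new_example)  # each step only calls transform() here
```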
@nachoeigu2 years ago
I have one big question: what is the difference between building a machine learning application with a Pipeline and building one with an OOP technique? They look the same to me.
@adriandiazNY 1 year ago
Yeah, this is a great video but that's something I'm curious about as well.
@manyes75772 years ago
Wow, this technique is amazing. Thanks for sharing this brilliant knowledge with us.
@jelcroospockt 1 year ago
I would really like to find a tutorial on how to pass arguments to a pipeline step you created yourself, like the namedropper, so I can use grid search to try out dropping different features.
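A possible sketch for this request, under the assumption that the custom step takes its settings as constructor arguments and inherits from BaseEstimator. GridSearchCV can then set them through the "stepname__parameter" naming convention. The ColumnDropper class, the column names and the data are all hypothetical.

```python
import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline


class ColumnDropper(BaseEstimator, TransformerMixin):
    """Hypothetical step that drops the columns named in `columns`."""

    def __init__(self, columns=()):
        # Store the argument unchanged so get_params/set_params work
        self.columns = columns

    def fit(self, X, y=None):
        return self

    def transform(self, X):
        return X.drop(columns=list(self.columns))


# Made-up data: 8 rows, 3 candidate features
X = pd.DataFrame({
    "a": [0.1, 0.2, 0.3, 0.4, 1.1, 1.2, 1.3, 1.4],
    "b": [1.0, 0.0, 1.0, 0.0, 1.0, 0.0, 1.0, 0.0],
    "c": [5.0, 3.0, 4.0, 6.0, 5.5, 3.5, 4.5, 6.5],
})
y = [0, 0, 0, 0, 1, 1, 1, 1]

pipe = Pipeline([
    ("dropper", ColumnDropper()),
    ("model", LogisticRegression()),
])

# "dropper__columns" targets the `columns` argument of the "dropper" step
param_grid = {"dropper__columns": [[], ["a"], ["b", "c"]]}
grid = GridSearchCV(pipe, param_grid=param_grid, cv=2)
grid.fit(X, y)

best_columns_to_drop = grid.best_params_["dropper__columns"]
```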
@Juzz_RSA 1 year ago
Thank you, this was informative 😁
@apheironnn 1 year ago
That was really helpful, thanks!
@tk_wickramasinghe2 years ago
Thank you. This is very helpful.
@MalcombBrown2 years ago
Could you use the get_dummies pandas method for the One Hot Encoding?
@lexcheshir64162 years ago
yep
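A short sketch of the get_dummies route asked about here, using made-up job values. One caveat: get_dummies does not remember the categories it saw, so new data has to be aligned to the training columns by hand, for example with reindex.

```python
import pandas as pd

# Made-up training data
train = pd.DataFrame({"Job": ["writer", "programmer", "teacher", "programmer"]})
dummies = pd.get_dummies(train["Job"], prefix="Job")

# New data may contain only some of the jobs seen in training
new = pd.DataFrame({"Job": ["teacher"]})
new_dummies = pd.get_dummies(new["Job"], prefix="Job").reindex(
    columns=dummies.columns, fill_value=0
)
```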
@gasfeesofficial35577 months ago
bro great video!!
@juandiegoorozco5531 1 year ago
really useful, thank you very much
@thomasgoodwin26482 years ago
With an eye towards the love that programming has gotten from the ml community lately, it occurs to me that perhaps ml could also be used more in the data preprocessing role. For example: Choosing encoding types, handling missing values, flattening, etc could all be automated. Just a thought. 2nd random thought. I know random noise has been added to features in an attempt to get the models to generalize better but did not fare well. However I have not seen that anyone has tried simply using noise generators (normal, gaussian, etc) as individual features and allowing the model itself to choose when and where noise might be effective.
@__wouks__2 years ago
I think your feature encoder has some faulty logic for the "Job" column. The df2 for example shows 1 x writer, 3 x programmer and 1 x teacher, but afterwards there isn't even a "teacher" column. And if you were to recreate the single columns using 1 or 0 from the features you created you wouldn't get the same dataframe.
@pakaponwiwat2405 1 year ago
Thank you, sir!
@nikulnayi3271 1 year ago
Thank you so much, nicely explained. Using what you showed, I created a pipeline and dumped it as a pickle file, but when I try to load that model and use it, I get an error: AttributeError: Can't get attribute 'NullEncoder' on