fit vs transform vs fit_transform | fit and fit_transform in sklearn

Views: 13,538

A day ago

Comments: 27

@HimanshuKumar-oi8qh 1 year ago

Use fit_transform on x_train and transform on x_test. Reason: with fit_transform we learn the parameters and transform x_train. If we call fit_transform again on x_test, it will learn the parameters all over again, so we call only transform on x_test. The whole issue is about overfitting. Hope this makes sense.

@UnfoldDataScience 1 year ago

Yes, you understand the concepts well. The only thing to keep in mind is where we can use the "learned parameters" on new data and where we cannot.
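The pattern described in the comment above (fit_transform on the training split, transform on the test split) can be sketched as follows. This is a minimal illustration and not code from the video: the synthetic data, the 80/20 split, and the variable names are assumptions chosen for the example.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic numeric data (an assumption for illustration only).
rng = np.random.default_rng(0)
X = rng.normal(loc=50.0, scale=10.0, size=(100, 3))
X_train, X_test = train_test_split(X, test_size=0.2, random_state=0)

scaler = StandardScaler()

# fit_transform: learn mean and std from the training split, then scale it.
X_train_scaled = scaler.fit_transform(X_train)

# transform: reuse the training mean and std on the test split.
# Nothing is learned from X_test here.
X_test_scaled = scaler.transform(X_test)

print("learned means:", scaler.mean_)
print("learned stds: ", scaler.scale_)
```

The parameters stored in `scaler.mean_` and `scaler.scale_` come only from `X_train`, which is why the test split is scaled with training statistics.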

@mushinart 1 year ago

After 2 long years... now I know the answer 😭... I'm grateful.

@PreenitaBhattacharya 2 months ago

You are doing social work with such explanations, sir. Thank you very much.

@shubhamagrawal7068 2 years ago

We can apply fit on the training data so that we have the parameter values. We can also use fit_transform on the training data: it calculates the parameter values from the training data and applies the transformation as well. But on the testing data, we always use transform with the parameter values from the training data. This will lead to a data leakage problem. To avoid the leakage problem, we might use fit_transform on the testing data. Correct me if I am wrong. And please clear up this confusion by making a video, Aman bhaiya!

@Krishna-pn5je 1 year ago

Hi Aman, thanks for the video. My answer is below.

In the prediction stage we don't require the scaler object, because the model still understands the numeric data; we require scaling only if the dataset has multiple numeric features and we want to compute distances between data points.

In the prediction stage with a tfidf vectorizer, we should pass the vectorizer object, because it transforms the text into a vector at the evaluation stage before the text is passed to the model for prediction, which is necessary.
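One way to keep the fitted vectorizer together with the model at prediction time, as the comment above suggests, is a scikit-learn pipeline. The sketch below is an assumption-laden illustration: the toy reviews, the labels, and the choice of MultinomialNB as the classifier are made up for the example.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Toy training corpus and labels (hypothetical, for illustration only).
train_texts = ["great film", "loved this film", "terrible film", "hated this film"]
train_labels = [1, 1, 0, 0]

# The pipeline calls fit_transform on the vectorizer during fit(),
# and only transform during predict().
model = make_pipeline(TfidfVectorizer(), MultinomialNB())
model.fit(train_texts, train_labels)

# Raw text goes in; the stored, already-fitted vectorizer converts it.
preds = model.predict(["loved it, great film"])
print(preds)
```

Because the vectorizer lives inside the pipeline, the same fitted object is reused at prediction time without any extra bookkeeping.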

@dakshbhatnagar 2 years ago

For prediction we should ideally use transform, because the object is fitted on the training data and the test data is transformed using that fitted object. This applies to both the tfidf and the scaler objects. I could be wrong, but this makes sense to me.

@UnfoldDataScience 2 years ago

Hi Daksh, hope you are doing great. Good to see your comment after a long time. For scaling, do you see any data leakage problems?

@kausikkar2587 1 year ago

Well, that's what we usually follow, but there are cases where you have a completely different type of data with a different maximum number of features. In that case you have to fit on your test data again too. I applied it today on my Vikram IMDb film review NLP project using CountVectorizer and MultinomialNB, and it worked as expected. Hope this helps.

@ibrahimmosty1860 7 months ago

I would use a separate scaler for each column, because each scaler stores the parameters for its specific column.

@himalayaashish947 2 years ago

Hi. For prediction with the scaler, we have to use only transform, because we have trained the model and want to use the same parameters. For tfidf we will use fit_transform: since the corpus is changing, we need to calculate the parameters and then apply them.

@squadgang1678 2 years ago

I go with his answer

@UnfoldDataScience 2 years ago

Thanks for the answer. Do you see data leakage problems with your approach?

@UnfoldDataScience 2 years ago

Also, for tfidf, if your new corpus has a new word that was never there in training, what happens to the model?
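The question above can be explored with a small experiment. This is a rough sketch using TfidfVectorizer with default settings; the toy sentences and variable names are assumptions made up for the example. It compares transform on new text against refitting on new text.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Toy corpora (hypothetical). "spectacular" never appears in training.
train_docs = ["the movie was good", "the movie was bad"]
new_docs = ["the movie was spectacular"]

vectorizer = TfidfVectorizer()
X_train = vectorizer.fit_transform(train_docs)

# transform: the unseen word is dropped, the column layout is unchanged.
X_new = vectorizer.transform(new_docs)

# Refitting on the new corpus builds a different vocabulary, so the
# columns no longer line up with what a trained model expects.
refit_vectorizer = TfidfVectorizer()
X_refit = refit_vectorizer.fit_transform(new_docs)

print("training vocabulary:", sorted(vectorizer.vocabulary_))
print("transform shape:", X_new.shape)
print("refit shape:    ", X_refit.shape)
```

With transform, the new word has no column and is ignored, so the feature matrix keeps the width the model was trained on. With a refit, the width and the meaning of each column change.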

@learning_with_irving4266 1 year ago

So is standardizing just finding the z score?
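A quick check of the question above, assuming that "standardizing" refers to scikit-learn's StandardScaler: its output can be compared with a z-score computed by hand using the population standard deviation (NumPy's default, ddof=0). The sample values are made up for the example.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# One numeric feature with made-up values.
X = np.array([[1.0], [2.0], [3.0], [4.0], [5.0]])

# Manual z-score: subtract the column mean, divide by the column std.
manual_z = (X - X.mean(axis=0)) / X.std(axis=0)

# StandardScaler computes the same quantities during fit.
scaled = StandardScaler().fit_transform(X)

print(np.allclose(manual_z, scaled))
```

Under these assumptions the two results agree, so StandardScaler can be read as a per-column z-score whose mean and standard deviation are learned in fit.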

@chandrabhanbahetwar9638 1 year ago

Bro, if you are going to explain something, please explain it completely and clearly. What is this? You have left us confused about whether or not to use fit_transform on the test dataset. If you want reach for the video, just say so and we will all comment, but don't leave us stuck in confusion like this. If you are going to explain, explain it fully and clearly; otherwise don't bother.

@shrirajpathak 2 years ago

Why create all this confusion, just make the video with the answers in it...

@iyyappanmuthusamy1678 2 years ago

I don't think we will use both the fit and transform functions, because when testing our ML model we will not fit on the testing dataset. For scaling, we will use only the xtrain and ytrain datasets to train our model.

@UnfoldDataScience 2 years ago

Which use case? Is it scaling or tfidf that you are referring to?

@rosemarydara1025 1 year ago

This guy's teaching is really really amazing

@subhashdixit5167 2 years ago

Thanks for taking my comments seriously

@niranjan.tanpure 2 years ago

Product manager vs. data scientist: which one pays better, sir?

@UnfoldDataScience 2 years ago

Managing a data science product is not an easy task at all. It needs all the qualities of a seasoned data scientist and more. I believe such a role should be paid more than a typical data scientist.