fit_transform on x_train and transform on x_test. Reason: with fit_transform we learn the parameters and transform x_train; if we call fit_transform again on x_test it will learn the parameters again, so we only call transform on x_test. The whole issue is about overfitting. Hope this makes sense.
@UnfoldDataScience 1 year ago
Yes, you understand the concepts well. The only thing to keep in mind is where we can use the "learned parameters" on new data and where we cannot.
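A minimal sketch of the pattern described in this thread, assuming scikit-learn's StandardScaler and a made-up toy array (the data and variable names are illustrative, not from the video). The scaler learns its mean and standard deviation from the training split only, and the test split is transformed with those same learned values:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Toy data (hypothetical): 10 rows, 2 numeric features
X = np.arange(20, dtype=float).reshape(10, 2)
X_train, X_test = train_test_split(X, test_size=0.3, random_state=0)

scaler = StandardScaler()

# fit_transform: learn mean/std from the training data, then scale it
X_train_scaled = scaler.fit_transform(X_train)

# transform only: reuse the training mean/std on the test data
X_test_scaled = scaler.transform(X_test)

# The learned parameters come from the training split alone
print("learned mean:", scaler.mean_)
print("train mean:  ", X_train.mean(axis=0))
```

Calling fit_transform on X_test instead would replace the learned parameters with ones computed from the test data, so train and test would no longer be on the same scale.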
@mushinart 1 year ago
After 2 long years... now I know the answer 😭... I'm grateful
@PreenitaBhattacharya 2 months ago
You are doing social work with such explanations, sir. Thank you very much.
@shubhamagrawal7068 2 years ago
We can apply fit on the training data so that we have the parameter values with us. We can also use fit_transform on the training data: it will calculate the parameter values from the training data and do the transformation as well. But on the testing data, we always use transform with the parameter values from the training data. This will lead to a data leakage problem. To avoid the leakage problem we might use fit_transform on the testing data. Correct me if I am wrong. And please clear up this confusion by making a video, Aman bhaiya...!!!!
@Krishna-pn5je 1 year ago
Hi Aman, thanks for the video. My answer is below. In the prediction stage we don't require the scaler object, because the model still understands the numeric data, and we require scaling only if the dataset has multiple numeric features and we want to compute distances between data points. In the prediction stage with a tf-idf vectorizer, we should pass the vectorizer object, because it transforms the text to vectors at the evaluation stage before they are passed to the model for prediction, which is necessary.
@dakshbhatnagar 2 years ago
For prediction we should ideally use transform, because the object is fitted on the training data and the test data is transformed using that fitted object. This applies to both the tf-idf and the scaler object. I could be wrong, but this makes sense to me.
@UnfoldDataScience 2 years ago
Hi Daksh, hope you are doing great. Seeing your comment after a long time. For scaling, do you see any data leakage problems?
@kausikkar2587 1 year ago
Well that's what we follow usually, but then, there are cases where you have a completely different type of data with different number of maximum features. In that case you have to again fit your test data too. I applied it today on my Vikram IMDb film review NLP project using CountVectorizer and MultinomialNB. And it worked as expected. Hope this helps.
@ibrahimmosty1860 7 months ago
I will use separate scalers because each scaler saves the data for its specific column.
@himalayaashish947 2 years ago
Hi. For prediction we will have to use only transform, because we have trained the model and we want to use the same parameters. For tf-idf we will use fit_transform: since the corpus is changing, we need to calculate the parameters and then apply them, so we will have to use fit_transform.
@squadgang1678 2 years ago
I go with his answer
@UnfoldDataScience 2 years ago
Thanks for the answer, do you see data leakage problems with your approach?
@UnfoldDataScience 2 years ago
Also, for tf-idf, if your new corpus has a new word that was never there in training, what happens to the model?
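One way to explore this question is a small sketch, assuming scikit-learn's TfidfVectorizer with default settings and made-up example sentences (both are assumptions, not taken from the video). When only transform is called on new text, a word that was not in the training vocabulary is dropped, and the output keeps the same number of columns the model was trained on:

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical training corpus and a new document with an unseen word
train_docs = ["the movie was good", "the movie was bad"]
new_docs = ["the movie was spectacular"]

vectorizer = TfidfVectorizer()

# Vocabulary and IDF weights are learned from the training corpus only
X_train = vectorizer.fit_transform(train_docs)

# transform reuses that vocabulary; "spectacular" has no column, so it is ignored
X_new = vectorizer.transform(new_docs)

unseen_in_vocab = "spectacular" in vectorizer.vocabulary_

print("vocabulary:", sorted(vectorizer.vocabulary_))
print("train shape:", X_train.shape)
print("new shape:  ", X_new.shape)
print("'spectacular' in vocabulary:", unseen_in_vocab)
```

Calling fit_transform on the new corpus instead would build a different vocabulary, so the columns would no longer line up with the features a previously trained model expects.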
@learning_with_irving4266 1 year ago
So is standardizing just finding the z score?
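A quick check of this, assuming "standardizing" here means scikit-learn's StandardScaler with default settings (the numbers are made up). StandardScaler subtracts the column mean and divides by the population standard deviation, which is the z-score formula:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical single-feature column
x = np.array([[2.0], [4.0], [6.0], [8.0]])

# Manual z-score: (x - mean) / std, with np.std's default ddof=0 (population std)
z_manual = (x - x.mean(axis=0)) / x.std(axis=0)

# The same values computed by StandardScaler
z_sklearn = StandardScaler().fit_transform(x)

print(np.allclose(z_manual, z_sklearn))
```

Note that a z-score computed with the sample standard deviation (ddof=1) would differ slightly from what StandardScaler returns.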
@chandrabhanbahetwar9638 1 year ago
Brother, if you are going to explain something, explain it completely and clearly. What is this? You have confused us about whether or not to use fit_transform on the test dataset. If you want reach for the video, just say so and we will all comment, but don't leave us stuck in confusion like this. If you are going to explain, explain it fully and clearly, otherwise leave it.
@shrirajpathak 2 years ago
Why create all this confusion, just make the video with the answers in it...
@iyyappanmuthusamy1678 2 years ago
I don't think we will use both the fit and transform functions, because while testing our ML model we will not use the testing dataset for this. For scaling, we will use only the xtrain and ytrain datasets to train our model.
@UnfoldDataScience 2 years ago
Which use case are you referring to: scaling or tf-idf?
@rosemarydara1025 1 year ago
This guy's teaching is really really amazing
@subhashdixit5167 2 years ago
Thanks for taking my comments seriously
@niranjan.tanpure 2 years ago
Product manager vs. data scientist: which one pays better, sir?
@UnfoldDataScience 2 years ago
Managing a data science product is not an easy task at all. It needs all the qualities of a seasoned data scientist and more. I believe they should be paid more than a regular data scientist.
@weirdyounes7618 1 year ago
Thkuuuuuu 🎉
@arpittrivedi6636 2 years ago
In prediction we use only fit
@UnfoldDataScience 2 years ago
Only "fit" or only "transform"? Also, in which scenario: scaling or tf-idf?