Check out our premium machine learning course with 2 Industry projects: codebasics.io/courses/machine-learning-for-data-science-beginners-to-advanced
@harshalbhoir8986 1 year ago
This was such a cool explanation. Thank you so much!!
@mohammadriyaz5586 9 months ago
Thank you so much, sir, for the easiest explanation.
@apurav363 1 month ago
Very helpful
@tomernx5 1 year ago
Great video! thanks!
@AdityaKumar-w2t1z 1 year ago
Great work! Keep it up.
@napoleanbonaparte9225 5 months ago
So Sir, what we can take from this video is that we can find the precision (%) for spam and the number of spam emails in the list. We can do that in two ways: 1. using train_test_split and MultinomialNB; 2. using CountVectorizer directly inside Pipeline(). So bag of words simply means a collection of the same words together, as we collected spam here.
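The two ways mentioned in this comment can be sketched side by side. This is a minimal sketch, assuming scikit-learn is installed; the six messages and their labels are made up for illustration and stand in for the spam.csv data used in the video.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline

# Hypothetical toy corpus (1 = spam, 0 = ham), not the real spam.csv.
messages = [
    "win a free prize now",
    "free cash offer click now",
    "claim your free reward",
    "are we still meeting for lunch",
    "please send me the report",
    "see you at the meeting tomorrow",
]
labels = [1, 1, 1, 0, 0, 0]

# Way 1: vectorize by hand, then fit the classifier on the count matrix.
v = CountVectorizer()
X_counts = v.fit_transform(messages)
manual_model = MultinomialNB().fit(X_counts, labels)

# Way 2: let a Pipeline chain the same two steps.
clf = Pipeline([
    ("vectorizer", CountVectorizer()),
    ("nb", MultinomialNB()),
])
clf.fit(messages, labels)

new_messages = ["free prize offer", "lunch meeting tomorrow"]
# With way 1 the new text must be transformed explicitly before predicting.
manual_pred = manual_model.predict(v.transform(new_messages))
# With way 2 the pipeline applies the vectorizer for us.
pipeline_pred = clf.predict(new_messages)
```

Both routes should give the same predictions, since the pipeline only bundles the vectorizer and the classifier into one object.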
@AmitKumar-BIDSP 1 year ago
Great presentation; Thank you
@Kaafirpeado54-6ayesha 1 month ago
Thanks 👍
@surinder3677 8 months ago
Notes: 16:40 CountVectorizer
@ashishpanchal701 2 years ago
Hello Sir, thank you for being such a wonderful teacher!!! I just had a doubt about the Naive Bayes model built in this video: where have we used stemming or lemmatization, which would make it an NLP problem? If we have not used them, won't it be a simple Naive Bayes machine learning problem?
@codebasics 2 years ago
You don't have to do stemming etc. for it to count as an NLP problem. As you can see, we got pretty good accuracy here without stemming. Hence you can consider this both an NLP and a machine learning problem. In fact, ML is used to solve NLP problems, so NLP is at a higher level. You can definitely try stemming etc.; I would request you to build that out and see how the model performance improves.
@nareshkumarvanga3127 6 months ago
Thank you Guruji
@pragtisood3239 1 year ago
Doubt: Where can we get the CSV file?
@pineappleld 10 months ago
In this example I can see the goal was to classify each email as either ham or spam. Is it possible to build something similar that classifies into four options?
@vyaspadala8468 2 years ago
Sir, please provide us with a big data engineer and data science course.
@arvindatmuri5604 2 years ago
Have a look at the NumPy, pandas, Matplotlib, and machine learning courses. Data science is almost entirely covered by these topics.
@swanandAragade 1 year ago
Even if we give a spam mail as input, it does not detect that it's spam; it shows ham for every mail.
@ayushgupta80 9 months ago
Bag of words: the size of each vector equals the size of the vocabulary (all elements are 0 except those for the words present in the statement). Sparse representation: it may consume too much memory and computing resources.
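The point about vector size and sparsity can be illustrated in plain Python. This is a rough sketch with a made-up three-sentence corpus; the helper names dense_vector and sparse_vector are hypothetical and are not part of any library.

```python
from collections import Counter

# Hypothetical toy corpus, used only to build a small vocabulary.
docs = ["free prize now", "see you at lunch", "free free lunch"]

# Vocabulary: every distinct word in the corpus, in sorted order.
vocab = sorted({word for doc in docs for word in doc.split()})

def dense_vector(doc):
    """One slot per vocabulary word; most slots stay 0."""
    counts = Counter(doc.split())
    return [counts[word] for word in vocab]

def sparse_vector(doc):
    """Store only the non-zero slots as {column index: count}."""
    counts = Counter(doc.split())
    return {vocab.index(word): n for word, n in counts.items()}

dense = dense_vector("free free lunch")
sparse = sparse_vector("free free lunch")
```

The dense list always has one entry per vocabulary word, while the dictionary keeps only the words that actually occur. Storing just the non-zero entries is the idea behind the sparse matrix that scikit-learn's CountVectorizer returns, which is how the memory cost of a large vocabulary is kept manageable.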
@siddhanthardikar2468 1 year ago
Hello sir, I had one doubt. Are fit_transform and transform the same? I ask because you used v.fit_transform(X_train.values) to transform X_train, but v.transform(X_test) for X_test. I hope you can clear this doubt for me. Thank you.
@siddharthrox 1 year ago
I don't know if you've figured this out by now or not. I'll share my understanding anyway. fit_transform will try to learn the vocabulary from the training data. After that it will create a matrix representation based on what it learned. But in transform, it is assumed that the learning has already happened and only a matrix representation needs to be generated. That is why you see that fit_transform is used with training data and transform is used with test data.
@datahead_girl 4 months ago
@@siddharthrox Yes, I think transform, not fit_transform, must be used with the test data: v.transform(X_test). With fit_transform, the transformer learns the necessary parameters from the training data. With transform, the test data are transformed in the same way as the training set, without altering the learned parameters. Correct me if I'm wrong, but I think this is the whole point of using fit_transform with the training set and transform with the test set.
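The difference described in this thread can be checked directly. This is a minimal sketch, assuming scikit-learn is installed; the X_train and X_test lists here are made-up sentences, not the data from the video.

```python
from sklearn.feature_extraction.text import CountVectorizer

# Hypothetical toy data standing in for the real train/test split.
X_train = ["free prize now", "see you at lunch"]
X_test = ["free lunch today"]

v = CountVectorizer()

# fit_transform: learn the vocabulary from the training text, then encode it.
X_train_counts = v.fit_transform(X_train)
vocab_after_fit = dict(v.vocabulary_)

# transform: encode new text with the vocabulary learned above, unchanged.
X_test_counts = v.transform(X_test)
```

After the transform call the vocabulary is the same as before, and the test matrix has one column per training word. The word "today" never appeared in the training text, so it is simply ignored.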
@JakeThalacker 2 years ago
Is there a simple way to edit this to use bigrams instead of single words?
@roopeshn9394 2 years ago
Hi sir, if you can assist me with my issue in any way, please do. I have four or five columns of data in dict format with keys and values. I need to make a sentence or narrative from this data. Is that possible? If so, please guide me, sir. Input: { "source": "Sanju", "type": "message.cloud.display.AUTO_SCALING", "value": "1" }, Output: Sanju has value with this type of "message.cloud.display.AUTO_SCALING"
@hsekar6701 1 year ago
I'm unable to download the en_core_web_sm pipeline. Could anyone please help me?
@vivekjha9952 2 years ago
Hi Dhaval sir, I want to learn data science technology from you and be mentored too. Could you please provide guidance for an IT professional with 8 years of experience who wants to transition to data science? I am not able to figure out which institute to select.
@nemsingh6035 2 years ago
Follow codebasics
@bhaskarbsarkar5232 2 years ago
Doubt : When vectorizing, we are taking X_train. According to my understanding, the vectorization is building a vocabulary w.r.t the data given. So, is it better to take the whole X instead of X_train to build the vocab and after that we can split into train and test. Because there is a possibility that some words would be in test data and not in train data. And when I took X for vectorization, the vocab size increased. So, what is the correct method here?
@codebasics 2 years ago
Excellent question, Bhaskar. In our case we had more than 4k samples in the training set, which probably covered most of the vocabulary in the test samples as well. The right way would be to create a CountVectorizer and call .fit (instead of fit_transform) on the entire dataset. After that, just call .transform on the training set and later on the test set.
@malshininissanka4106 1 year ago
@@codebasics Shouldn't we treat the test data as unseen data? If we fit CountVectorizer on the entire dataset, couldn't data leakage happen?
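The two options discussed in this thread can be compared on a toy corpus. This is only a sketch, assuming scikit-learn is installed; the documents and the variable names train_only and all_data are made up for illustration.

```python
from sklearn.feature_extraction.text import CountVectorizer

# Hypothetical toy split; the real data would come from train_test_split.
train_docs = ["free prize now", "see you at lunch"]
test_docs = ["free lunch today", "prize draw tonight"]

# Option A: vocabulary learned from the training text only.
train_only = CountVectorizer().fit(train_docs)

# Option B: vocabulary learned from the entire dataset, test text included.
all_data = CountVectorizer().fit(train_docs + test_docs)

# Words that enter the vocabulary only because the test text was seen.
extra_words = set(all_data.vocabulary_) - set(train_only.vocabulary_)

# With option A, test-only words are dropped when the test text is encoded.
test_counts_a = train_only.transform(test_docs)
```

Option B gives a larger vocabulary, but the extra columns have zero counts in every training row, so the classifier has no training evidence for those words. Option A keeps the test set fully unseen, which is the concern raised in the last reply and the more common convention.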
@gopalpawar7352 2 years ago
Sir, please create a full stack developer course, with videos and playlists.
@changeorbeextinct 2 years ago
if any email has Nigeria and prince then it is authentic.. NOT :) BTW, great videos.
@debarghabhattacharjee4000 2 years ago
Please provide the spam.csv file....
@bhaskarbsarkar5232 2 years ago
It's in the git repo itself.
@kinghezzy 1 year ago
I can't find it there.
@uptoolate1896 2 years ago
And that was the moment that ignoring his suggested prerequisites finally caught up with me.
@rinkisingh5529 1 year ago
Why does your wife use your account to watch CID? And you have mentioned this in at least 2 of your videos, are you trying to cover something up.. Someone call the CID to investigate :)
@ramandeepbains862 2 years ago
Solution for the bug AttributeError: 'CountVectorizer' object has no attribute 'get_feature_names_out': use get_feature_names() instead of get_feature_names_out(). It's a version issue. Sample code: v.get_feature_names()[790:800]
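The version issue above can be handled in code. get_feature_names_out() was added in scikit-learn 1.0, and the older get_feature_names() was removed in 1.2, so neither name works on every version. The following is a small sketch that picks whichever method the installed version provides; the two fitted sentences are made up for illustration.

```python
from sklearn.feature_extraction.text import CountVectorizer

v = CountVectorizer()
# Hypothetical toy text, only so that the vectorizer has a vocabulary.
v.fit(["free prize now", "see you at lunch"])

# Prefer the newer method and fall back to the older one if it is missing.
if hasattr(v, "get_feature_names_out"):
    feature_names = list(v.get_feature_names_out())
else:
    feature_names = list(v.get_feature_names())
```

Either branch yields the vocabulary words in column order, so slicing such as feature_names[790:800] works the same way on a real vocabulary.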