Projects In Machine Learning | NLP for Text Classification with NLTK & Scikit-learn | Eduonix

42,320 views

Comments: 59

@Eduonix 4 years ago

Black Friday Sale Has Arrived! Learn a new skill with us at incredible prices. Get your favourite E-Degrees & Bundles at flat $29. Hurry! Shop now. - bit.ly/3c2CvAQ

@datasciencetoday7127 3 years ago

I've been studying data science for 2 years in my master's degree, but I think this video is where my journey actually started. Thank you, Eduonix.

@datasciencetoday7127 3 years ago

At 29:45, what other project are you talking about, sir?

@vedashreeg530 5 years ago

Thank you so much for this video! If you get an error saying the zip object has no len(), use list(zip(...)) instead of zip(...) to fix it.
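
For context: in Python 3, zip() returns a lazy iterator, which has no len() and cannot be shuffled in place, so wrapping it in list() fixes both problems. A minimal sketch, assuming the messages and labels sit in two parallel sequences (the names text_messages and labels and the sample data are illustrative, not necessarily the tutorial's):

```python
import random

# Illustrative data; in the tutorial these would come from the SMS dataset.
text_messages = ["Ok lar... joking wif u oni", "WINNER!! Claim your prize now"]
labels = [0, 1]

# zip() returns an iterator in Python 3, so len() on it raises TypeError.
pairs_iter = zip(text_messages, labels)
try:
    len(pairs_iter)
except TypeError as err:
    print("zip error:", err)

# Materialize the pairs as a list so they can be measured and shuffled.
messages = list(zip(text_messages, labels))
random.seed(1)
random.shuffle(messages)
print(len(messages))
```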

@eduonixsupport1889 5 years ago

Thank you for sharing your valuable insights. We are glad you liked it. Subscribe for more videos like this!

@arijitsantra 5 years ago

Yes, I experienced the same.

@feressakouhi6224 4 years ago

Use the try/except/pass method on the same line; it works.

@tessdejaeghere6972 4 years ago

Thank you!

@abhishekbourai1832 3 years ago

Thank you!

@Eduonix 4 years ago

Become a master of trending skills with our brand-new, pocket-friendly bundles & E-Degrees - bit.ly/3c2pKG5

@PRAJAKTAPATKAR-t7z 9 months ago

Great video - many thanks for sharing :)

@Eduonix 9 months ago

Hi PRAJAKTA PATKAR, thanks for letting us know! We're so glad you find the videos helpful. That's exactly what we aim for! If you have any specific video requests, feel free to share them.

@udayjagtap6643 4 years ago

The model's accuracy is good, but after training, I took a number of spam examples from the dataset and tried to predict them using model.predict(), and the model predicts that the messages are not spam. What kind of issue is this?

@SaadAzizmian 4 years ago

How do you display the first and last 30 rows of the dataset? I only get the first 5 and last 5.
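
A likely explanation, assuming a recent pandas: since pandas 1.0, a DataFrame longer than display.max_rows (60 by default) is truncated to display.min_rows rows (10 by default, i.e. 5 from each end), whereas older versions showed 30 from each end. Either raise that option or ask for the slices explicitly. A sketch with a stand-in DataFrame in place of the SMS data:

```python
import pandas as pd

# Stand-in for the SMS dataset: 100 rows of dummy data.
df = pd.DataFrame({"label": ["ham"] * 100, "text": [f"message {i}" for i in range(100)]})

# Option 1: show 30 rows from each end when a long frame is truncated.
pd.set_option("display.min_rows", 60)
print(df)

# Option 2: request the rows explicitly.
first_30 = df.head(30)
last_30 = df.tail(30)
print(first_30)
print(last_30)
```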

@srinivas1404 4 years ago

Which vectorization technique is used in this project?
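
Judging from the code quoted elsewhere in this thread (find_features(text) paired with each label), every message is represented as a binary bag of words: a dictionary recording, for each word in a fixed vocabulary list, whether it occurs in the message. A minimal sketch, assuming whitespace tokenization and a tiny hypothetical word_features list (the tutorial builds its list from the corpus):

```python
# Hypothetical vocabulary; the tutorial derives its word list from the training corpus.
word_features = ["free", "prize", "call", "lunch"]

def find_features(message):
    """Map each vocabulary word to True/False depending on whether it occurs in the message."""
    words = set(message.lower().split())  # simple whitespace tokenization (assumption)
    return {word: (word in words) for word in word_features}

features = find_features("Call now to claim your FREE prize")
print(features)  # {'free': True, 'prize': True, 'call': True, 'lunch': False}
```

Only word presence is recorded, not counts or order, which is why this is usually called a binary (presence/absence) bag-of-words representation.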

@tessdejaeghere6972 4 years ago

Great tutorial, nice pace! Thank you!

@abhijitmalode3189 4 years ago

Hello, good video. Can I get the whole dataset, i.e. the CSV file for the project?

@NolanMurphyWhitehead 4 years ago

I got the whole thing to work, for the most part, but I'm unclear on one thing. How do we then test what we've done on new texts? For example, how can I type 'how u doin' or 'Congratulation! You've been selected to blah blah blah' to see whether the model we created predicts these to be ham or spam? Without the ability to do that, we have essentially created nothing.

@88Timur88Bahmudov88 4 years ago

The same way we tested our testing set. Or you can just add your messages to the testing data before testing the model.
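
To classify brand-new text, it has to go through exactly the same preprocessing and find_features step as the training data before it reaches the trained classifier. A minimal sketch using NLTK's NaiveBayesClassifier on a tiny made-up training set (the tutorial wraps scikit-learn models in NLTK's SklearnClassifier, which also exposes a classify method); the preprocessing here is only lowercasing and whitespace splitting, an assumption standing in for the tutorial's cleaning steps:

```python
from nltk.classify import NaiveBayesClassifier

# Tiny, made-up training set for illustration only.
train_texts = [
    ("win a free prize now", "spam"),
    ("claim your free cash prize", "spam"),
    ("are we meeting for lunch", "ham"),
    ("see you at lunch tomorrow", "ham"),
]

def preprocess(text):
    # Must match whatever cleaning was applied to the training messages.
    return text.lower().split()

# Vocabulary: every word seen in training.
word_features = sorted({w for text, _ in train_texts for w in preprocess(text)})

def find_features(text):
    words = set(preprocess(text))
    return {w: (w in words) for w in word_features}

training = [(find_features(text), label) for text, label in train_texts]
classifier = NaiveBayesClassifier.train(training)

# New, unseen messages go through exactly the same feature extraction.
for message in ["Free prize inside", "lunch tomorrow"]:
    print(message, "->", classifier.classify(find_features(message)))
```

If new messages skip a cleaning step used in training (for example the regex replacements), their features will not line up with the vocabulary and predictions can be badly off.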

@tallurinagapoornima2356 5 years ago

Suppose I have a directory of files and images, with all my data on my own PC. By passing a text message such as 'show me salary report', it should open that report directly. We have to train the model so that it understands the text message I pass. Please tell me how I can do that using NLP.

@ashikmahmud1404 3 years ago

Thank you very much. Great tutorial!

@ravikiranreddybodireddy6140 5 years ago

Hello Admin, the code featuresets = [(find_features(text), label) for (text, label) in messages] doesn't filter out the features that are assigned False. I believe only features that are True should be considered, since only those words are in the message. Also, what should the input vector look like? Is it go → 1, got → 0, numbr → 1, or ((go, True), (got, True), (numbr, True), 1), where 0 represents spam and 1 represents ham? Can you help me with which input is right?

@souldynamo1722 5 years ago

Can we run this code in Spyder? I tried, but I get the error "SyntaxError: (unicode error) 'unicodeescape' codec can't decode bytes in position 2-3: truncated \UXXXXXXXX escape". Even in Jupyter, I get an error at 30:51: "NameError: name 'processed' is not defined", raised on line 7, processed = processed.apply(lambda x: ' '.join(term for term in x.split() if term not in stop_words)), right after line 5, stop_words = set(stopwords.words('english')).
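
Both errors have common causes. Assuming the dataset path is a Windows path such as C:\Users\..., the first error comes from the string literal: in a normal Python string, \U starts an 8-digit Unicode escape, so the line fails to compile; a raw string or forward slashes avoid this. The NameError means the cell that creates processed was not run (or used another name) before the stop-word cell. A sketch with a hypothetical path, sample messages, and a small stand-in stop-word set:

```python
import pandas as pd

# The path is hypothetical. Inside a normal string literal, a backslash followed
# by "U" starts an 8-digit Unicode escape, so this source line fails to compile.
bad_source = 'path = "C:\\Users\\me\\SMSSpamCollection"'
good_source = 'path = r"C:\\Users\\me\\SMSSpamCollection"'  # raw string

try:
    compile(bad_source, "<example>", "exec")
    bad_compiles = True
except SyntaxError as err:
    bad_compiles = False
    print("SyntaxError:", err.msg)

compile(good_source, "<example>", "exec")  # compiles fine

# Forward slashes also work on Windows and need no escaping.
path = "C:/Users/me/SMSSpamCollection"

# NameError fix: create `processed` before the stop-word step uses it.
text_messages = pd.Series(["Go until jurong point", "Free entry in a wkly comp"])
processed = text_messages.str.lower()
stop_words = {"in", "a", "until"}  # stand-in for set(stopwords.words('english'))
processed = processed.apply(
    lambda x: " ".join(term for term in x.split() if term not in stop_words)
)
print(processed.tolist())  # ['go jurong point', 'free entry wkly comp']
```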

@leonhardeuler9839 6 years ago

I got accuracies higher than 95% for every model except KNN, which got 94%. Do I have to use the ensemble method?

@eduonixsupport1889 6 years ago

Yes, you have to use the ensemble method.
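
For reference, a hard-voting ensemble in scikit-learn combines several classifiers by majority vote. A minimal sketch on made-up binary word-presence features (the tutorial instead wraps a VotingClassifier in NLTK's SklearnClassifier and feeds it the message features). Setting n_jobs=1 fits the models one after another in a single process; parallel workers (n_jobs=-1) each need their own memory, which matters on large feature sets:

```python
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import MultinomialNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Made-up binary word-presence features (columns: "free", "prize", "lunch").
X = [[1, 1, 0], [1, 0, 0], [0, 1, 0], [0, 0, 1], [0, 0, 1], [1, 0, 1]]
y = ["spam", "spam", "spam", "ham", "ham", "ham"]

models = [
    ("knn", KNeighborsClassifier(n_neighbors=3)),
    ("tree", DecisionTreeClassifier(random_state=1)),
    ("logreg", LogisticRegression()),
    ("nb", MultinomialNB()),
]

# voting="hard": each model casts one vote and the majority label wins.
# n_jobs=1 fits the models sequentially in a single process.
ensemble = VotingClassifier(estimators=models, voting="hard", n_jobs=1)
ensemble.fit(X, y)
print(ensemble.predict([[1, 1, 0], [0, 0, 1]]))
```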

@leonhardeuler9839 6 years ago

This tutorial was really fun and informative. Thanks so much :)

@Eduonix 6 years ago

You're welcome! Subscribe for more!

@saakethch7416 6 years ago

Love the video, watched the whole thing. Respect, bruh.

@Eduonix 6 years ago

We're glad you liked the tutorial Saaketh! Subscribe for more ML lessons!

@ΝίκοςΔημητρακόπουλος-σ6δ 5 years ago

Clean tutorial.

@VijayBhaskarSingh 5 years ago

Terrific explanation, mate! Thanks for the video.

@000jhs 5 years ago

Bigger text, please!

@anishjain8096 5 years ago

Sir, please zoom in; I can't see.

@amadousadiodiallo9183 4 years ago

Thank you very much, sir.

@nidhilohani4937 5 years ago

So helpful, thanks for this video.

@prasadjoshi8213 4 years ago

Hi sir, I really loved the video! Really fruitful. I am working on the fetch_20newsgroups dataset using your code, but when it comes to splitting the featuresets into training and testing sets with training, testing = train_test_split(featuresets, test_size = 0.25, random_state=seed), I get this error: "ValueError: With n_samples=1, test_size=0.25 and train_size=None, the resulting train set will be empty. Adjust any of the aforementioned parameters." Please help me.
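
That message means train_test_split received a sequence of length 1, so featuresets most likely holds a single item rather than one (features, label) pair per document. With fetch_20newsgroups, the documents and labels are the .data and .target attributes of the returned object, and those are what should be zipped together. A sketch with stand-in lists instead of the real download; find_features here is a hypothetical simplified helper, and the "wrong" case is only one plausible way to end up with a single sample:

```python
from sklearn.model_selection import train_test_split

# Stand-ins for newsgroups.data and newsgroups.target from fetch_20newsgroups().
data = ["the game went to overtime", "new gpu drivers released",
        "the team won the match", "install the latest kernel"] * 5
target = [0, 1, 0, 1] * 5

def find_features(text):
    # Hypothetical, simplified feature extractor.
    return {word: True for word in text.split()}

# Wrong: everything collapsed into one element, so n_samples=1 and the split fails.
bad_featuresets = [(find_features(" ".join(data)), target)]
try:
    train_test_split(bad_featuresets, test_size=0.25, random_state=1)
except ValueError as err:
    print("ValueError:", err)

# Right: one (features, label) pair per document.
featuresets = [(find_features(text), label) for text, label in zip(data, target)]
training, testing = train_test_split(featuresets, test_size=0.25, random_state=1)
print(len(training), len(testing))  # 15 5
```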

@ariii4118 3 years ago

Tokenization and bag-of-words vectorization are not the same thing, though.
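
The distinction in a nutshell: tokenization splits text into an ordered sequence of tokens, while a bag-of-words vector counts (or flags) those tokens against a fixed vocabulary and discards their order. A standard-library sketch, using naive whitespace splitting in place of NLTK's word_tokenize and a small hypothetical vocabulary:

```python
from collections import Counter

text = "free entry free prize"

# Tokenization: an ordered list of tokens.
tokens = text.split()

# Bag of words: one count per vocabulary word; order is lost.
vocabulary = ["entry", "free", "lunch", "prize"]
counts = Counter(tokens)
bow_vector = [counts[word] for word in vocabulary]

print(tokens)      # ['free', 'entry', 'free', 'prize']
print(bow_vector)  # [1, 2, 0, 1]
```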

@gagangupta7840 6 years ago

This is a nice and very informative tutorial.

@Eduonix 6 years ago

Thank you! Subscribe for more.

@prempotabatti7615 6 years ago

Make more videos like this where you actually code a project. Most people make videos where they only explain concepts. TIA.

@Eduonix 5 years ago

Thank you for the appreciation. Do subscribe for more videos like this!

@rajuraman5773 5 years ago

I'm getting this error in the "# Ensemble methods - Voting classifier" cell: "A worker process managed by the executor was unexpectedly terminated. This could be caused by a segmentation fault while calling the function or by an excessive memory usage causing the Operating System to kill the worker." What do I have to do to get rid of this error?

@999wwright 5 years ago

Good video! Clear, concise, informative. Only one thing: 'ensemble' is pronounced 'onsomble'.

@Eduonix 5 years ago

Thank you! Subscribe for more.

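@全希 5 years ago

Teacher, this course is very good. I'd like to know whether the instructor of this video has any other similar videos. I'm willing to pay to watch them.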

@__manish_kumar__ 4 years ago

I used the 1,000 most common words and accuracy is still over 98%. I tried to find what may be causing the problem in your case, and the most likely reason I found is this: when using regular expressions to replace emails, numbers, etc., you need to replace them with " emailaddr " (with spaces at the start and end, so they are treated as separate words) rather than "emailaddr". There are many cases like "$10", which matches both the money and the number patterns, so in the end you get a word like "moneynumber" that is neither money nor number. This significantly distorts the most common words, especially for spam messages.
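
The effect described above is easy to reproduce. A sketch, assuming simplified stand-ins for the tutorial's money-symbol and number regexes and its "moneysymb"/"numbr" placeholder words (the exact patterns in the video may differ):

```python
import re

MONEY = r"[£$]"          # assumed money-symbol pattern
NUMBER = r"\d+(\.\d+)?"  # assumed number pattern

def normalize(text, pad):
    """Replace money symbols and numbers with placeholder words, then split."""
    money = " moneysymb " if pad else "moneysymb"
    number = " numbr " if pad else "numbr"
    text = re.sub(MONEY, money, text)
    text = re.sub(NUMBER, number, text)
    return text.split()

print(normalize("Win $10 now", pad=False))  # ['Win', 'moneysymbnumbr', 'now']
print(normalize("Win $10 now", pad=True))   # ['Win', 'moneysymb', 'numbr', 'now']
```

Without padding, "$10" becomes a single merged token that matches neither placeholder; with padding, both placeholders survive as separate words and can be counted among the most common features.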

@CharanTeja-cm8ug 6 years ago

Video quality is very poor.

@Eduonix 6 years ago

Hi Charan, try watching it in 1080p; that should help.

@smruthirp2155 3 years ago

Hey, I made a small change when selecting the features. You had just taken 1,500 features in arbitrary order; I took the 1,500 most common ones instead. Your comment said to select the most common features, but your code didn't do that. Anyway, by doing so I got 99% accuracy with naive_bayes. Hope this helps someone :))

@eshakolte4043 3 years ago

How did you select the most common features?

@suniltoskar 1 year ago

That helped!
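
For anyone with the same question: nltk.FreqDist is a subclass of collections.Counter, so the most frequent words come from most_common(n), whereas slicing list(all_words.keys()) just takes words in the order they were first seen. A sketch with Counter on a toy word list, keeping the top 3 instead of 1,500:

```python
from collections import Counter

words = ["txt", "now", "call", "free", "free", "call", "free", "now", "call", "free"]
all_words = Counter(words)  # nltk.FreqDist(words) supports the same calls here

# Keys come back in first-seen order, not ranked by frequency.
first_seen = list(all_words.keys())[:3]

# The 3 most frequent words (the tutorial's setting would be 1500).
most_common = [word for word, count in all_words.most_common(3)]

print(first_seen)   # ['txt', 'now', 'call']
print(most_common)  # ['free', 'call', 'now']
```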