Weka Text Classification for First Time & Beginner Users

  171,069 views

Brandon Weinberg


Comments
@omfglykwtfff 9 years ago
If I could give this video 10000 thumbs up I totally would. Brilliant work and great explanation of how all the different features actually work. You may have just saved me from failing a class. Thankyouthankyouthankyou.
@BrandonWeinberg 11 years ago
Put this vid on KZbin today for ppl using machine learning and text classification for the first time in WEKA. My other vid was a general intro (23 min) and this one is for text classification (59 min).

0:00 Introduction (5 minutes)
5:06 TextDirectoryLoader (3 minutes)
8:12 StringToWordVector (19 minutes)
27:37 AttributeSelect (10 minutes)
37:37 Cost Sensitivity and Class Imbalance (8 minutes)
45:45 Classifiers (14 minutes)
59:07 Conclusion (20 seconds)

Want to skip to a specific part?

- Section 1 -
5:49 TextDirectoryLoader Command (1 minute)

- Section 2 -
6:44 ARFF File Syntax (1 minute 30 seconds)
8:10 Vectorizing Documents (2 minutes)
10:15 WordsToKeep setting/Word Presence (1 minute 10 seconds)
11:26 OutputWordCount setting/Word Frequency (25 seconds)
11:51 DoNotOperateOnAPerClassBasis setting (40 seconds)
12:34 IDFTransform and TFTransform settings/TF-IDF score (1 minute 30 seconds)
14:09 NormalizeDocLength setting (1 minute 17 seconds)
15:46 Stemmer setting/Lemmatization (1 minute 10 seconds)
16:56 Stopwords setting/Custom Stopwords File (1 minute 54 seconds)
18:50 Tokenizer setting/NGram Tokenizer/Bigrams/Trigrams/Alphabetical Tokenizer (2 minutes 35 seconds)
21:25 MinTermFreq setting (20 seconds)
21:45 PeriodicPruning setting (40 seconds)
22:25 AttributeNamePrefix setting (16 seconds)
22:42 LowerCaseTokens setting (1 minute 2 seconds)
23:45 AttributeIndices setting (2 minutes 4 seconds)

- Section 3 -
28:07 AttributeSelect for reducing dataset to improve classifier performance/InfoGainEval evaluator/Ranker search (7 minutes)

- Section 4 -
38:32 CostSensitiveClassifier/Adding cost effectiveness to base classifier (2 minutes 20 seconds)
42:17 Resample filter/Example of undersampling majority class (1 minute 10 seconds)
43:27 SMOTE filter/Example of oversampling the minority class (1 minute)

- Section 5 -
45:34 Training vs. Testing Datasets (1 minute 32 seconds)
47:07 Naive Bayes Classifier (1 minute 57 seconds)
49:04 Multinomial Naive Bayes Classifier (10 seconds)
49:33 K Nearest Neighbor Classifier (1 minute 34 seconds)
51:17 J48 (Decision Tree) Classifier (2 minutes 32 seconds)
53:50 Random Forest Classifier (1 minute 39 seconds)
55:55 SMO (Support Vector Machine) Classifier (1 minute 38 seconds)
57:35 Supervised vs Semi-Supervised vs Unsupervised Learning/Clustering (1 minute 20 seconds)

Since all text data is turned into numbers and categories after sections 1-2, most of sections 3-5 are useful both in text classification and in other data analysis in WEKA.

The Classifiers section introduces you to six (but not all) of WEKA's popular classifiers for text mining: 1) Naive Bayes, 2) Multinomial Naive Bayes, 3) K Nearest Neighbor, 4) J48, 5) Random Forest and 6) SMO.

Each StringToWordVector setting is shown, e.g. tokenizer, outputWordCounts, normalizeDocLength, TF-IDF, stopwords, stemmer, etc. These are ways of representing documents as document vectors. Automatically converting 2,000 text files (plain text documents) into an ARFF file with TextDirectoryLoader is shown. Also shown is AttributeSelect, which is a way of improving classifier performance by reducing the dataset. Cost-Sensitive Classifier is shown, which is a way of assigning weights to different types of guesses. Resample and SMOTE are shown as ways of undersampling the majority class and oversampling the minority class.

Introductory tips are shared throughout, e.g. distinguishing supervised learning (which is most of data mining) from semi-supervised and unsupervised learning, making identically formatted training and testing datasets, how to easily subset outliers with the Visualize tab, and more...
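For readers who want to see what a TF-IDF document vector looks like outside the WEKA GUI, here is a minimal sketch in plain Python. It is not WEKA's code: the whitespace tokenizer, the function names and the three sample documents are made up for illustration, and the weights assume the common log-scaled forms tf = log(1 + count) and idf = log(N / number of documents containing the word). WEKA's StringToWordVector may differ in detail, for example in normalization and in how wordsToKeep prunes the vocabulary.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (a stand-in for a real tokenizer)."""
    return text.lower().split()


def tf_idf_vectors(docs):
    """Return (vocabulary, one list of weights per document).

    Assumed weighting: log(1 + count) * log(n_docs / docs_containing_word).
    """
    counts = [Counter(tokenize(doc)) for doc in docs]
    vocab = sorted({word for c in counts for word in c})
    n_docs = len(docs)
    # Document frequency: in how many documents does each word occur?
    doc_freq = {word: sum(1 for c in counts if word in c) for word in vocab}
    vectors = []
    for c in counts:
        # Counter returns 0 for absent words, so their weight is log(1) * idf = 0.
        vectors.append(
            [math.log(1 + c[word]) * math.log(n_docs / doc_freq[word]) for word in vocab]
        )
    return vocab, vectors


# Hypothetical mini-corpus, one string per document.
docs = ["great movie great cast", "terrible movie", "great fun"]
vocab, vectors = tf_idf_vectors(docs)

for doc, vector in zip(docs, vectors):
    print(doc, [round(weight, 3) for weight in vector])
```

Each document becomes one row of numbers with one column per word, which is the "document vector" idea the description refers to. Words that occur in every document would get an idf of log(1) = 0 and so carry no weight.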
@ranaalqaisi9331 10 years ago
Would you help me? I am running the same experiment and followed the steps in the video, but most of the classifiers are inactive (greyed out). Thanks in advance.
@cornchips007 10 years ago
Rana Alqaisi, your dataset is not ready for use. Prepare it properly.
@ranaalqaisi9331 10 years ago
Thank you, that solved it :)
@meghathakor9268 8 years ago
+Rana Alqaisi I am running the same experiment but am unable to convert the text directory to .arff format. What should I do? Please reply as soon as possible.
@ranaalqaisi9331 8 years ago
You need a CSV or ARFF file to load data into WEKA. If it's a text file, you have to look at the structure of an ARFF file and convert it! Let me know if you need any help!
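To make "look at the structure of an ARFF file and convert it" concrete, here is a small sketch that writes text/label pairs out in ARFF layout: a relation name, one `@attribute` line per column, then `@data` with one instance per line. The function names and sample reviews are hypothetical, the attribute names `ReviewText` and `sentiment` are borrowed from a comment further down, and the backslash escaping of quotes is an assumption about what WEKA accepts. Check the result by opening it in WEKA.

```python
def escape_arff_string(text):
    """Wrap a value in single quotes, escaping backslashes and single quotes."""
    return "'" + text.replace("\\", "\\\\").replace("'", "\\'") + "'"


def to_arff(relation, rows, classes):
    """Build ARFF text from rows of (text, label); classes lists the allowed labels."""
    lines = [
        "@relation " + relation,
        "",
        "@attribute ReviewText string",
        "@attribute sentiment {" + ",".join(classes) + "}",
        "",
        "@data",
    ]
    for text, label in rows:
        if label not in classes:
            raise ValueError("unknown class label: " + label)
        # Collapse newlines and repeated spaces so each instance stays on one line.
        flat = " ".join(text.split())
        lines.append(escape_arff_string(flat) + "," + label)
    return "\n".join(lines) + "\n"


# Hypothetical sample data.
arff_text = to_arff(
    "reviews",
    [("I loved it", "pos"), ("Don't bother", "neg")],
    ["pos", "neg"],
)
print(arff_text)
```

Writing `arff_text` to a file ending in `.arff` gives something the Explorer's "Open file..." button should be able to read, with the nominal `sentiment` attribute in the last column.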
@khyativyas8956 7 years ago
This is one of the besttt videos I have seen! Thank you so much!! It's crazy how powerful WEKA is!
@ali_z1980 8 years ago
I really appreciate you uploading this video. You clarified several questions I had that my professor struggled to explain to his class.
@xavieryang1255 2 years ago
So much better and more useful than Weka's official tutorial
@payalbaruah 11 years ago
Thank you soo much! This is the MOST informative and helpful video on WEKA!! Keep up the good work Brandon! We are learning so much from your videos.
@jnscollier 9 years ago
@27:37 If anyone is stuck by the SMO being grayed out, I *think* the solution is to go to the Preprocess tab, click Edit... and remember to first right-click the @@class@@ Nominal header and select "Attribute as class". I'm learning just like you so I could be wrong though!
@nebbynat638 9 years ago
This is a very, very good quick intro, especially to filters. Weka has numerous data cleaning tools, filters and parameters, and this was a good tutorial on the filters we use. You have an error in the interpretation of False Positive and False Negative at 38:24. Look at page 164, Chapter 5 of the Third Edition of Witten, Frank and Hall. You seem to have transposed the two. I am of the camp that the true value, "actual class" or ground truth, is at the top of the confusion table, but the literature is replete with such inconsistencies. So beware of confusing the confusion matrix.
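One way to avoid mixing up false positives and false negatives is to count them straight from the actual and predicted labels, so the definition does not depend on which way round a table is printed. The sketch below uses made-up labels and a hypothetical helper name. A false positive is an instance predicted as the positive class that is actually negative, and a false negative is the reverse.

```python
def confusion_counts(actual, predicted, positive):
    """Count TP, FP, FN and TN for one class treated as the positive class."""
    tp = fp = fn = tn = 0
    for a, p in zip(actual, predicted):
        if p == positive and a == positive:
            tp += 1  # predicted positive, really positive
        elif p == positive:
            fp += 1  # predicted positive, really negative
        elif a == positive:
            fn += 1  # predicted negative, really positive
        else:
            tn += 1  # predicted negative, really negative
    return {"TP": tp, "FP": fp, "FN": fn, "TN": tn}


# Hypothetical ground truth and classifier output for six reviews.
actual = ["pos", "pos", "pos", "neg", "neg", "neg"]
predicted = ["pos", "pos", "neg", "pos", "neg", "neg"]

counts = confusion_counts(actual, predicted, positive="pos")
print(counts)
```

In WEKA's text output the confusion matrix is headed "classified as", so the columns are predictions and the rows are actual classes. That is worth double-checking against your own output before reading off FP and FN.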
@nanvdand 7 years ago
Really excellent. This is exactly what I was looking for as I start out on my first text classification project. So helpful and very much worth the hour.
@etaifour2 7 years ago
This is a very good tutorial - great job at explaining things properly and slowly. Thank you very much for the great work.
@supandi100 11 years ago
I really appreciate the uploaded Weka Text Classification
@lpac8272 4 years ago
Awesome tutorial. Thanks Brandon!
@adyutshirke2060 5 years ago
Thank you Sir, this was the most helpful video I've seen so far..
@jnscollier 9 years ago
@10:08 When I set wordsToKeep from 1000 to 100 and click Apply, it doesn't update the model from 1000 to 100. Do I have to Undo every time? Or if I run 1000 initially, why can I not then run 100 and have it update?
@devanshibhatt05 9 years ago
Thanks for sharing this Brandon! Very useful video, very well explained.
@gauravjain8269 5 years ago
Hi Bran, for some reason I get an error when I execute the command for converting the text files in the pos and neg folders (contained within a folder) into an .arff file. Please tell me where I could be going wrong.
@MartinClausen 11 years ago
Thanks for a great video. What is the significance of the text files containing one sentence for each line? Is this necessary and/or does this improve performance?
@mainulquraishi6922 9 years ago
Thank you very much for the tutorial. Is it possible to use lemmatization in WEKA?
@sandeepdommaraju 9 years ago
Thank you so much.. it is very helpful.. the best resource I could find for text classification :)
@mtopaz80 11 years ago
Excellent tutorial, really helpful - thanks Brandon!!!
@adewolekayode6148 9 years ago
Thank you so much for this informative tutorial. I received this error when I was attempting to load my text files using TextDirectoryLoader: "Problem settings base instance....". What is the cause of this error?
@kleinkauff2 8 years ago
Can someone help me? When I try to apply AttSelection, WEKA gives me "Cannot handle numeric class". OK, when I select some attribute I can see the "NUM", but I already applied StringToWordVector..
@khadidjabakhti7036 8 years ago
Hello, I have the same problem. Did you solve it??
@drJohnPDX 7 years ago
This is solved by clicking on 'Edit...' at the top, right-clicking on the first column and selecting 'Attribute as class'. This moves the class to the last column instead of the first, which was causing the error. I find this video very hard to follow for a new user; he doesn't bother showing how to select the filters, etc. One has to hunt through them all to find what he's using. And then he doesn't bother to answer the questions caused by his confusing video. For more experienced users this might be of value.
@bezsez8193 6 years ago
Thanks for your hints. If you know a better tutorial, I would appreciate it if you shared the links.
@nenaBou 8 years ago
Hello, would you help me? I need to visualize the model but I get this error: can't print smo classifier.
@sudip95 8 years ago
+Brandon Weinberg - the video is fantastic. It gives a brief idea about sentiment analysis using WEKA. But I tried to do it myself, and I am getting an error during the command execution. The command I tried is:

    java weka.core.converters.TextDirectoryLoader -dir C:\Users\Sudip P\Desktop\Weka files\Training > C:\Program Files\Weka-3-7\data\IMDB.arff

The errors I am getting are:

    weka.core.converters.TextDirectoryLoader.setSource(TextDirectoryLoader.java:398)
    weka.core.converters.TextDirectoryLoader.setDirectory(TextDirectoryLoader.java:367)
    weka.core.converters.TextDirectoryLoader.setOptions(TextDirectoryLoader.java:219)
    weka.core.converters.TextDirectoryLoader.main(TextDirectoryLoader.java:658)
    sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
    sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
    java.lang.reflect.Method.invoke(Unknown Source)
    weka.gui.SimpleCLIPanel$ClassRunner.run(SimpleCLIPanel.java:199)
    at weka.core.converters.TextDirectoryLoader.setSource(TextDirectoryLoader.java:398)
    at weka.core.converters.TextDirectoryLoader.setDirectory(TextDirectoryLoader.java:367)
    at weka.core.converters.TextDirectoryLoader.setOptions(TextDirectoryLoader.java:219)
    at weka.core.converters.TextDirectoryLoader.main(TextDirectoryLoader.java:658)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
    at java.lang.reflect.Method.invoke(Unknown Source)
    at weka.gui.SimpleCLIPanel$ClassRunner.run(SimpleCLIPanel.java:199)

I have tried with your dataset, but the problem still exists. I am using the Windows platform. Please help. Thanks in advance.
@kleinkauff2 8 years ago
Hope it's not too late to help you.. Try naming your folders without spaces between words..
@itsvanya 8 years ago
Thank you so much! I was getting the same problem, but this solved it! :)
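Why would a space in a folder name matter? Command lines are usually split into arguments on whitespace, so an unquoted path with a space arrives as two arguments. The sketch below shows the general effect using Python's `shlex`, which follows POSIX shell rules. This is only an illustration: WEKA's SimpleCLI and the Windows command prompt have their own splitting rules, and the `C:/data/...` path is hypothetical.

```python
import shlex

# Hypothetical path containing a space, first unquoted, then quoted.
unquoted = "java weka.core.converters.TextDirectoryLoader -dir C:/data/Weka files/Training"
quoted = 'java weka.core.converters.TextDirectoryLoader -dir "C:/data/Weka files/Training"'

unquoted_args = shlex.split(unquoted)
quoted_args = shlex.split(quoted)

# Without quotes the directory is broken into two separate arguments.
print(unquoted_args[3:])
# With quotes the directory stays together as one argument.
print(quoted_args[3:])
```

Renaming the folders so they contain no spaces, as suggested above, sidesteps the question of how a particular shell or console handles quoting.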
@MukulShukla1992 11 years ago
Hi Brandon Weinberg, I want to detect outliers in a dataset using WEKA. Please tell me whether WEKA can write the results for my dataset to another file.
@lezhitang1879 3 years ago
Does anybody have any idea how to find the directory for your folders when you are converting the text to ARFF on Windows?
@sanjaymakwana6860 7 years ago
Hi, I have to load a text directory in WEKA. I wrote java weka.core.converts.TextDirectoryLoader dir and replaced it with -dir "path" (my file directory path), but it still does not work. Can you provide the exact steps for that?
@slayers257 8 years ago
Hello, how can I get access to the IMDB.arff dataset?
@quochuyvu3067 8 years ago
This tutorial is sooo good
@child031 9 years ago
This is an informative video on text data mining, but it can also be done with the WEKA Knowledge Flow. The Knowledge Flow allows users to compare multiple models, and it can be used for batch processing and instance processing.
@pramodzorze 9 years ago
Very nicely done!
@chentonglee6852 10 years ago
Thanks for sharing, excellent informative video
@temesgenalemneh5801 8 years ago
It is a very good video, but I couldn't import my Excel data into WEKA. It is mostly environmental data (physicochemical data of streams) and data on small aquatic organisms. Please give me a hint on how I can import the data into WEKA. I tried many times to convert it to ARFF, but there is still no data when I open the file in WEKA. Thank you in advance!!!
@saikatblogsforfun 9 years ago
This video is extremely helpful. Thank you. :)
@chinthakadissanayake1790 4 years ago
Thank you sir, very clearly explained
@burhanrashidhussein6037 7 years ago
Great video, thank you. You have eased my worries about text mining
@skirmish_7 9 years ago
Hey... I want to convert a product reviews .txt file into an ARFF file. In your ARFF file the attributes are:
ReviewText string
sentiment {pos,neg}
but in my ARFF file the attributes are:
text string
@@class@@{rd}
I want them as:
ReviewText string
class {subjective, objective}
I am doing subjectivity and objectivity analysis on product reviews. Any help will be appreciated.
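A note on where the class values come from: TextDirectoryLoader is documented as using the names of the subdirectories as class labels, so a `{rd}` class would be consistent with all the files sitting in a single subfolder named `rd` (that reading of the question is a guess). The sketch below only mimics that naming rule in plain Python to show the folder layout that would give `{objective,subjective}`. The helper names and the temporary folders are hypothetical, and it does not call WEKA.

```python
import os
import tempfile


def class_labels_from_subdirs(root):
    """Return the sorted names of the immediate subdirectories of root."""
    return sorted(
        name for name in os.listdir(root) if os.path.isdir(os.path.join(root, name))
    )


def nominal_declaration(attribute_name, labels):
    """Format an ARFF nominal attribute line from a list of labels."""
    return "@attribute " + attribute_name + " {" + ",".join(labels) + "}"


# Build a throwaway directory tree: one subfolder per class, one file per review.
with tempfile.TemporaryDirectory() as root:
    for label in ("subjective", "objective"):
        os.makedirs(os.path.join(root, label))
        with open(os.path.join(root, label, "review1.txt"), "w") as handle:
            handle.write("placeholder review text\n")
    labels = class_labels_from_subdirs(root)
    declaration = nominal_declaration("class", labels)

print(declaration)
```

So the first thing to try is sorting the review files into two subfolders named after the two classes before running the loader. The attribute names themselves can then be edited in the ARFF header or in the Explorer.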
@joehahn21 11 years ago
Really appreciate the upload :)
@madmadan1992 11 years ago
Where can I download the dataset?
@jamespruett27 11 years ago
Would be nice if he answered a couple of questions from the comments. Doesn't appear so... My question is "was WEKA used in the Netflix Challenge". Thanks for posting.
@hameddadgour 10 years ago
Amazing video! Thanks.
@jooyongshin8888 10 years ago
Where can I get an ARFF file?
@YanDemey 10 years ago
You can make one yourself
@giuliofalcao12 6 years ago
How do I classify only one text?
@meghathakor9268 8 years ago
Thanks for this video. I am currently working on this topic. I am unable to convert a text directory to an .arff file. Please reply as soon as possible. I have the dataset from the link you mentioned.
@kleinkauff2 8 years ago
Hello! What is happening when you try to convert? I can try to help you
@vasanthnarayanan2396 6 years ago
Hello!!! Please help me out. I need a review text file in ARFF format. Please help me get it. It's very urgent. Please!!!
@yass356 9 years ago
Great video, thanks a lot
@gu5on16 7 years ago
Thank you for posting the video
@TubeAreej 11 years ago
Thank you very much, this is helpful.
@HiteshParmar 11 years ago
Thank you so much, it really helped a lot. You are awesome.
@DoisKoh 9 years ago
This videoooo.... soooooo goood.
@BrothersCoffee 7 years ago
Thank you so much for this video!!!!!
@hwr44ever1 8 years ago
Anyone here who has done web log mining, please, I need help. I don't know anything about this tool; I tried to learn, but what I want isn't on the web. Can anyone please guide me on how to preprocess log files and find results according to the attribute I require?
@majidtata5131 10 years ago
To Rana: it works even if its colour is grey. It's active, so try again.
@cosmoto91 10 years ago
Excellent tutorial. Big thumbs up
@halka90 9 years ago
Thanks very much
@alzami1986 10 years ago
Thanks for the tutorial
@realmadridvideos 11 years ago
THANKS !
@giuliofalcao12 6 years ago
Classify one text only, with the classifier using the "class value"
@monam9482 11 years ago
Thanks
@luizeduardogranadocardoso4024 7 years ago
Congratulations, BR.
@xelhuajosephcoronaperez3531 9 years ago
ty!
@nileshthorat2403 3 years ago
Present sir
@mbmh123 10 years ago
KSU student was here :/
@oliveryoung6501 9 years ago
SMO was not explained enough
@brucesmith2932 7 years ago
This is a great video, thanks for demystifying this process!
@jacobatodo9138 9 years ago
How can I download this tutorial?