If I could give this video 10000 thumbs up I totally would. Brilliant work and great explanation of how all the different features actually work. You may have just saved me from failing a class. Thankyouthankyouthankyou.
@BrandonWeinberg 11 years ago
Put this vid on KZbin today for people using machine learning and text classification for the first time in WEKA. My other vid was a general intro (23 min) and this one is for text classification (59 min).
0:00 Introduction (5 minutes)
5:06 TextDirectoryLoader (3 minutes)
8:12 StringToWordVector (19 minutes)
27:37 AttributeSelect (10 minutes)
37:37 Cost Sensitivity and Class Imbalance (8 minutes)
45:45 Classifiers (14 minutes)
59:07 Conclusion (20 seconds)
Want to skip to a specific part?
- Section 1 -
5:49 TextDirectoryLoader Command (1 minute)
- Section 2 -
6:44 ARFF File Syntax (1 minute 30 seconds)
8:10 Vectorizing Documents (2 minutes)
10:15 WordsToKeep setting/Word Presence (1 minute 10 seconds)
11:26 OutputWordCount setting/Word Frequency (25 seconds)
11:51 DoNotOperateOnAPerClassBasis setting (40 seconds)
12:34 IDFTransform and TFTransform settings/TF-IDF score (1 minute 30 seconds)
14:09 NormalizeDocLength setting (1 minute 17 seconds)
15:46 Stemmer setting/Lemmatization (1 minute 10 seconds)
16:56 Stopwords setting/Custom Stopwords File (1 minute 54 seconds)
18:50 Tokenizer setting/NGram Tokenizer/Bigrams/Trigrams/Alphabetical Tokenizer (2 minutes 35 seconds)
21:25 MinTermFreq setting (20 seconds)
21:45 PeriodicPruning setting (40 seconds)
22:25 AttributeNamePrefix setting (16 seconds)
22:42 LowerCaseTokens setting (1 minute 2 seconds)
23:45 AttributeIndices setting (2 minutes 4 seconds)
- Section 3 -
28:07 AttributeSelect for reducing dataset to improve classifier performance/InfoGainEval evaluator/Ranker search (7 minutes)
- Section 4 -
38:32 CostSensitiveClassifier/Adding cost sensitivity to base classifier (2 minutes 20 seconds)
42:17 Resample filter/Example of undersampling the majority class (1 minute 10 seconds)
43:27 SMOTE filter/Example of oversampling the minority class (1 minute)
- Section 5 -
45:34 Training vs. Testing Datasets (1 minute 32 seconds)
47:07 Naive Bayes Classifier (1 minute 57 seconds)
49:04 Multinomial Naive Bayes Classifier (10 seconds)
49:33 K Nearest Neighbor Classifier (1 minute 34 seconds)
51:17 J48 (Decision Tree) Classifier (2 minutes 32 seconds)
53:50 Random Forest Classifier (1 minute 39 seconds)
55:55 SMO (Support Vector Machine) Classifier (1 minute 38 seconds)
57:35 Supervised vs Semi-Supervised vs Unsupervised Learning/Clustering (1 minute 20 seconds)
Since all text data is turned into numbers and categories after sections 1-2, most of sections 3-5 are useful in both text classification and other data analysis in WEKA. The Classifiers section introduces you to six (but not all) of WEKA's popular classifiers for text mining: 1) Naive Bayes, 2) Multinomial Naive Bayes, 3) K Nearest Neighbor, 4) J48, 5) Random Forest and 6) SMO. Each StringToWordVector setting is shown, e.g. tokenizer, outputWordCounts, normalizeDocLength, TF-IDF, stopwords, stemmer, etc. These are ways of representing documents as document vectors. Automatically converting 2,000 text files (plain text documents) into an ARFF file with TextDirectoryLoader is shown. Also shown is AttributeSelect, which is a way of improving classifier performance by reducing the dataset. Cost-Sensitive Classifier is shown, which is a way of assigning weights to different types of guesses. Resample and SMOTE are shown as ways of undersampling the majority class and oversampling the minority class. Introductory tips are shared throughout, e.g. distinguishing supervised learning (which is most of data mining) from semi-supervised and unsupervised learning, making identically-formatted training and testing datasets, how to easily subset outliers with the Visualize tab and more...
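For readers who want to see what "representing documents as document vectors" means in practice, here is a minimal plain-Python sketch of lowercased word counts weighted by TF-IDF. It is not WEKA's code: the function names `tokenize` and `tfidf_vectors` are made up for illustration, the tokenizer is a simplified alphabetic one, and the weighting is the common textbook form count * log(N / document frequency), which may differ from the exact formulas behind StringToWordVector's TFTransform and IDFTransform settings.

```python
import math
import re
from collections import Counter


def tokenize(text, lowercase=True):
    """Split text into alphabetic word tokens (a simplified tokenizer)."""
    if lowercase:
        text = text.lower()
    return re.findall(r"[a-zA-Z]+", text)


def tfidf_vectors(documents):
    """Return (vocabulary, vectors) with one TF-IDF vector per document."""
    token_lists = [tokenize(doc) for doc in documents]
    vocabulary = sorted({tok for toks in token_lists for tok in toks})
    n_docs = len(documents)
    # Document frequency: in how many documents each word appears.
    doc_freq = Counter(tok for toks in token_lists for tok in set(toks))
    vectors = []
    for toks in token_lists:
        counts = Counter(toks)
        # Textbook weighting (an assumption, not WEKA's exact formula).
        vectors.append(
            [counts[word] * math.log(n_docs / doc_freq[word]) for word in vocabulary]
        )
    return vocabulary, vectors


docs = ["Great movie, great cast", "Terrible movie"]
vocab, vecs = tfidf_vectors(docs)
print(vocab)
for vec in vecs:
    print([round(value, 3) for value in vec])
```

Note how "movie", which appears in every document, gets a weight of zero: words that do not help tell documents apart are down-weighted, which is the point of the IDF part.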
@ranaalqaisi9331 10 years ago
Would you help me? I am running the same experiment and followed the steps in the video, but most of the classifiers are inactive (greyed out). Thanks in advance.
@cornchips007 10 years ago
Rana Alqaisi, your dataset is not ready for use. Prepare it properly.
@ranaalqaisi9331 10 years ago
Thank you, it's solved :)
@meghathakor9268 8 years ago
+Rana Alqaisi I am running the same experiment but am unable to load the text directory into .arff format. What should I do? Please reply as soon as possible.
@ranaalqaisi9331 8 years ago
You need a CSV or ARFF file to load into WEKA. If it's a text file, you have to look at the structure of an ARFF file and convert it! Let me know if you need any help!
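To make "look at the structure of an ARFF file and convert it" concrete, here is a rough Python sketch that builds ARFF text from a folder containing one subfolder per class (e.g. pos and neg). It imitates the two-attribute layout that others in this thread report from TextDirectoryLoader (`text string` plus `@@class@@`), but it is only an illustration: `arff_quote` and `directory_to_arff` are hypothetical helper names, the relation name is a placeholder, the escaping is simplified, and class folder names are assumed to contain no spaces or special characters.

```python
import os
import tempfile


def arff_quote(text):
    """Quote a string value for ARFF (simplified escaping, an assumption)."""
    escaped = (
        text.replace("\\", "\\\\")
        .replace("'", "\\'")
        .replace("\n", "\\n")
        .replace("\r", "\\r")
        .replace("\t", "\\t")
    )
    return "'" + escaped + "'"


def directory_to_arff(root_dir, relation="text_files"):
    """Build ARFF text from root_dir/<class_name>/<files>, one class per subfolder."""
    classes = sorted(
        d for d in os.listdir(root_dir) if os.path.isdir(os.path.join(root_dir, d))
    )
    lines = [
        "@relation " + relation,
        "",
        "@attribute text string",
        "@attribute @@class@@ {" + ",".join(classes) + "}",
        "",
        "@data",
    ]
    for label in classes:
        folder = os.path.join(root_dir, label)
        for name in sorted(os.listdir(folder)):
            path = os.path.join(folder, name)
            with open(path, encoding="utf-8", errors="replace") as handle:
                lines.append(arff_quote(handle.read().strip()) + "," + label)
    return "\n".join(lines) + "\n"


# Tiny demonstration with two throwaway review files.
with tempfile.TemporaryDirectory() as root:
    for label, review in [("pos", "Loved it."), ("neg", "Didn't like it.")]:
        os.makedirs(os.path.join(root, label))
        with open(os.path.join(root, label, "review1.txt"), "w", encoding="utf-8") as handle:
            handle.write(review)
    arff_text = directory_to_arff(root)

print(arff_text)
```

The header declares one string attribute and one nominal class attribute, and every line after `@data` is one document followed by its class label.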
@khyativyas8956 7 years ago
This is one of the besttt videos I have seen! Thank you so much!! It's crazy how powerful WEKA is!
@ali_z1980 8 years ago
I really appreciate you uploading this video. You clarified several questions I had that my professor struggled to explain to his class.
@xavieryang1255 2 years ago
so much better and useful than Weka's official tutorial
@payalbaruah 11 years ago
Thank you soo much! This is the MOST Informative and helpful video in WEKA!! Keep up the good work Brandon! We are learning so much from your videos.
@jnscollier 9 years ago
@27:37 If anyone is stuck by the SMO being grayed out, I *think* the solution is to go to the Preprocess tab, click Edit... and remember to first right-click the @@class@@ Nominal header and select "Attribute as class". I'm learning just like you so I could be wrong though!
@nebbynat638 9 years ago
This is a very, very good quick intro, especially to filters. Weka has numerous data cleaning tools, filters and parameters, and this was a good tutorial on the filters we use. You have an error in the interpretation of False Positive and False Negative at 38:24. Look at page 164, Chapter 5 of the Third Edition of Witten, Frank and Hall. You seem to have transposed the two. I am of the camp that the true value, "actual class" or ground truth, is at the top of the confusion table, but the literature is replete with such inconsistencies. So beware of confusing the confusion matrix.
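Since the layout is easy to mix up, here is a small Python sketch that counts the four cells directly from actual and predicted labels, so the definitions do not depend on which way the table is drawn. The function name `confusion_counts` and the example labels are made up for illustration. The printout puts actual classes on rows and predicted classes on columns, which I believe matches WEKA's "classified as" header, but treat that as an assumption and check your own output.

```python
def confusion_counts(actual, predicted, positive):
    """Count TP, FP, FN, TN for one class treated as 'positive'."""
    tp = fp = fn = tn = 0
    for truth, guess in zip(actual, predicted):
        if guess == positive and truth == positive:
            tp += 1  # predicted positive, really positive
        elif guess == positive:
            fp += 1  # predicted positive, really negative
        elif truth == positive:
            fn += 1  # predicted negative, really positive
        else:
            tn += 1  # predicted negative, really negative
    return {"TP": tp, "FP": fp, "FN": fn, "TN": tn}


actual = ["pos", "pos", "pos", "neg", "neg", "neg"]
predicted = ["pos", "pos", "neg", "pos", "neg", "neg"]
counts = confusion_counts(actual, predicted, positive="pos")

# Rows are the actual class, columns are the predicted class.
print("             pred pos  pred neg")
print(f"actual pos   {counts['TP']:8d}  {counts['FN']:8d}")
print(f"actual neg   {counts['FP']:8d}  {counts['TN']:8d}")
```

A false positive is a negative document that was guessed positive; a false negative is a positive document that was guessed negative. That holds whichever axis the ground truth is drawn on.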
@nanvdand 7 years ago
Really excellent. This is exactly what I was looking for as I start out on my first text classification project. So helpful and very much worth the hour.
@etaifour2 7 years ago
this is a very good tutorial - great job at explaining things properly and slowly, thank you very much for the great work
@supandi100 11 years ago
I really appreciate the uploaded Weka Text Classification
@lpac8272 4 years ago
Awesome tutorial. Thanks Brandon!
@adyutshirke2060 5 years ago
Thank you Sir, this was the most helpful video I've seen so far..
@jnscollier 9 years ago
@10:08 When I change wordsToKeep from 1000 to 100 and click Apply, it doesn't update the model from 1000 to 100. Do I have to Undo every time? Or if I run 1000 initially, why can I not then run 100 and have it update?
@devanshibhatt05 9 years ago
Thanks for sharing this Brandon! very useful video..very well explained.
@gauravjain8269 5 years ago
Hi Bran, for some reason I get an error when I execute the command for converting the text files in the pos and neg folders, contained within a folder, into an .arff file. Please tell me where I could be going wrong.
@MartinClausen 11 years ago
Thanks for a great video. What is the significance of the text files containing one sentence for each line? Is this necessary and/or does this improve performance?
@mainulquraishi6922 9 years ago
Thank you very much for the tutorial. Is it possible to use lemmatization in WEKA?
@sandeepdommaraju 9 years ago
thank you so much.. it is very helpful .. the best resource i could find for text classification :)
Thank you so much for this informative tutorial. I received this error when I was attempting to load my text files using TextDirectoryLoader: "Problem settings base instance....". What is the cause of this error?
@kleinkauff2 8 years ago
Can someone help me? When I try to apply AttSelection, WEKA gives me "Cannot handle numeric class". When I select an attribute I can see "NUM", but I already applied StringToWordVector.
@khadidjabakhti7036 8 years ago
Hello, I have the same problem. Did you solve it?
@drJohnPDX 7 years ago
This is solved by clicking 'Edit...' at the top, right-clicking the first column and selecting 'Attribute as class'. This puts the class in the last column instead of the first, which was causing the error. I find this video very hard to follow for a new user; he doesn't bother showing how to select the filters, etc. One has to hunt through them all to find what he's using. And then he doesn't bother to answer the questions raised by his confusing video. For more experienced users this might be of value.
@bezsez8193 6 years ago
Thanks for your hints. If you know a better tutorial, I would appreciate it if you shared the link.
@nenaBou 8 years ago
Hello, would you help me? I need to visualize the model but I got this error: can't print smo classifier
@sudip95 8 years ago
+Brandon Weinberg - the video is fantastic. It gives a brief idea of sentiment analysis using WEKA. But I tried to do it myself, and I am getting an error during the command execution. The command I tried is:
java weka.core.converters.TextDirectoryLoader -dir C:\Users\Sudip P\Desktop\Weka files\Training > C:\Program Files\Weka-3-7\data\IMDB.arff
The errors I am getting are:
at weka.core.converters.TextDirectoryLoader.setSource(TextDirectoryLoader.java:398)
at weka.core.converters.TextDirectoryLoader.setDirectory(TextDirectoryLoader.java:367)
at weka.core.converters.TextDirectoryLoader.setOptions(TextDirectoryLoader.java:219)
at weka.core.converters.TextDirectoryLoader.main(TextDirectoryLoader.java:658)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
at java.lang.reflect.Method.invoke(Unknown Source)
at weka.gui.SimpleCLIPanel$ClassRunner.run(SimpleCLIPanel.java:199)
I have tried with your dataset, but the problem still exists. I am using the Windows platform. Please help. Thanks in advance.
@kleinkauff2 8 years ago
Hope it's not too late to help you. Try naming your folders without spaces between the words.
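To illustrate why spaces in folder names can break a command like the one above, here is a small Python sketch that splits the option string on whitespace the way many command-line parsers do. It uses Python's `shlex` only as a stand-in; how WEKA's Simple CLI actually splits arguments, and whether it honors double quotes, is an assumption here, so renaming the folders without spaces remains the fix this thread confirms. `split_cli` is a made-up helper name.

```python
import shlex


def split_cli(command):
    """Split a command line on whitespace, keeping double-quoted parts together."""
    # posix=False leaves backslashes alone, which suits Windows-style paths.
    return shlex.split(command, posix=False)


unquoted = r'-dir C:\Users\Sudip P\Desktop\Weka files\Training'
quoted = r'-dir "C:\Users\Sudip P\Desktop\Weka files\Training"'

unquoted_args = split_cli(unquoted)
quoted_args = split_cli(quoted)

print(len(unquoted_args), unquoted_args)
print(len(quoted_args), quoted_args)
```

In the unquoted form the path is cut into several pieces at each space, so a loader would be handed only the first piece (`C:\Users\Sudip`) as the directory. In the quoted form the path stays one argument.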
@itsvanya 7 years ago
Thank you so much! I was getting the same problem, but this solved it! :)
@MukulShukla1992 11 years ago
Hi Brandon Weinberg, I want to detect outliers in a dataset using WEKA. Please tell me whether WEKA is capable of writing the output for my dataset to another file.
@lezhitang1879 3 years ago
Anybody have any idea how to find the directory for your folders when you are converting the text to arff for windows?
@sanjaymakwana6860 7 years ago
Hi, I have to load a text directory in WEKA. I wrote java weka.core.converts.TextDirectoryLoader dir and replaced dir with -dir "path", using the path to my file directory, but it still does not work. Can you provide the exact steps for that?
@slayers257 8 years ago
Hello, how can I get access to the IMDB.arff dataset?
@quochuyvu3067 8 years ago
This tutorial is sooo good
@child031 9 years ago
This is an informative video on text data mining, but it can also be done with the WEKA Knowledge Flow. The Knowledge Flow allows users to compare multiple models, and it can be used for batch processing and instance processing.
@pramodzorze 9 years ago
Very nicely done!
@chentonglee6852 10 years ago
Thanks for sharing, excellent informative video
@temesgenalemneh5801 8 years ago
It is a very good video, but I couldn't import my Excel data into WEKA. It is mostly environmental data (physicochemical data of streams) and data on small aquatic organisms. Please give me a hint on how I can import the data into WEKA. I tried many times to convert it to ARFF, but there is still no data when I open the file in WEKA. Thank you in advance!
@saikatblogsforfun 9 years ago
This video is extremely helpful. Thank you . :)
@chinthakadissanayake1790 4 years ago
thank you sir very clearly explained
@burhanrashidhussein6037 7 years ago
Great video, thank you. You have eased my worries about text mining.
@skirmish_7 9 years ago
Hey... I want to convert a product reviews file in txt format into an ARFF file. In your ARFF file the attributes are:
ReviewText string
sentiment {pos,neg}
but in my ARFF file the attributes are:
text string
@@class@@{rd}
I want them as:
ReviewText string
class {subjective, objective}
I am doing subjectivity and objectivity analysis on product reviews. Any help will be appreciated.
@joehahn21 11 years ago
really appreciate the upload :)
@madmadan1992 11 years ago
Where can I download the dataset?
@jamespruett27 11 years ago
would be nice if he answered a couple questions from the comments. Doesn't appear so... My question is "was WEKA used in the Netflix Challenge". Thanks for posting.
@hameddadgour 10 years ago
Amazing video! Thanks.
@jooyongshin8888 10 years ago
Where can I get arff file?
@YanDemey 10 years ago
you can make one by yourself
@giuliofalcao12 6 years ago
How do I classify only one text?
@meghathakor9268 8 years ago
Thanks for this video. I am currently working on this topic. I am unable to convert a text directory to an .arff file. Please reply as soon as possible. I have the dataset from the link you mentioned.
@kleinkauff2 8 years ago
Hello! What is happening when you try to convert? I can try to help you
@vasanthnarayanan2396 6 years ago
Hello! Please help me out. I need the review text file in ARFF format. Please help me get it. It's very urgent. Please!
@yass356 9 years ago
Great video, thanks a lot
@gu5on16 7 years ago
Thank you for posting the video
@TubeAreej 11 years ago
Thank you very much, this is helpful.
@HiteshParmar 11 years ago
Thank you so much , it really helped a lot. you are awesome.
@DoisKoh 9 years ago
This videoooo.... soooooo goood.
@BrothersCoffee 7 years ago
Thank you so much for this video!!!!!
@hwr44ever1 8 years ago
Has anyone here done web log mining? Please, I need help. I don't know anything about this tool; I tried to learn, but what I want isn't on the web. Can anyone guide me on how to preprocess log files and find results according to the attributes I require?
@majidtata5131 10 years ago
To Rana: it works even if its color is grey. It's active, so try again.
@cosmoto91 10 years ago
Excellent tutorial. Big thumbs up.
@halka90 9 years ago
Thanks very much
@alzami1986 10 years ago
thanks for tutorial
@realmadridvideos 11 years ago
THANKS !
@giuliofalcao12 6 years ago
Classify one text only, with the classifier using "class value".
@monam9482 11 years ago
Thanks
@luizeduardogranadocardoso4024 7 years ago
Congratulations, BR.
@xelhuajosephcoronaperez3531 9 years ago
ty!
@nileshthorat2403 3 years ago
Present sir
@mbmh123 10 years ago
KSU student Was here :/
@oliveryoung6501 9 years ago
smo was not explained enough
@brucesmith2932 7 years ago
This is a great video, thanks for demystifying this process!