Document Classification in Weka

11,643 views

jengolbeck

A day ago

Comments: 25

@brylie 6 years ago

Thanks for explaining this without much jargon. Your teaching style is friendly and accessible. Cheers 😀

@amine-us7hn 4 years ago

How did you create this ARFF file? I tried many times but did not

@hyunjungkang8378 6 years ago

Thanks a lot for the videos on Weka. I like the way you explain things; it's very clear and easy to understand :)

@mjma1984 6 years ago

So I have a CSV file with two columns: the first is text and the second is the class. When I apply the filter, I don't see the list of words from my first column; I simply don't see the attributes the way they are shown in your video at 4:22! Any ideas how to get that? When I click Edit, I see that Weka is treating each row in the first column as a whole word, meaning it doesn't split the words in the sentences. I tried using a stemmer, a tokenizer, etc., but I am still getting the same issue!
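For reference, StringToWordVector is meant to split each string into tokens and turn every distinct token into its own attribute, which is the word list visible at 4:22. The plain-Java sketch below (it does not call Weka) illustrates that bag-of-words step. The delimiter set is an assumption chosen to resemble a typical word tokenizer, and the lowercasing is a simplification of this sketch, not necessarily Weka's default behaviour.

```java
import java.util.Map;
import java.util.TreeMap;

// Plain-Java illustration of a bag-of-words transform; this is NOT Weka code.
class BagOfWords {
    // Assumed delimiter set: whitespace plus common punctuation.
    static final String DELIMITERS = "[\\s.,;:'\"()?!]+";

    // Splits one document into lowercase tokens and counts how often each occurs.
    static Map<String, Integer> countWords(String text) {
        Map<String, Integer> counts = new TreeMap<>(); // sorted, like an attribute list
        for (String token : text.toLowerCase().split(DELIMITERS)) {
            if (token.isEmpty()) {
                continue; // a leading delimiter produces one empty token
            }
            counts.merge(token, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Each distinct word becomes its own "attribute" with a count.
        System.out.println(countWords("The movie was great, great fun!"));
        // prints {fun=1, great=2, movie=1, the=1, was=1}
    }
}
```

If Weka instead shows each whole sentence as a single value, the words were never split, which points at the attribute type rather than at the tokenizer settings.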

@jengolbeck 6 years ago

When you apply the StringToWordVector filter, what does it show?

@mjma1984 6 years ago

Nothing happens. I still see the Attributes window with only the names of my attributes, and nothing changes in the Selected attribute window!

@jengolbeck 6 years ago

If you want, drop me an email at jgolbeck@umd.edu with your file and I'd be happy to take a look

@martinl2603 6 years ago

Your string data is probably being converted into a nominal type (instead of a string type) when you load it into Weka. StringToWordVector doesn't support the nominal data type, so it does not work.

@Marie-fu1db 5 years ago

I get this same issue. What is the solution?
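One common way around this, assuming martinl2603's diagnosis is right, is to make sure the text column is typed as `string` before applying StringToWordVector: either apply Weka's `weka.filters.unsupervised.attribute.NominalToString` filter to that column first, or write the ARFF header yourself. The sketch below takes the second route in plain Java with no Weka dependency. The class name, the `reviews` relation name, and the assumption that class labels contain no spaces or commas are all illustrative.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Plain-Java sketch: builds ARFF text for (document, label) rows, declaring the
// document as a STRING attribute and the label as a NOMINAL class attribute.
class ArffWriterSketch {
    // Wraps a value in single quotes, escaping backslashes first, then single quotes.
    static String quote(String value) {
        return "'" + value.replace("\\", "\\\\").replace("'", "\\'") + "'";
    }

    // Assumes labels are simple tokens (no spaces or commas), so they are not quoted.
    static String toArff(String relation, List<String[]> rows) {
        // Distinct labels in first-seen order, used for the nominal class header.
        Set<String> labels = new LinkedHashSet<>();
        for (String[] row : rows) {
            labels.add(row[1]);
        }
        StringBuilder sb = new StringBuilder();
        sb.append("@relation ").append(relation).append("\n\n");
        sb.append("@attribute text string\n"); // string type, so StringToWordVector can split it
        sb.append("@attribute class {").append(String.join(",", labels)).append("}\n\n");
        sb.append("@data\n");
        for (String[] row : rows) {
            sb.append(quote(row[0])).append(",").append(row[1]).append("\n");
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        List<String[]> rows = List.of(
                new String[] {"I loved it", "pos"},
                new String[] {"It wasn't good", "neg"});
        System.out.print(toArff("reviews", rows));
    }
}
```

Loading the generated file in the Explorer should then list `text` as a string attribute rather than a nominal one.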

@martinl2603 6 years ago

@jengolbeck, well, as a member of the Weka community I can say that there are two main issues with your video. First: your way of using the StringToWordVector filter is not bad, but it is not entirely correct. Applied the way you explain it, it brings some class information into the tokens, which later gives the machine learning algorithm a clue about the class and so produces an optimistic result like the one you got. Second: NaiveBayesMultinomialText can work with the string data type directly, so the default StringToWordVector is not really necessary if you use the NaiveBayesMultinomialText classifier ;)
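A commonly recommended Weka remedy for the first point is to wrap StringToWordVector and the classifier in `weka.classifiers.meta.FilteredClassifier`, so that the dictionary is built only from the training data of each fold. The plain-Java sketch below illustrates that principle, not Weka's implementation: the vocabulary is fitted on training documents alone, and test documents are mapped onto it with unseen words ignored. The class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrates "fit the text filter on training data only" in plain Java.
class TrainOnlyVocabulary {
    // Assigns a column index to every word seen in the TRAINING documents.
    static Map<String, Integer> fitVocabulary(List<String> trainDocs) {
        Map<String, Integer> vocab = new LinkedHashMap<>();
        for (String doc : trainDocs) {
            for (String token : doc.toLowerCase().split("\\W+")) {
                if (!token.isEmpty()) {
                    vocab.putIfAbsent(token, vocab.size()); // next free column
                }
            }
        }
        return vocab;
    }

    // Turns any document into word counts over the fitted vocabulary;
    // words that never appeared in training are silently dropped.
    static int[] transform(Map<String, Integer> vocab, String doc) {
        int[] vector = new int[vocab.size()];
        for (String token : doc.toLowerCase().split("\\W+")) {
            Integer column = vocab.get(token);
            if (column != null) {
                vector[column]++;
            }
        }
        return vector;
    }

    public static void main(String[] args) {
        Map<String, Integer> vocab = fitVocabulary(List.of("good plot good acting", "bad plot"));
        System.out.println(vocab);                                        // {good=0, plot=1, acting=2, bad=3}
        System.out.println(Arrays.toString(transform(vocab, "good ending"))); // [1, 0, 0, 0]
    }
}
```

Because the test document never influences which columns exist, its words (and its class) cannot leak into the features the model is trained on.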

@JiminPark-ld2xx 3 years ago

I would really appreciate it if you could make a video about how to convert CSV files or .txt files into ARFF files. Is there any cleaning process before we convert them to ARFF? A lot of students, including myself, are struggling with this issue. Thank you...
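Weka ships converters for this (`weka.core.converters.CSVLoader` and `ArffSaver`), but a frequent cleaning problem with review or tweet text is commas and quotes inside the text column, which shift the remaining columns. The sketch below is a hypothetical, minimal plain-Java parser for a single CSV line that respects double-quoted fields and `""` escapes. It assumes no line breaks inside fields.

```java
import java.util.ArrayList;
import java.util.List;

// Minimal CSV line splitter for cleaning text data before converting it to ARFF.
class CsvLineParser {
    // Splits one line into fields; commas inside double quotes are kept,
    // and "" inside a quoted field becomes a literal quote character.
    static List<String> parseLine(String line) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (inQuotes) {
                if (c == '"') {
                    if (i + 1 < line.length() && line.charAt(i + 1) == '"') {
                        current.append('"'); // escaped quote
                        i++;
                    } else {
                        inQuotes = false; // closing quote
                    }
                } else {
                    current.append(c);
                }
            } else if (c == '"') {
                inQuotes = true; // opening quote
            } else if (c == ',') {
                fields.add(current.toString()); // field boundary
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        fields.add(current.toString()); // last field
        return fields;
    }

    public static void main(String[] args) {
        System.out.println(parseLine("\"Great movie, would watch again\",pos"));
        // prints [Great movie, would watch again, pos]
    }
}
```

Once every row splits into exactly the expected number of fields, the text column can be written out declared as a string attribute.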

@solomonngare8382 8 months ago

Hello. What if the dataset is not labelled? It's just plain reviews with no label, i.e. positive or negative. How do you go about labelling it?

@jengolbeck 8 months ago

In that scenario, it's not clear what you would be using Weka for. Weka allows you to build a model based on your data. If you don't have labels on the data, there is nothing to train a model from. From your comment mentioning "positive or negative", I feel like you might be interested in doing sentiment analysis? If that's the case, you would want to use an off-the-shelf sentiment analysis tool.
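An off-the-shelf tool is the practical route here. Purely to show the simplest idea behind lexicon-based sentiment tools, here is a toy scorer in plain Java. The word lists are tiny, hypothetical placeholders; real tools rely on far larger curated lexicons plus extra rules, so its output should not be treated as reliable labels.

```java
import java.util.Set;

// Toy lexicon-based sentiment labeller, for illustration only.
class ToyLexiconSentiment {
    // Hypothetical word lists; real lexicons are much larger and curated.
    static final Set<String> POSITIVE = Set.of("good", "great", "love", "excellent");
    static final Set<String> NEGATIVE = Set.of("bad", "awful", "hate", "boring");

    // Score = (#positive words) - (#negative words); the sign decides the label.
    static String label(String review) {
        int score = 0;
        for (String token : review.toLowerCase().split("\\W+")) {
            if (POSITIVE.contains(token)) {
                score++;
            } else if (NEGATIVE.contains(token)) {
                score--;
            }
        }
        if (score > 0) {
            return "positive";
        }
        if (score < 0) {
            return "negative";
        }
        return "neutral";
    }

    public static void main(String[] args) {
        System.out.println(label("I love this, great cast")); // positive
        System.out.println(label("Boring and bad"));          // negative
        System.out.println(label("It was a film"));           // neutral
    }
}
```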

@matanonson4 3 years ago

Can you share your trump.arff file with us, please?

@yarjung5332 5 years ago

Really nice explanation, love your way of teaching!

@aseelh8123 6 years ago

Thanks for your explanation. I have a dataset of Arabic tweets, and when I try to open it in Weka, question marks appear! Is there a way to set the Arabic language in Weka? Regards,

@TheRegent 5 years ago

Try saving the ARFF file in Notepad with UTF-8 encoding instead of ANSI; then it should read the Arabic text. Or use the CLI with an updated language package fetched from Java updates!
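The question marks are often what you get when text is written with a single-byte encoding that has no Arabic characters, which is why saving as UTF-8 helps. The plain-Java sketch below demonstrates this, using ISO-8859-1 as a stand-in for an ANSI code page (an assumption; the exact Windows code page varies by locale). When launching Weka from the command line, passing the standard Java system property `-Dfile.encoding=UTF-8` is also commonly suggested, though whether it helps may depend on the Weka and Java versions.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Shows why Arabic text can turn into '?' when saved with a non-Unicode encoding.
class ArabicEncodingDemo {
    // Saves text to a temporary file using the given charset, then reads it back with it.
    static String saveAndReload(String text, Charset charset) {
        try {
            Path file = Files.createTempFile("tweets", ".arff");
            try {
                // getBytes replaces characters the charset cannot represent with '?'.
                Files.write(file, text.getBytes(charset));
                return new String(Files.readAllBytes(file), charset);
            } finally {
                Files.deleteIfExists(file);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Arabic greeting "marhaban", written as Unicode escapes so the source stays ASCII.
        String tweet = "\u0645\u0631\u062D\u0628\u0627";
        System.out.println(saveAndReload(tweet, StandardCharsets.UTF_8).equals(tweet)); // true
        System.out.println(saveAndReload(tweet, StandardCharsets.ISO_8859_1));          // ?????
    }
}
```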