How to Build Classification Models for the Penguins Dataset (Weka Tutorial #3)

  11,351 views

Data Professor



Comments: 33
@DataProfessor 4 years ago
⭕ Check out other videos in this Weka tutorial series:
✅ How to Build Regression Models (Weka Tutorial #1) kzbin.info/www/bejne/qmbQqp99faplfMk
✅ How to Build Classification Models (Weka Tutorial #2) kzbin.info/www/bejne/n3rIapekq71reKc
⭕ Links for this video:
✅ Weka 3 website: www.cs.waikato.ac.nz/ml/weka/
✅ Buy the Official Weka 3 Book: amzn.to/34MY6LC
👉 Watch this video next (How to learn data science in 2021) kzbin.info/www/bejne/pYOZaGOKrdybbpo
Support this Channel 👇👇👇
🌟 Buy me a coffee www.buymeacoffee.com/dataprofessor
🌟 Download Kite for FREE www.kite.com/get-kite/?
👉 Subscribe to this KZbin channel kzbin.info
👉 Join the Newsletter of Data Professor newsletter.dataprofessor.org
@Mario-ox5dm 4 years ago
Loving the WEKA tutorials! I'm working through one of your Streamlit tutorials now as a refresher, and then I'm going to explore these!
@DataProfessor 4 years ago
Thanks, Mario 😊
@KenJee_ds 4 years ago
I love Penguins!!!
@DataProfessor 4 years ago
Haha, Go Penguins! 😆
@ioannischrysochos7737 1 year ago
I used the ArffViewer tool and uploaded the CSV. When I saved the file as ARFF, the data were not clustered. But when I imported the file into Weka, I saw that the data were clustered automatically. Thank you very much for your videos; they are very helpful.
@idanmorad4769 4 years ago
Hey, thanks for the video. A few remarks:
1. You can just let Weka read the text file as CSV; Weka then adjusts automatically, and you can save it as an ARFF file.
2. Can you talk about the drawbacks of Weka? For example, you cannot split into train and test sets at the Preprocess stage, so every filter you apply takes all of the data into account, which opens up the possibility of overfitting.
3. Weka can also be used in code, as its JARs are in the Maven repository. Maybe this could be a future video.
@DataProfessor 4 years ago
Thanks, Idan, for the pointers. It's been a long time since I last used Weka, and I didn't know CSV could now be read in directly; thanks for that. Yes, Weka can be used from other languages such as R and Python, as well as from the command line, e.g. java -jar weka.jar -i input.arff -o output.txt (correct me if I'm wrong; it's something along these lines). Yes, I can make a future video about this. As for the data split, we can do an initial split into separate files and use the 80% portion for model building.
@idanmorad4769 4 years ago
@DataProfessor Thanks for the reply :-) Regarding using Weka in code, I meant that you can import Weka's Java code into a Java project in NetBeans or IntelliJ, just as you import a package into a Python project or a library into an R project. As for splitting the sets: in Python, sklearn transformers such as SimpleImputer or TfidfVectorizer have a fit method, which learns from the data, and a transform method, which applies what was learned to the data. So you can learn the tf-idf values from the training set and only apply them to the test set, without learning from the test set. In Weka, as far as I know, applying a filter at the Preprocess stage is equivalent to fit_transform in sklearn. So if you impute missing values with the mean, the mean is calculated from all the data, not just from the training segment; likewise, the tf-idf values are calculated from all the data. Maybe I'm mistaken, or you could show how to apply the same filter to train and test files in Weka so that it learns only from the training file.
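The split-first, fit-on-train, transform-on-test pattern described above can be sketched in plain Python without sklearn. This is a minimal sketch under stated assumptions: `split_rows` and `MeanImputer` are hypothetical helpers (not Weka or sklearn APIs), `None` marks a missing value, and the flipper lengths are made-up numbers. The mean is learned from the training rows only and then reused unchanged on the test rows.

```python
import random


def split_rows(rows, train_fraction=0.8, seed=42):
    """Shuffle a copy of rows and split it into (train, test) lists."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


class MeanImputer:
    """Hypothetical mean imputer with separate fit and transform steps."""

    def fit(self, values):
        # Learn the mean from non-missing values only (None marks missing).
        observed = [v for v in values if v is not None]
        self.mean_ = sum(observed) / len(observed)
        return self

    def transform(self, values):
        # Replace missing entries with the mean learned during fit.
        return [self.mean_ if v is None else v for v in values]


if __name__ == "__main__":
    # Made-up flipper lengths (mm) with two missing values.
    flipper = [181.0, 186.0, None, 195.0, 193.0, 190.0, None, 210.0, 217.0, 230.0]
    train, test = split_rows(flipper)          # 80/20 split before any preprocessing
    imputer = MeanImputer().fit(train)         # learns from the training rows only
    print("mean learned from train:", imputer.mean_)
    print("train:", imputer.transform(train))
    print("test: ", imputer.transform(test))   # test rows never influence the mean
```

In Weka itself, wrapping the filter and the classifier in weka.classifiers.meta.FilteredClassifier should, as I understand it, give the same train-only behaviour, because the filter there is built from the training data alone.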
@alvynchin1160 3 years ago
What do we do when our dataset has numeric, categorical, and nominal values and we are trying to predict a binary attribute? Do we remove the numeric and categorical data to make it all nominal?
@amnar790 1 year ago
If my data has an imbalanced class attribute, what can I do to get better performance?
@fahimhamdan1569 4 years ago
Comparing the breast cancer and penguins datasets, will a balanced class attribute lead to better model accuracy for all algorithms?
@DataProfessor 4 years ago
Class balancing would indirectly lead to better model accuracy by ensuring there is less bias in the model when the classes are equal in number. Otherwise, class imbalance may bias the learning process toward the majority class. Class balancing would consequently also lead to more reliable results.
@fahimhamdan1569 4 years ago
@DataProfessor Well explained 👍
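One simple way to balance classes, as discussed above, is random oversampling: duplicate rows of the smaller classes until every class is as large as the biggest one. A minimal sketch in plain Python, assuming each row is a (features, label) pair; `oversample` is a hypothetical helper and the species counts are made up. Weka also has filters for this (for example ClassBalancer and Resample in weka.filters.supervised.instance), which this sketch does not use.

```python
import random
from collections import Counter, defaultdict


def oversample(rows, seed=0):
    """Randomly duplicate minority-class rows until every class matches the largest class.

    rows: list of (features, label) pairs. Apply this to the training split only,
    so duplicated rows never leak into the test set.
    """
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for features, label in rows:
        by_label[label].append((features, label))
    target = max(len(group) for group in by_label.values())
    balanced = []
    for group in by_label.values():
        balanced.extend(group)
        # Draw extra copies with replacement to fill the gap up to the target size.
        balanced.extend(rng.choice(group) for _ in range(target - len(group)))
    rng.shuffle(balanced)
    return balanced


if __name__ == "__main__":
    # Made-up, imbalanced toy data: 6 Adelie, 3 Gentoo, 1 Chinstrap.
    data = ([([i], "Adelie") for i in range(6)]
            + [([i], "Gentoo") for i in range(3)]
            + [([0], "Chinstrap")])
    print(Counter(label for _, label in data))
    print(Counter(label for _, label in oversample(data)))
```

Oversampling only copies existing rows, so it changes how much weight each class gets during learning but adds no new information; balanced accuracy or per-class recall is worth checking alongside overall accuracy.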
@plica06 1 year ago
This is very helpful, but I wish you would explain:
1. Why are you selecting each algorithm?
2. Why do they perform differently?
3. When I make that Excel summary at the end, what would I talk about? What would I say?
@Sawaedo 4 years ago
Do you know how I can download the trained model? I don't see the option anywhere.
@DataProfessor 4 years ago
Hi Jose, unfortunately, Weka doesn't allow the export of trained models.
@aliakbarjamali9472 2 years ago
Why do you usually convert CSV files into ARFF files when you can use the CSV format in WEKA as well?
@DataProfessor 2 years ago
Good question. In the old days, ARFF was the only supported native file format; CSV support was added over time.
@aliakbarjamali9472 2 years ago
@DataProfessor Fair enough. Thanks.
@edoedo8058 8 months ago
Wonderful... I learned so much. TnX :)))
@username42 4 years ago
What if you have thousands of variables in a dataset? How do you rename them for use in Weka? It's easy for small tabular datasets like this one, but scientific sensory datasets have hundreds or thousands of variables that need to be renamed as numeric and latent variables :/
@DataProfessor 4 years ago
That's a great question. Then we'll need to write a Python script for that using a "for" loop to iterate through the list of variables.
@username42 4 years ago
@DataProfessor Yes, and then we'll continue in Python for modelling, training, etc. :D and leave Weka behind
@DataProfessor 4 years ago
@username42 Exactly! When the need arises to use Python, it's an early sign suggesting that we need to level up 😊
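The "for" loop suggested above might look like the following minimal sketch. It assumes a hypothetical naming scheme: every run of characters other than ASCII letters, digits, and underscores becomes one underscore, empty names fall back to a prefix plus the column index, and repeats get a numeric suffix. It also assumes every renamed column is numeric when writing the ARFF declarations. `rename_columns` and `numeric_attribute_lines` are illustrative names, and the 1,000-column header is made up.

```python
import re


def rename_columns(names, prefix="x"):
    """Return (new_names, mapping) with cleaned, unique column names.

    mapping maps each new name back to its original name, so results can be
    traced back to the raw variables later.
    """
    new_names, mapping, used = [], {}, set()
    for index, name in enumerate(names):
        clean = re.sub(r"[^0-9A-Za-z_]+", "_", name.strip()).strip("_")
        if not clean:
            clean = f"{prefix}{index}"  # fallback for empty or all-symbol names
        candidate, n = clean, 1
        while candidate in used:  # keep every name unique: _2, _3, ...
            n += 1
            candidate = f"{clean}_{n}"
        used.add(candidate)
        new_names.append(candidate)
        mapping[candidate] = name
    return new_names, mapping


def numeric_attribute_lines(names):
    """ARFF @attribute declarations, assuming every column is numeric."""
    return [f"@attribute {name} numeric" for name in names]


if __name__ == "__main__":
    # Made-up header standing in for a wide sensor dataset.
    raw = [f"sensor {i} (mV)" for i in range(1000)] + ["sensor 0 (mV)", ""]
    new, mapping = rename_columns(raw)
    print(new[:3], new[-2:])
    print(numeric_attribute_lines(new)[:2])
```

Keeping the mapping dictionary matters with thousands of columns: the renamed attributes are convenient for Weka, but you still need to report results in terms of the original variable names.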
@MegaBoss1980 4 years ago
At 10:24 you used "Use training set" as the test option. Isn't that the whole dataset you imported? Also, in Excel you have accuracy columns for training, CV, and test. I think the training accuracy is computed on the full dataset, which is why RF has 100% accuracy.
@DataProfessor 4 years ago
Yes, that is correct 😊
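Why evaluating on the training set flatters a model can be shown with a small experiment. A minimal sketch in plain Python, assuming a hypothetical 1-nearest-neighbour classifier in place of Random Forest (it plays roughly the role of Weka's IBk with k = 1, because it simply memorizes the training data) and made-up data whose labels are random, so there is nothing real to learn. The training-set score is perfect anyway, while k-fold cross-validation tests each fold on a model that never saw it. The function names are illustrative.

```python
import random


def nn_predict(train, x):
    """1-nearest-neighbour prediction using squared Euclidean distance."""
    best = min(train, key=lambda row: sum((a - b) ** 2 for a, b in zip(row[0], x)))
    return best[1]


def accuracy(train, test):
    """Fraction of test rows whose label matches the 1-NN prediction from train."""
    hits = sum(nn_predict(train, x) == y for x, y in test)
    return hits / len(test)


def cross_val_accuracy(rows, k=10, seed=1):
    """Mean accuracy over k folds (needs len(rows) >= k so no fold is empty)."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    folds = [shuffled[i::k] for i in range(k)]
    scores = []
    for i, test in enumerate(folds):
        # Train on every fold except the held-out one.
        train = [row for j, fold in enumerate(folds) if j != i for row in fold]
        scores.append(accuracy(train, test))
    return sum(scores) / k


if __name__ == "__main__":
    rng = random.Random(0)
    # Made-up data with random labels: there is no real pattern to learn.
    rows = [([rng.random(), rng.random()], rng.choice("AB")) for _ in range(200)]
    # 1.0: every point is its own nearest neighbour.
    print("training-set accuracy:", accuracy(rows, rows))
    print("10-fold CV accuracy:", round(cross_val_accuracy(rows), 2))
```

The cross-validated score should hover around chance level here, which is the honest answer for random labels; the gap between the two numbers is the same kind of gap as the 100% training accuracy in the Excel summary.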
@sembutininverse 4 years ago
thank you.
@DataProfessor 4 years ago
Thanks for watching!
@alhikmah_9 4 years ago
Thank you, Professor, for the amazing job you are doing. I appreciate your work. I would also like to know how you can be reached via email.
@DataProfessor 4 years ago
Thanks! It's hellodataprofessor@gmail.com
@alhikmah_9 4 years ago
@DataProfessor Thank you