46 - Splitting data into training and testing sets for machine learning

  10,132 views

DigitalSreeni

4 years ago

When you build a model, using machine learning or other means, it is important to validate it with a test data set, that is, data the algorithm did not use for training. It is also important for the test data set to follow a probability distribution similar to that of the training data set. The easiest way to achieve this is to split the data into training and testing data sets. This video tutorial explains the process in Python using the scikit-learn library.
The code from this video is available at: github.com/bnsreenu/python_fo...
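
As a rough illustration of the split described above (not the code from the video), the sketch below uses scikit-learn's `train_test_split` on synthetic data. The data, the 80/20 ratio, the `random_state` value, and the variable names are all assumptions made for this example.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic data (an assumption for illustration): y = 3x + 2 plus noise.
rng = np.random.default_rng(seed=0)
X = rng.uniform(0, 10, size=(100, 1))
y = 3.0 * X[:, 0] + 2.0 + rng.normal(0, 1.0, size=100)

# Hold out 20% of the rows. The rows are shuffled before splitting,
# and random_state makes the shuffle repeatable.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Fit on the training rows only.
reg = LinearRegression()
reg.fit(X_train, y_train)

# Evaluate on rows the model never saw during fitting.
prediction_test = reg.predict(X_test)
mse = np.mean((prediction_test - y_test) ** 2)

print("Training rows:", X_train.shape[0], "Test rows:", X_test.shape[0])
print("Mean squared error on the test set =", mse)
```

Because the rows are shuffled before being divided, both subsets are drawn from the same pool, which is what keeps their distributions similar.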

Comments: 15
@1UniverseGames 3 years ago
Thank you very much, sir, way too helpful :)
@DigitalSreeni 3 years ago
Most welcome!
@himanshu8006 3 years ago
Thanks for the tutorial.
@DigitalSreeni 3 years ago
Any time
@finnmccool8671 4 years ago
Hi Sreenivas. Have you considered using Jupyter Lab for your tutorials? You could share the code on GitHub.
@DigitalSreeni 4 years ago
I considered Jupyter Notebook but stayed away from it to encourage new coders to get comfortable with an IDE. Jupyter is great and I use it for prototyping, but for real production-type code development, even for scientific purposes, you need to get your hands dirty. Maybe I will add a video about Jupyter. Thanks for the suggestion.
@felip6180 4 years ago
Hey Mr. Sreeni, I've noticed a thing here: when you create the instance (reg, at around 7:00 in this video) and assign a specific data set to it, this instance receives all the regression parameters and results, so that when I need to work with the data set, all I have to do is use this instance. If I need to use another data set, I need to create another instance, such as reg_2, or overwrite the previous instance, is that it?
@DigitalSreeni 4 years ago
You can use the same instance for multiple datasets. Think of creating an instance as defining an equation where you supply values to the equation later on.
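
A minimal sketch of this exchange, assuming the `reg` instance is a scikit-learn `LinearRegression` and using two made-up datasets. Calling `fit` again on the same instance works, but in scikit-learn it replaces the previously learned parameters, so a second instance such as `reg_2` is only needed when both fitted models have to be kept side by side.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Two small made-up datasets with different slopes (illustrative only).
X = np.array([[1.0], [2.0], [3.0], [4.0]])
y_a = 2.0 * X[:, 0]          # slope 2, intercept 0
y_b = -1.0 * X[:, 0] + 5.0   # slope -1, intercept 5

reg = LinearRegression()     # the "equation", with no values supplied yet

reg.fit(X, y_a)              # learn parameters from the first dataset
slope_a = reg.coef_[0]

reg.fit(X, y_b)              # refit: parameters from the first fit are replaced
slope_b = reg.coef_[0]

print("Slope after first fit:", slope_a)
print("Slope after second fit:", slope_b)
```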
@felip6180 4 years ago
@DigitalSreeni Thank you, sir!
@159manusss 3 years ago
Thank you very very very very very much
@DigitalSreeni 3 years ago
You are welcome.
@RAZZKIRAN 2 years ago
Thank you.
@DigitalSreeni 2 years ago
You are welcome!
@lorizoli 2 years ago
Hello, I may be a little late to the party, but it seems to me that at 11:00 you are squaring the mean of the errors instead of calculating the mean of the squared errors.
@DigitalSreeni 2 years ago
Yes, it looks like a set of brackets is missing. Thanks for pointing it out; I will correct it on GitHub. Right now it says:
print("Mean sq. errror between y_test and predicted =", np.mean(prediction_test-y_test)**2)
It should be:
print("Mean sq. errror between y_test and predicted =", np.mean((prediction_test-y_test)**2))
Or, it can be:
((prediction_test-y_test)**2).mean(axis=None)
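
To see why the bracket placement matters, here is a small sketch with made-up numbers (the variable names mirror the comment above; the values are assumptions). When positive and negative errors cancel out, the square of the mean error can be zero even though every individual prediction is wrong.

```python
import numpy as np

# Made-up values for illustration; the names mirror those in the comment.
y_test = np.array([1.0, 2.0, 3.0, 4.0])
prediction_test = np.array([2.0, 1.0, 4.0, 3.0])

errors = prediction_test - y_test        # [ 1., -1.,  1., -1.]

# Squaring AFTER averaging: the errors cancel, so the mean is 0.
square_of_mean = np.mean(errors) ** 2

# Squaring BEFORE averaging: every squared error is 1, so the mean is 1.
mean_of_squares = np.mean(errors ** 2)

print("Square of the mean error =", square_of_mean)
print("Mean squared error =", mean_of_squares)
```

Only the second quantity is the mean squared error; the first understates the error whenever over- and under-predictions offset each other.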