46 - Splitting data into training and testing sets for machine learning

10,132 views

4 years ago

When you build a model using machine learning or any other method, it is important to validate it on a test data set, that is, on data the algorithm did not use for training. The test data should also follow a probability distribution similar to that of the training data. The easiest way to achieve both is to split the data into training and testing sets. This video tutorial explains the process in Python using the scikit-learn library.
The code from this video is available at: github.com/bnsreenu/python_fo...
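A minimal sketch of the split described above, assuming scikit-learn and NumPy are installed. The data is synthetic (the video's own data set is not reproduced here), and the 80/20 ratio, `random_state=42`, and the binary labels are illustrative choices. Passing `stratify=y` keeps the class proportions the same in both subsets, which is one way to make the training and test distributions similar for classification data.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical synthetic data: 100 samples, 3 features, binary labels (30% positive)
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = np.array([1] * 30 + [0] * 70)

# Hold out 20% of the rows for testing; the model never sees them during training.
# stratify=y keeps the 30/70 class ratio in both the training and test subsets.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

print(X_train.shape, X_test.shape)  # (80, 3) (20, 3)
print("Positives in train/test:", y_train.sum(), y_test.sum())
```

Fixing `random_state` makes the split reproducible from run to run; leaving it out gives a different random split each time.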

Comments: 15

@1UniverseGames 3 years ago

Thank you very much, sir, way too helpful :)

@DigitalSreeni 3 years ago

Most welcome!

@himanshu8006 3 years ago

Thanks for the tutorial.

@DigitalSreeni 3 years ago

Any time

@finnmccool8671 4 years ago

Hi Sreenivas. Have you considered using JupyterLab for your tutorials? You could share the code on GitHub.

@DigitalSreeni 4 years ago

I considered Jupyter Notebook but stayed away from it to encourage new coders to get comfortable with an IDE. Jupyter is great, and I use it for prototyping, but for real production-type code development, even for scientific purposes, you need to get your hands dirty. Maybe I will add a video about Jupyter. Thanks for the suggestion.

@felip6180 4 years ago

Hey Mr. Sreeni, I've noticed something here: when you create the instance (around 7:00, reg in this video) and assign a specific data set to it, the instance receives all the regression parameters and results, so whenever I need to work with that data set, all I have to do is use this instance. If I need to use another data set, do I need to create another instance, such as reg_2, or overwrite the previous one?

@DigitalSreeni 4 years ago

You can use the same instance for multiple data sets. Think of creating an instance as defining an equation whose values you supply later on.

@felip6180 4 years ago

@@DigitalSreeni Thank you, sir!
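A small sketch of the point raised in this thread, assuming the video's reg is a scikit-learn LinearRegression estimator (the data below is made up). The same instance can be fitted on a different data set, but each call to fit() replaces the previously learned coef_ and intercept_. If both fitted models are needed at once, a second instance (for example via sklearn.base.clone, which returns an unfitted copy with the same settings) avoids losing the first result.

```python
import numpy as np
from sklearn.base import clone
from sklearn.linear_model import LinearRegression

# Two hypothetical data sets sharing the same X: y = 2x and y = 5x
X = np.arange(10, dtype=float).reshape(-1, 1)
y_a = 2 * X.ravel()
y_b = 5 * X.ravel()

reg = LinearRegression()

reg.fit(X, y_a)
slope_a = reg.coef_[0]  # learned slope for the first data set, about 2.0

# Refitting the same instance overwrites coef_ and intercept_
reg.fit(X, y_b)
slope_b = reg.coef_[0]  # learned slope for the second data set, about 5.0

# To keep both models, fit a fresh, unfitted copy instead of reusing reg
reg_2 = clone(reg).fit(X, y_a)

print(round(slope_a, 6), round(slope_b, 6), round(reg.coef_[0], 6), round(reg_2.coef_[0], 6))
```

After the second fit(), reg only remembers the second data set, while reg_2 holds the model for the first one.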

@159manusss 3 years ago

Thank you very very very very very much

@DigitalSreeni 3 years ago

You are welcome.

@RAZZKIRAN 2 years ago

Thank you!

@DigitalSreeni 2 years ago

You are welcome!

@lorizoli 2 years ago

Hello, I may be a little late to the party, but it seems to me that at 11:00 you are squaring the mean of the errors instead of calculating the mean of the squared errors.

@DigitalSreeni 2 years ago

Yes, it looks like an extra set of parentheses is missing. Thanks for pointing it out; I will correct it on GitHub. Right now it says:
print("Mean sq. error between y_test and predicted =", np.mean(prediction_test-y_test)**2)
It should be:
print("Mean sq. error between y_test and predicted =", np.mean((prediction_test-y_test)**2))
Or, it can be:
((prediction_test-y_test)**2).mean(axis=None)
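A quick check of the correction above, using made-up values for y_test and prediction_test. Squaring the mean error lets positive and negative errors cancel, so it can report zero error for imperfect predictions, while the mean of the squared errors cannot. scikit-learn's mean_squared_error computes the corrected quantity.

```python
import numpy as np
from sklearn.metrics import mean_squared_error

# Hypothetical true values and predictions
y_test = np.array([3.0, 5.0, 7.0])
prediction_test = np.array([4.0, 4.0, 7.0])  # errors: +1, -1, 0

errors = prediction_test - y_test

wrong = np.mean(errors) ** 2      # square of the mean error: the +1 and -1 cancel, giving 0.0
correct = np.mean(errors ** 2)    # mean of the squared errors: (1 + 1 + 0) / 3

print("Squared mean error:", wrong)
print("Mean sq. error:", correct)
print("sklearn MSE:", mean_squared_error(y_test, prediction_test))
```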