Train, Test, & Validation Sets explained

201,957 views

deeplizard

A day ago

In this video, we explain the concept of the different data sets used for training and testing an artificial neural network, including the training set, testing set, and validation set. We also show how to create and specify these data sets in code with Keras.
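As a rough sketch of what that Keras setup can look like (the model architecture, hyperparameters, and randomly generated data below are placeholder assumptions for illustration, not the exact code from the video):

import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

# Hypothetical training data: 1000 single-feature samples with binary labels
train_samples = np.random.rand(1000, 1)
train_labels = np.random.randint(0, 2, size=(1000,))

model = Sequential([
    Dense(16, activation='relu', input_shape=(1,)),
    Dense(2, activation='softmax'),
])
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])

# validation_split=0.1 holds out the last 10% of the (unshuffled) training data
# as a validation set; its loss and accuracy are reported after each epoch, but
# its samples never drive weight updates.
model.fit(train_samples, train_labels, validation_split=0.1, batch_size=10, epochs=20, verbose=2)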
🕒🦎 VIDEO SECTIONS 🦎🕒
00:00 Welcome to DEEPLIZARD - Go to deeplizard.com for learning resources
00:30 Help deeplizard add video timestamps - See example in the description
06:28 Collective Intelligence and the DEEPLIZARD HIVEMIND
💥🦎 DEEPLIZARD COMMUNITY RESOURCES 🦎💥
👋 Hey, we're Chris and Mandy, the creators of deeplizard!
👉 Check out the website for more learning material:
🔗 deeplizard.com
💻 ENROLL TO GET DOWNLOAD ACCESS TO CODE FILES
🔗 deeplizard.com/resources
🧠 Support collective intelligence, join the deeplizard hivemind:
🔗 deeplizard.com/hivemind
🧠 Use code DEEPLIZARD at checkout to receive 15% off your first Neurohacker order
👉 Use your receipt from Neurohacker to get a discount on deeplizard courses
🔗 neurohacker.com/shop?rfsn=648...
👀 CHECK OUT OUR VLOG:
🔗 / deeplizardvlog
❤️🦎 Special thanks to the following polymaths of the deeplizard hivemind:
Tammy
Mano Prime
Ling Li
🚀 Boost collective intelligence by sharing this video on social media!
👀 Follow deeplizard:
Our vlog: / deeplizardvlog
Facebook: / deeplizard
Instagram: / deeplizard
Twitter: / deeplizard
Patreon: / deeplizard
KZbin: / deeplizard
🎓 Deep Learning with deeplizard:
Deep Learning Dictionary - deeplizard.com/course/ddcpailzrd
Deep Learning Fundamentals - deeplizard.com/course/dlcpailzrd
Learn TensorFlow - deeplizard.com/course/tfcpailzrd
Learn PyTorch - deeplizard.com/course/ptcpailzrd
Natural Language Processing - deeplizard.com/course/txtcpai...
Reinforcement Learning - deeplizard.com/course/rlcpailzrd
Generative Adversarial Networks - deeplizard.com/course/gacpailzrd
🎓 Other Courses:
DL Fundamentals Classic - deeplizard.com/learn/video/gZ...
Deep Learning Deployment - deeplizard.com/learn/video/SI...
Data Science - deeplizard.com/learn/video/d1...
Trading - deeplizard.com/learn/video/Zp...
🛒 Check out products deeplizard recommends on Amazon:
🔗 amazon.com/shop/deeplizard
🎵 deeplizard uses music by Kevin MacLeod
🔗 / @incompetech_kmac
❤️ Please use the knowledge gained from deeplizard content for good, not evil.

Comments: 182
@deeplizard
@deeplizard 6 years ago
Machine Learning / Deep Learning Tutorials for Programmers playlist: kzbin.info/aero/PLZbbT5o_s2xq7LwI2y8_QtvuXZedL6tQU Keras Machine Learning / Deep Learning Tutorial playlist: kzbin.info/aero/PLZbbT5o_s2xrwRnXk_yCPtnqqo4_u2YGL
@messapatingy
@messapatingy 6 years ago
I love how concise (waffle-free) your videos are.
@deeplizard
@deeplizard 6 years ago
Thank you, Andre!
@dwosllk
@dwosllk 4 years ago
You said that “test set is unlabeled,” but actually it is a labeled dataset. Of course, it could be unlabeled, since it isn't adding anything to the model during training, but we use a labeled test set to quickly determine our model's performance once training has finished.
@alfianabdulhalin1873
@alfianabdulhalin1873 4 years ago
Hi @Gábor Pelesz ... That's what I thought too. I was wondering if I could get your insights on the main difference between the validation and test sets. From what I understand, the validation set is used during training. Meaning, after training say a logistic regression model (~100,000 iterations with specific hyperparameters), we run the validation set through this trained model, after which some metrics are calculated. If the error is bad, then we tune the hyperparameters, or do whatever is necessary, and then train some more based on the changes. After that, we validate again using the validation data, and this goes on until we get a satisfactory error chart. Wouldn't the TEST set now be redundant, since we already achieved good performance on the validation set? From what I've self-learned, we basically sample all sets (training, validation, and test) from basically the same distribution... right? Would appreciate any insights.
@dwosllk
@dwosllk 4 years ago
@@alfianabdulhalin1873 1. In an ideal world, where we could train with data that completely covers the space of the variables, the test set might be useless because it wouldn't add any information for us (i.e., it would be redundant, already in the training set). Our model's performance would then be exactly what it achieved during training. But sadly, we are so far from that world that with an additional test set we are only able to estimate the performance of our models. So, summing up: the training and validation sets are, let's say, 80%; the 20% that's left is likely (and it is important for it) to be unique and different. 2. We train with the training set, so our model is most biased towards the training set. Let's assume the model is tested against the validation set after it has gone through all our data once and wants to start over for another iteration (i.e., if we have 10 training samples, then after every 10th step we test against the validation set). While validating, we modify some hyperparameters accordingly (e.g., the learning rate). What's important is that we change things after seeing how our validation tests performed, so our model also becomes biased towards the validation set (although not as much as towards the training set). This emphasizes the relevance of a test set: a set of data points that the model has probably never seen before (the test set must also be unique and different from the others to make sense). Hope your questions are answered!
@tamoorkhan3262
@tamoorkhan3262 3 years ago
Yes, the reason we pass labels with the test data is to determine the accuracy; otherwise, those labels play no other role. It is like passing your unlabelled test data through the model, collecting all the predictions, and then using the correct labels to compute the accuracy.
@aroonsubway2079
@aroonsubway2079 2 years ago
@@tamoorkhan3262 Do you mean the test dataset is just another validation set? After all, they are the same in the sense that their labels will not be used to update model parameters; their labels are only used to generate some accuracy numbers.
@mikeguitar-michelerossi8195
@mikeguitar-michelerossi8195 2 years ago
@@aroonsubway2079 To the best of my knowledge, the main point in distinguishing between the validation set and the test set is the following. During the training phase, we want to maximize the performance (accuracy) calculated on the validation set. By doing this, after a while we are adjusting hyperparameters (number of neurons, activation functions, number of epochs...) to perform well on "that particular" validation set! (That's why cross-validation is generally a good choice.) The test set should be considered "one shot". We do not generally adjust hyperparameters to get better performance on the test set, because that was the role of the validation set. (Also, the test set is labelled.) It's an approximation, but in general: 👉 train set -> to adjust the weights of our model 👉 valid set -> to adjust hyperparameters 👉 test set -> to calculate the final accuracy
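One hedged way to carve out that three-way split with scikit-learn, assuming NumPy arrays X and y and illustrative 80/10/10 proportions (all names here are placeholders):

from sklearn.model_selection import train_test_split

# First hold out 10% as the one-shot test set.
X_tmp, X_test, y_tmp, y_test = train_test_split(X, y, test_size=0.10)
# Then take 1/9 of the remaining 90% as validation (1/9 of 90% is 10% overall).
X_train, X_val, y_train, y_val = train_test_split(X_tmp, y_tmp, test_size=1/9)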
@nahomghebremichael1956
@nahomghebremichael1956 5 years ago
I really appreciate how simply you explained the concept. Your videos really helped me get the basic concepts of DNNs.
@hmmoniruzzaman8537
@hmmoniruzzaman8537 5 years ago
Really helpful. Finally understood the difference between Validation and Test set.
@TheBriza123
@TheBriza123 5 years ago
Thanks a lot for these videos. I was trying to use CNNs and Keras without any explanation and I was just lost; now I get it. Thx again, angel
@ivomitdiamonds1901
@ivomitdiamonds1901 5 years ago
Perfect rate at which you speak. Perfect.
@ulysses_grant
@ulysses_grant 4 years ago
Your videos are neat. Sometimes I even have to pause them to digest all the information before moving on. Thanks for your work.
@hiroshiperera7107
@hiroshiperera7107 6 years ago
Best video series I've found so far for explaining the concepts of neural networks :)
@deeplizard
@deeplizard 6 years ago
Thank you, Hiroshi!
@patchyst7577
@patchyst7577 4 years ago
Very helpful, precise definition. I appreciate it :)
@tamoorkhan3262
@tamoorkhan3262 3 years ago
Loving this series. Concise and to the point. (Y)
@wolfrinn2538
@wolfrinn2538 2 years ago
Well, I was reading Deep Learning with Python and got a bit lost; this video explained it to me very well, so thank you, and keep up the hard work
@sagar11222
@sagar11222 4 years ago
Because of you, I am learning ANNs during the corona lockdown. Thank you very much.
@zenchiassassin283
@zenchiassassin283 4 years ago
And me trying to recall as fast as possible Xd
@gaborpajor3459
@gaborpajor3459 A year ago
Well done; straightforward and clear; thanks a lot
@tomatosauce9561
@tomatosauce9561 5 years ago
Such a great video!!!!! Thank you!!!!!!
@atakanbilgili4373
@atakanbilgili4373 2 years ago
Very clearly explained, thanks.
@mohamedlichouri5324
@mohamedlichouri5324 5 years ago
I have finally understood the difference between the validation and test sets, as well as the importance of the validation set. Thanks for the clear and simple explanation.
@adanegebretsadik8390
@adanegebretsadik8390 5 years ago
Mr. Mohamed, could you please tell me the importance of the validation set and how to prepare it? I don't understand it well. Thank you
@mohamedlichouri5324
@mohamedlichouri5324 5 years ago
@@adanegebretsadik8390 Let's consider that you have a dataset D, which we will split as follows: 1- 70% of D as the train set = T'; 2- 30% of D as the test set = S. We will further split the train set T' as follows: 1- 70% of T' as the train set = T; 2- 30% of T' as the valid set = V. To construct a good classifier model, we need it to learn all the important information in T, then validate it first on V, and finally test it on S. The perfect model will have a good score on both V and S. A simple analogy for this setup would be this: if you are taking a new course (machine learning), that's T, you will have to pass some labs (V). If you score well on V, you are eligible to take the final test S with confidence. Otherwise, you will have to re-learn the course material T and test yourself again on V until you achieve good results on V.
@adanegebretsadik8390
@adanegebretsadik8390 5 years ago
@@mohamedlichouri5324 Thank you so much bro, I finally understood. But one thing that is not clear to me is how to split V from the train set in Keras/Python? Again, thank you
@mohamedlichouri5324
@mohamedlichouri5324 5 years ago
@@adanegebretsadik8390 I often use the train_test_split function like this:
from sklearn.model_selection import train_test_split
# 1- Split the data into 70% train T' and 30% test S.
X_trn, X_test, y_trn, y_test = train_test_split(X, y, test_size=0.30)
# 2- Re-split T' into 70% train T and 30% valid V.
X_train, X_valid, y_train, y_valid = train_test_split(X_trn, y_trn, test_size=0.30)
@adanegebretsadik8390
@adanegebretsadik8390 5 years ago
Thank you, sir. I want to contact you to draw on your deep knowledge of machine learning, since all the tips I have gotten from you have been very essential for me so far. Do you mind if I contact you via social media? For general information, I am a master's student in computer engineering, so it may help me with my thesis.
@sunainamukherjee7648
@sunainamukherjee7648 A year ago
Loved all the videos; extremely clear with the concepts and the foundations of ML. Often we run models but don't have an in-depth understanding of what exactly they do. Your explanation is by far the best across all the videos I have seen. I can actually go ahead and explain the concepts to others with full clarity. Thank you so much for your efforts. One request: I think there is one concept that got missed, "regularizers". It would be nice to have a short video on that too. Thanks again for your precious time and super awesome explanation. Looking forward to being an expert like you :)
@deeplizard
@deeplizard A year ago
Thanks, sunaina! Happy to hear how much you enjoyed the course :) Btw, regularization is covered here: kzbin.info/www/bejne/n6atmKyfiJx1ga8
@annarauscher8536
@annarauscher8536 2 years ago
Thank you so much for your videos! You make machine learning so much more understandable and fun :) I really appreciate your passion! Keep it up!!!
@MurodilDosmatov
@MurodilDosmatov 2 years ago
Thank you very much. I understood everything, literally. Big thanks
@muhammadsaleh729
@muhammadsaleh729 4 years ago
Thank you for your clear explanation
@analuciademoraislimalucial6039
@analuciademoraislimalucial6039 3 years ago
Thanks, Teacher!!! Grateful
@kajalnadhawadekar3326
@kajalnadhawadekar3326 5 years ago
@Deeplizard You explained it very well...
@qusayhamad7243
@qusayhamad7243 3 years ago
Thank you very much for this clear and helpful explanation.
@radouanebouchou7488
@radouanebouchou7488 4 years ago
Quality content, thanks
@hiroshiperera7107
@hiroshiperera7107 6 years ago
Best video series I've found so far for explaining the concepts of neural networks :) ... One small suggestion: it would be better if the font size in the Jupyter Notebook were a bit bigger, so it would be easier to check the code :)
@deeplizard
@deeplizard 6 years ago
Thanks for the suggestion, Hiroshi! In later videos that show code, I've started zooming in to maximize the individual code cells I'm covering. As an example, you can see the code starting at 7:33 in this video: kzbin.info/www/bejne/kJuwkIuHlpqmbNUm33s Let me know what you think of this technique.
@ismailhadjir9703
@ismailhadjir9703 4 years ago
Thank you for this interesting video
@fatihaziane4443
@fatihaziane4443 4 years ago
Thanks
@justchill99902
@justchill99902 5 years ago
Really cleared my mind! Thank you :) Keep up the good work.
@justchill99902
@justchill99902 5 years ago
I have one question. In TensorFlow's Object Detection API, they tell us to create a training directory and a test directory, with the usual 90-10 distribution. But we have to label all of them. So this means the test directory, in the case of TensorFlow's API, is actually a validation set, right?
@deeplizard
@deeplizard 5 years ago
Hey Nirbhay - Not necessarily. Sometimes we'll label our test sets so we can see the stats from how well the model predicted on the test data. For example, we may want to plot a confusion matrix with the results from the test set. More on this here: kzbin.info/www/bejne/oZ6aoauBrpmIfrc If the test set is labeled, we just have to take extra precaution to make sure that the labels are not made available to the model, like they are for training and validations sets.
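A minimal sketch of that kind of test-set analysis, assuming a trained Keras model plus held-back x_test and y_test arrays (placeholder names, not code from the video):

import numpy as np
from sklearn.metrics import confusion_matrix

# The model only ever sees x_test; y_test stays on our side for scoring.
predictions = model.predict(x_test)
predicted_classes = np.argmax(predictions, axis=-1)  # most probable class per sample
print(confusion_matrix(y_true=y_test, y_pred=predicted_classes))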
@justchill99902
@justchill99902 5 years ago
OK, I'll have to dig in more in order to understand it. By the way, the page at the link you sent isn't available. Could you please post it again, or perhaps the title of the video? Thanks :)
@deeplizard
@deeplizard 5 years ago
The ")" was caught on the end of the URL. Here's the link: kzbin.info/www/bejne/oZ6aoauBrpmIfrc
@draganatosic9638
@draganatosic9638 6 years ago
Thank you for the video! Super concise and clear. If you could briefly mention some real-world examples in future videos, that would be great; I see in the comments that people have been wondering about similar things as I have. Or maybe you have done that already; I'm about to check the other videos as well :)
@deeplizard
@deeplizard 6 years ago
You're welcome, Dragana! And thank you! Yes, as the playlist progresses, I do introduce some examples. More hands-on examples (with code) are shown in the Keras and TensorFlow.js series below. Those series more so focus on how to implement the fundamental concepts we cover in this series. I hope you enjoy the next videos you check out! Keras: kzbin.info/aero/PLZbbT5o_s2xrwRnXk_yCPtnqqo4_u2YGL TensorFlow.js: kzbin.info/aero/PLZbbT5o_s2xr83l8w44N_g3pygvajLrJ-
@tymothylim6550
@tymothylim6550 3 years ago
Thank you very much for this video! It helps me get a quick understanding of the use of these 3 separate datasets (i.e. Train, Test and Validation)!
@actechforlife
@actechforlife 5 years ago
comments = "Thank you for your videos"
@deeplizard
@deeplizard 5 years ago
response = "You're welcome, Ra-ki Papa!"
@anshulprajapati7833
@anshulprajapati7833 4 years ago
@@deeplizard Perfect💖
@bisnar1307
@bisnar1307 4 years ago
You are the best :)
@neocephalon
@neocephalon 4 years ago
You're like the 3blue1brown of deep learning! You deserve waaay more subs. Maybe if you include TensorFlow tutorials in this format you could get a crap ton more subs, because there'll be others out there looking for intuitive explanations of how TensorFlow works who aren't mathematically literate. The key here is to reduce the need for mathematical literacy and make the concepts more intuitive and easier to get into. If you were to introduce the math literacy needed to explain these concepts, then you'd have to hope that the people who want to understand them have figured that out by watching the likes of 3blue1brown (assuming they've found him in the process of wanting to understand the math (hint: most people don't want to learn the math, they just want to understand the code)). So there you have it, a possible method for you to gain more subs :P
@deeplizard
@deeplizard 4 years ago
Hehe thank you, Jennifer! :D
@zenchiassassin283
@zenchiassassin283 4 years ago
Hello, do you have videos explaining the different types of activation functions and when to use a specific one? And do you have a video about optimizers? Like momentum
@deeplizard
@deeplizard 4 years ago
This episode explains activation functions: deeplizard.com/learn/video/m0pIlLfpXWE No specific episode on optimizers, although we do have several explaining how a NN learns via SGD optimization.
@richarda1630
@richarda1630 3 years ago
Silly question: can you also pass in pandas DataFrames, or is that irrelevant and NumPy is enough?
@nadaalay4528
@nadaalay4528 5 years ago
Thank you for this video. In the results of the example, the validation accuracy is higher than the training accuracy. Is this considered a problem?
@deeplizard
@deeplizard 5 years ago
Hey Nada - It's typically going to be up to the engineer of the network to determine what is considered acceptable regarding their results. In general, I would say that you typically want your training and validation accuracy to be as close as you can get them to each other. If the validation accuracy is considerably greater than the training accuracy, then you may want to take steps to decrease the difference between those metrics. If the model used in this video was one I planned to deploy to production, for example, then I would take steps to close this gap. This would be considered a problem of underfitting. I talk more about that here: kzbin.info/www/bejne/ZpmbnXSjarCca8k
@boulaawny4420
@boulaawny4420 6 years ago
Could you give a real-world example of a training set and validation set?! Kind of like: I want to train a model to tell whether a flower is blue or red depending on its height or width... and I use k-nearest neighbors, so what does the validation set consist of?
@deeplizard
@deeplizard 6 years ago
Hey Andy, So, sticking with your example of flowers-- You would start out by gathering the data for red and blue flowers. This data would presumably be numerical data containing the height and width of the flowers, and each sample from your data set would be labeled with "blue" or "red." You would then split this data up into a training set and validation set. A common split is 80% training / 20% validation. You would then train your model on the data in the training set, and validate the model on the data in the validation set. Does this example make things more clear?
@mauriciourtadoilha9971
@mauriciourtadoilha9971 4 years ago
I believe you're wrong about the test set being unlabeled. As far as I remember from Andrew Ng's course at Stanford, the training set is used for tuning multiple models; the validation set is used for model selection (this is where you compare different models to check which one performs best on data not used for training). Once you choose a definitive model, you still have to check whether it generalizes well to data it has never seen before and that carries no model-selection bias. At this point, you don't do any further tuning. Besides, having a labeled test set allows you to define a test error. If the data are unlabeled, that term doesn't make any sense, does it?
@deeplizard
@deeplizard 4 years ago
The test set's labels just cannot be known to the model in the way that the train and validation sets are. So as far as the model knows, the test set is unlabeled. You may have the test labels stored elsewhere though to do your own analysis.
@maxmacken8859
@maxmacken8859 2 years ago
Fantastic video. One aspect I am confused about is what the algorithm is doing when it is 'training' on the data. How does it train on data, and how do we know it is correct? Do you have any videos on this question, or do you know where I could look to understand it? Thank you.
@deeplizard
@deeplizard 2 years ago
Yes, check out the Training and Learning lessons in the course: deeplizard.com/learn/playlist/PLZbbT5o_s2xq7LwI2y8_QtvuXZedL6tQU
@sahilvishwakarma6509
@sahilvishwakarma6509 2 years ago
Check 3blue1brown's Neural Network videos
@alfianabdulhalin1873
@alfianabdulhalin1873 5 years ago
One question. Data in the test set does have labels, right? But they're not known to the classifier... It's only labeled so that we can calculate all the metrics at the end more easily... right?
@deeplizard
@deeplizard 5 years ago
Sometimes we'll have labels for the test set, and other times we may not. When we do have the labels, you're correct that the network will not be aware of the labels that correspond to the samples. The network will understand that there are, say 10 different classes, for which it will need to classify the data from the test set, but it will not know the individual labels that correspond to each sample.
@OpeLeke
@OpeLeke A year ago
The validation set is used to tweak the hyperparameters of a model.
@carlosmontesparra8548
@carlosmontesparra8548 3 years ago
Thanks very much for the videos!! So the training set and the array of its labels should be ordered to match properly, correct? E.g., if index 0 of the training array is the picture of a cat, then index 0 of the label array should be 0 (and 1 if a dog)?
@deeplizard
@deeplizard 3 years ago
Yes, correct!
@bonalareddy5339
@bonalareddy5339 3 years ago
I'm kind of confused about this for a very large dataset, say 10 million records. In general, in a production environment, how would the train/test split be done to evaluate how our model is working? -> I have heard in a few resources that it is okay to split the data into 98% for training, 1% for validation (100,000 rows), and 1% for testing (100,000 rows). The theory behind this is that 1% of the data most probably represents the maximum variance in the data. -> And some say we should split the data into roughly 70% for training, 15% for validation, and 15% for testing. The theory behind this is that if we have a large amount of data for validation and testing, and the model gives good accuracy on it, then we can say with "confidence" that it would work nearly the same in real time as well. Whether any of this is right or wrong, could you please explain to me with a reason?
@montassarbendhifallah5253
@montassarbendhifallah5253 4 years ago
Hello, thank you for this playlist. It's awesome! My question is: in some cases, we don't specify a validation set. Why? And when is it not important to set validation data?
@nandinisarker6123
@nandinisarker6123 3 years ago
This is my question too. Hope someone answers.
@montassarbendhifallah5253
@montassarbendhifallah5253 3 years ago
@@nandinisarker6123 Well, I found these 2 links: stats.stackexchange.com/questions/19048/what-is-the-difference-between-test-set-and-validation-set machinelearningmastery.com/difference-test-validation-datasets/
@tvrtkokotromanic9158
@tvrtkokotromanic9158 4 years ago
Universities are getting obsolete when you have KZbin. I mean, I have seriously learned more from KZbin on machine learning and C# coding than from professors at university. Thanks for this great explanation. 🙏
@arifurrahaman6493
@arifurrahaman6493 5 years ago
It's indeed helpful and understandable. As I am at a beginner level, I wonder if there is any way to get the demo code you are using to make these videos. Thanks in advance.
@deeplizard
@deeplizard 5 years ago
Thanks, Arifur! Download access to code files and notebooks are available as a perk for the deeplizard hivemind. Check out the details regarding deeplizard perks and rewards at: deeplizard.com/hivemind
@DataDrivenDecision
@DataDrivenDecision 2 years ago
I have a question: why can't we use the validation approach in normal machine learning? Why do we only use it in deep learning problems to prevent overfitting?
@dazzaondmic
@dazzaondmic 6 years ago
Thank you for the video. I just have one question. How do we know how well our model is performing on the test set if we don't have labels to tell us the "correct answer", and if we don't even know what the correct answer is ourselves? How do we then know whether the model performed well or badly on the test set? Thanks again.
@deeplizard
@deeplizard 6 years ago
Hey dazzaondmic - You're welcome! If we don't know the labels to the test set ourselves, then the only way we can gauge the performance of the model is based on the metrics observed during training and validating. We won't have a solid way of judging the exact accuracy of the model on the test set. If we have a decently sized validation set though, and the data contained in it is a good representation of what the model will be exposed to in the test set and in the "real world," then that increases confidence in the model's ability to perform well on new, unseen data if the model indeed performs well on the validation data. Does this help clarify?
@aroonsubway2079
@aroonsubway2079 2 years ago
IMO, the test dataset should have labels so that we can at least have some accuracy numbers to look at in the end. The only difference between the validation dataset and the test dataset is that we still have a chance to update the model based on the validation results by tuning hyperparameters. However, the test dataset only provides us with a final accuracy number; even if it is bad, we won't perform additional training.
@_WorldOrder
@_WorldOrder 3 years ago
I hope I'll get a reply. My question is: do I have to dig deeper into machine learning concepts before starting deep learning, or are these fundamentals enough to start deep learning? Btw, thank you for providing us with such valuable content for free
@deeplizard
@deeplizard 3 years ago
Yes, you can start with this DL Fundamentals course without prior ML experience/knowledge. Check out the recommended deep learning roadmap on the homepage of deeplizard.com. You can also see the prerequisites for each course there as well.
@_WorldOrder
@_WorldOrder 3 years ago
@@deeplizard Thank you so much, I'm falling in love with a lizard for the first time xD
@thegirlnextdoor2660
@thegirlnextdoor2660 4 years ago
The explanation was really good, ma'am, but the white-screen console that you showed could not be read. Please make that content brighter and in a bigger font.
@deeplizard
@deeplizard 4 years ago
Thanks for the feedback, Sayantani. In later videos, I zoom in on the code, so it is much easier to read. Also, note that most videos have corresponding text-based blogs that you can read at deeplizard.com. The blog for this video can be found at deeplizard.com/learn/video/Zi-0rlM4RDs
@LeSandWhich
@LeSandWhich A year ago
fit() does NOT have validation? Most of the time, people's code looks like this: clf = svm.SVC().fit(X_train, y_train). The validation_set and validation_split arguments are nowhere to be found; even the sklearn docs don't mention them. What is going on? How come these models don't overfit without a validation set?
@11MyName111
@11MyName111 4 years ago
One question: at 1:40 you said weights won't be updated based on the validation loss. If so, how does the validation set help us, since we are not using it to update the model? Later, at 1:57, you said it's used so the model doesn't overfit. How? When does it come into play? Goes without saying, great video! I'm on a spree!
@deeplizard
@deeplizard 4 years ago
Hey Ivan - We, as trainers of the model, can use the validation set metrics as a monitor to tell whether or not the model is overfitting. If it is, we can make appropriate adjustments to the model. The model itself though is not using the validation set for learning purposes or weight updates.
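One common way to automate that monitoring in Keras, sketched under the assumption of the placeholder model and arrays from earlier, is an EarlyStopping callback that watches the validation loss:

from tensorflow.keras.callbacks import EarlyStopping

# Stop training once val_loss fails to improve for 3 straight epochs and roll
# back to the best weights; the validation set still never drives the weight
# updates themselves.
early_stop = EarlyStopping(monitor='val_loss', patience=3, restore_best_weights=True)
model.fit(train_samples, train_labels, validation_split=0.1, epochs=50, callbacks=[early_stop])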
@11MyName111
@11MyName111 4 years ago
@@deeplizard we adjust the model manually! Ok, it makes sense now :) Thank you!
@tusharvatsa5293
@tusharvatsa5293 4 years ago
You are fast!
@poojakamra1177
@poojakamra1177 6 years ago
I have trained and validated the data. Now how can I test an image with the model?
@deeplizard
@deeplizard 6 years ago
Hey Pooja - Check out this video, and let me know if it answers your question. kzbin.info/www/bejne/mJe0c4OEed5oe68
@peaceandlove5855
@peaceandlove5855 5 years ago
How do you test accuracy when predicting non-binary output? (As far as I know, they use a ''confusion matrix'' when the output is binary.)
@deeplizard
@deeplizard 5 years ago
You can also use a confusion matrix for non-binary output. I show an example of this towards the end of this video: kzbin.info/www/bejne/fH_UoWeQjpWqers
@Normalizing-polyamory
@Normalizing-polyamory 5 years ago
If the test set is unlabeled, how can you measure accuracy? How can you know that the model works?
@marwanelghitany8875
@marwanelghitany8875 5 years ago
You actually have the labels for your test set, but you don't give them to your model. You wait until the model makes its predictions, then compare them with the labels you held back at first, and calculate the accuracy based on how similar they are.
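A minimal sketch of that scoring step, assuming a trained Keras classifier and held-back NumPy arrays x_test and y_test (placeholder names):

import numpy as np

# Predict without ever exposing y_test to the model, then score by hand.
predicted_classes = np.argmax(model.predict(x_test), axis=-1)
accuracy = np.mean(predicted_classes == y_test)
print(f"Test accuracy: {accuracy:.3f}")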
@alexiscruz5854
@alexiscruz5854 3 years ago
I love your videos
@joxa6119
@joxa6119 A year ago
Where can I get the dataset?
@thespam8385
@thespam8385 4 years ago
{ "question": "The test set differs from the train and validation sets by:", "choices": [ "Being applied after training and being unlabeled", "Being applied after training and being labeled", "Being randomly selected data", "Being hand-selected data" ], "answer": "Being applied after training and being unlabeled", "creator": "Chris", "creationDate": "2019-12-11T04:29:35.828Z" }
@deeplizard
@deeplizard 4 years ago
Thanks, Chris! Just added your question to deeplizard.com
@GelsYT
@GelsYT 5 years ago
Thanks, it made my mind clear; you deserve a sub. But I have a question: what if we don't train? Does that mean the accuracy will drop, like to 0%? Will it give an error? Or maybe training is a must and I'm asking a stupid question hahahaha. Btw, I'm learning NLP here using Python - nltk
@GelsYT
@GelsYT 5 years ago
Model? You mean the software or system, right?
@deeplizard
@deeplizard 5 years ago
If you don't train the model, then it will likely perform no better than chance for the given task. By "model," I mean the neural network.
@Appletree-db2gh
@Appletree-db2gh 3 years ago
Why can't you use the test set after each epoch of training, since no weights will be updated from it?
@aceespadas
@aceespadas 5 years ago
Using Google Colab for this. I've set up the same model and hyperparameters. I've also used the same code to preprocess the data and the same params to train the model (batch_size, Adam's lr, validation_split, epochs), but I'm not getting the same metrics as you while training, no matter how much I try. The validation accuracy plateaus around 0.75, and the val_loss starts at around 0.68, decreases, then starts increasing around the 12th epoch to end around 0.66. This is bugging me and I can't figure it out. PS: I also tried Theano as a backend for Keras
@deeplizard
@deeplizard 5 years ago
Hey Yassine - Are you using the same data to train your model? This data was created in the Keras series. If you did use the same data, then be sure that you caught the reference in the Keras validation set video to reverse the order of the for-loops that generates the data. Let me know.
@aceespadas
@aceespadas 5 years ago
@@deeplizard Thank you for getting back to me. Yes, I generated the data in the same fashion as you did in the data preprocessing video from the Keras series. I caught the for-loop reversal reference in that same series, and after correcting my code, the validation accuracy and loss behaved normally while fitting. But I'm not sure why the behavior changed. As you explained, the validation split takes a percentage of the training set prior to fitting (I don't know whether the validation set is generated after a shuffle or not in this case) and isn't regenerated by the shuffle on each epoch. But why did switching the for-loop order matter? You are still taking the bottom 10 or 20% of your data regardless of the for-loop order, and you end up with the same validation data on each epoch. Also, I used a sigmoid function in the output as you did in this series, yet my prediction probabilities don't sum to 1 as you depicted in the prediction video within the same playlist. Using a softmax function as in the Keras API series works fine. It would help if you could clear up this confusion.
@deeplizard
@deeplizard 5 years ago
The validation_split parameter takes the last x% of the data in the training set (10% in our example), and doesn't shuffle it. With the way I had the for-loops organized originally, the validation split would completely capture all of the data in the second for loop, which was the 5% of younger individuals who did experience side effects and the 5% of older individuals who did not experience side effects. Therefore, none of the data in the second for-loop would be captured in the training set. With the re-ordering of the for-loops, the training set is made up of the data that is now generated in both for-loops. Another (better) approach we could have taken is, after generating all the data with both for-loops (regardless of which order the loops are in), we could shuffle all the data completely, and then pass that shuffled data to our model. With shuffled data, the training set and validation sets would be a more accurate depiction of the true underlying data since there would be no real ordering to it. As long as your data in the real world is shuffled before you pass it to your model, you shouldn't experience this problem.
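A sketch of that shuffle-first approach, assuming the train_samples and train_labels arrays from the Keras series (names are placeholders):

from sklearn.utils import shuffle

# Shuffle samples and labels together so ordering artifacts from the
# data-generation loops cannot concentrate in the trailing validation slice.
train_samples, train_labels = shuffle(train_samples, train_labels)
model.fit(train_samples, train_labels, validation_split=0.1, batch_size=10, epochs=20)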
@ranasagar1201
@ranasagar1201 3 years ago
Ma'am, can you just tell me: do you have an NLP (natural language processing) playlist?
@deeplizard
@deeplizard 3 years ago
Not yet
@hasnain-khan
@hasnain-khan 4 years ago
If I have 1000 rows in a dataset, how can I select 800 rows for training and 200 for testing, instead of them being selected randomly during the split?
@deeplizard
@deeplizard 4 years ago
Keras can automatically split out a percentage of your training data for validation only. More here: deeplizard.com/learn/video/dzoh8cfnvnI
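If you specifically want the first 800 rows for training and the last 200 for testing, with no randomness at all, a plain slice works; a sketch assuming NumPy arrays X and y with 1000 rows:

# Deterministic split: no shuffling, no random sampling.
X_train, y_train = X[:800], y[:800]
X_test, y_test = X[800:], y[800:]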
@bobjoe275
@bobjoe275 4 years ago
There's an error in using validation_data in model.fit. The format should be a tuple of NumPy arrays, i.e. valid_set = (np.array([0.6,0.5]), np.array([1,1]))
@deeplizard
@deeplizard 4 years ago
Yes, this is specified in the blog for the episode below: deeplizard.com/learn/video/dzoh8cfnvnI
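A sketch of that usage, adapting the tuple from the comment above to a single-feature model, with placeholder training arrays:

import numpy as np

# validation_data expects a (samples, labels) tuple of NumPy arrays;
# samples are reshaped to (2, 1) to match a single-feature model input.
valid_set = (np.array([[0.6], [0.5]]), np.array([1, 1]))
model.fit(train_samples, train_labels, validation_data=valid_set, batch_size=10, epochs=20)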
@mohamamdazhar6813
@mohamamdazhar6813 3 years ago
Does this work for unsupervised and reinforcement learning?
@deeplizard
@deeplizard 3 years ago
No, RL works differently. Check out our RL course: deeplizard.com/learn/playlist/PLZbbT5o_s2xoWNVdDudn51XM8lOuZ_Njv
@mohamamdazhar6813
@mohamamdazhar6813 3 years ago
@@deeplizard ty
@balabodhiy730
@balabodhiy730 6 years ago
All fine, but I came to know that the dependent variable is not included in the training set; how is that? Thank you
@deeplizard
@deeplizard 6 years ago
Hey balabodhi - I'm not sure what you mean by "dependent variable." Can you please elaborate?
@balabodhiy730
@balabodhiy730 6 years ago
For prediction, we split a dataset into two sets: one is the train set and the other is the test set. But when we are given separate datasets for training and testing, do we include the dependent variable (response variable, or the Y variable) in the testing dataset? In one of my simple logistic regression analyses, they gave three datasets separately: training, validation, and testing. In the testing dataset, I don't have the response variable, i.e., the Y variable. So this is my question: can we test a dataset without the response variable Y?
@deeplizard
@deeplizard 6 years ago
I see. Yes, many times, we don't have the labels for the test data. This is completely fine. The labels for training and validation data are required, but labels for test data are not required.
@balabodhiy730
@balabodhiy730 6 years ago
OK, that's fine, but I couldn't understand what labels are
@greeshmanthmacherla2105
@greeshmanthmacherla2105 4 years ago
What happens if I give the same set of images for both the validation and training datasets?
@deeplizard
@deeplizard 4 years ago
You will not be able to identify overfitting or see how your model is generalizing to data it wasn't trained on.
@greeshmanthmacherla2105
@greeshmanthmacherla2105 4 years ago
@@deeplizard I got an error: "`validation_split` is only supported for Tensors or NumPy arrays". What should I do now?
@deeplizard
@deeplizard 4 years ago
Either convert your data to a supported data type, or manually create a separate validation set. More details in this blog: deeplizard.com/learn/video/dzoh8cfnvnI
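Both options sketched with placeholder names, assuming x and y are list-like data:

import numpy as np

# Option 1: convert to NumPy so validation_split is supported.
x, y = np.asarray(x), np.asarray(y)

# Option 2: carve out a manual validation set and pass it explicitly.
x_val, y_val = x[-100:], y[-100:]
x_train, y_train = x[:-100], y[:-100]
model.fit(x_train, y_train, validation_data=(x_val, y_val), epochs=20)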
@denisutaji2094
@denisutaji2094 2 years ago
What is the difference between model.evaluate() and model.predict()? model.predict() got a lower accuracy than model.evaluate()
@sathyakumarn7619
@sathyakumarn7619 4 years ago
My dear Channel, I only wish for you to change the ominous music in the beginning! TY :-(
@deeplizard
@deeplizard 4 years ago
Lol it has been changed in later episodes :D
@sathyakumarn7619
@sathyakumarn7619 4 years ago
@@deeplizard Looking forward! Thanks again for the video
@appinventorappinventorpak9154
@appinventorappinventorpak9154 6 years ago
Please share the code as well
@deeplizard
@deeplizard 6 years ago
Hey App Inventor- The code files for this series are available as a perk for the deeplizard hivemind at the following link: www.patreon.com/posts/code-for-deep-19266563 Check out the details regarding deeplizard perks and rewards at: deeplizard.com/hivemind
@appinventorappinventorpak9154
@appinventorappinventorpak9154 6 years ago
Thanks a lot, this is a really excellent tutorial, explained so well in a simple manner
@appinventorappinventorpak9154
@appinventorappinventorpak9154 6 years ago
I request you to please provide a tutorial on the 3D convolution algorithm for processing medical image files
@deeplizard
@deeplizard 6 years ago
Thank you, I'm glad you're enjoying the videos! I'll add 3D convolutions to my list of potential topics to cover in future videos. Thanks for the suggestion. In the mean time, I do have a video on CNNs in general below if you've not yet seen that one. kzbin.info/www/bejne/j4PLqZeMoMSmf9U
@marouaomri7807
@marouaomri7807 3 years ago
I think you should slow down when you are explaining to let the information sink in :)
@deeplizard
@deeplizard 3 years ago
I have in later videos. In the meantime, each video has a corresponding written blog on deeplizard.com that you can check out for a slower pace :)
@marouaomri7807
@marouaomri7807 3 years ago
@@deeplizard Thank you so much, the course helped me understand better
@amiryavariabdi8962
@amiryavariabdi8962 3 years ago
Dear artificial intelligence community, I am pleased to introduce the DIDA dataset, which is the largest handwritten digit dataset. I would be grateful if you could help me introduce this dataset to the community. Thanks
@big-blade
@big-blade 4 years ago
Why is the music so scary?
@literaryartist1
@literaryartist1 5 years ago
I'm lost. Are we really talking about weight training or something else?!
@TP-gx8qs
@TP-gx8qs 5 years ago
Just talk more slowly. I had to put you at 0.75 speed and you sound like you are drunk.
@deeplizard
@deeplizard 5 years ago
Lol, the blogs are helpful for a slower pace as well: deeplizard.com/learn/video/Zi-0rlM4RDs
@MrKrasi97
@MrKrasi97 4 years ago
haha same issue here
@styloline
@styloline 3 years ago
wayway
@radouaneaarbaoui7206
@radouaneaarbaoui7206 3 years ago
Speaking very fast, as if we were computers that could catch up with the speed.
@Leon-pn6rb
@Leon-pn6rb 4 years ago
1:00 - 1:55 You lost me there. First you said validation is for HP tuning, and then you say that it is not -_- Off to another video/article
@deeplizard
@deeplizard 4 years ago
By validating the model against the validation set, we can choose to adjust our hyperparameters based on the validation metrics. The weights of the network, however, are not adjusted during training based on the validation set. (Note that weights are not hyperparameters.) The weights are only adjusted according to the training set. This is what is stated in the video. Hope it's clear now.
@blankslate6393
@blankslate6393 2 years ago
The role of the validation set in adjusting weights is still unclear to me after listening to that part 3 times. So, not a great explanation. Maybe you need to make a video specifically on this.
@GQElvie
@GQElvie 3 years ago
Not helpful at all. Why can't anybody just show an EXAMPLE so that we can really wrap our heads around this? I have a vague idea of validation. I take it the validation just "updates" because of the new information?? If that is it, then why are there 10 different definitions? What is an example of validation demonstrating underfitting or overfitting?
@MrUsidd
@MrUsidd 5 years ago
Use 0.5x speed. Thank me later.
@AnkitSingh-wq2rk
@AnkitSingh-wq2rk 5 years ago
???? 0.5 ????
@rankzkate
@rankzkate 3 years ago
Too fast for me. I kept rewinding
@deeplizard
@deeplizard 3 years ago
You can use the corresponding blogs for every video on deeplizard.com to move at a slower pace as well.
@deeplizard
@deeplizard 3 years ago
deeplizard.com/learn/video/Zi-0rlM4RDs
@nmcfbrethren1407
@nmcfbrethren1407 2 years ago
I think you could redo this video and speak slowly and calmly.
@534A53
@534A53 2 years ago
This is wrong; the test set must also be labelled. Otherwise, you cannot evaluate the model at the end. The video should be corrected because it is teaching people incorrect information.
@ShivamPanchbhai
@ShivamPanchbhai 3 years ago
Speak slowly
@OscarRangelMX
@OscarRangelMX 5 years ago
Man!!! You speak soooo fast it is hard to keep up with the video and what you are saying. Great content, but you need to slow down......
@moyakatewriter
@moyakatewriter 4 years ago
Maybe talk slower. It's hard to understand people when they're racing.
@omidasadi2264
@omidasadi2264 5 years ago
Too fast, and the quality of the concept explanation is poor