🎯 Key Takeaways for quick navigation:

00:00 🔄 *The basic idea of mini-batch gradient descent*
- Mini-batch gradient descent lets you take gradient descent steps while processing only part of the training set,
- The difference between batch gradient descent and mini-batch gradient descent,
- The cost function J should decrease on every iteration; with mini-batches the decrease can be noisy.

02:12 📊 *Why the mini-batch size matters and how to choose it*
- The factors that determine the mini-batch size and the two extreme cases (the full training set size, and 1),
- The difference between mini-batch gradient descent and stochastic gradient descent, and the pros and cons of each,
- The optimal mini-batch size lies somewhere between 1 and the full training set size.

05:14 💡 *Applying and optimizing mini-batch gradient descent in practice*
- Practical considerations when applying mini-batch gradient descent and the benefits of vectorization,
- Optimizing gradient descent by tuning the learning rate and searching for the best mini-batch size,
- Taking memory limits into account when choosing the mini-batch size, with practical examples.

08:18 🛠️ *Guidelines for choosing the mini-batch size and hyperparameter search*
- Batch gradient descent is recommended for small training sets; for large training sets a mini-batch size between 64 and 512 is recommended,
- Why mini-batch sizes are set to powers of 2, and the importance of choosing a size that fits in memory,
- Tuning hyperparameters such as the mini-batch size through experimentation.

Made with HARPA AI
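As a rough illustration of the partitioning described above, here is a minimal NumPy sketch (not code from the video; the shapes and the shuffling step are assumptions that follow common practice): shuffle the training set, then slice it into mini-batches of a power-of-2 size, with the last mini-batch simply smaller when m is not an exact multiple.

```python
import numpy as np

def make_mini_batches(X, Y, mini_batch_size=64, seed=0):
    """Split (X, Y) into mini-batches. X has shape (n_x, m), Y has shape (1, m).
    mini_batch_size is typically a power of 2 between 64 and 512 that fits in memory."""
    rng = np.random.default_rng(seed)
    m = X.shape[1]
    # Shuffle the examples first (common practice) so each batch is a random slice.
    perm = rng.permutation(m)
    X_shuffled, Y_shuffled = X[:, perm], Y[:, perm]
    mini_batches = []
    for start in range(0, m, mini_batch_size):
        end = start + mini_batch_size
        # The last mini-batch is simply smaller when m % mini_batch_size != 0.
        mini_batches.append((X_shuffled[:, start:end], Y_shuffled[:, start:end]))
    return mini_batches

# Example: m = 1000 with size 64 gives 15 batches of 64 plus one final batch of 40.
batches = make_mini_batches(np.random.randn(5, 1000), np.random.randint(0, 2, (1, 1000)))
print(len(batches), batches[-1][0].shape)  # 16 (5, 40)
```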
@adsuabeakufea 11 months ago
really great explanations here
@klausdupont6335 7 years ago
I suppose there is an error in the title: the "Dexcent" should be "Descent".
@IgorAherne 7 years ago
:D
@JoseRomero-wp4ij 5 years ago
What are you? The grammar-nazi of AI?
@jorjiang1 5 years ago
@@JoseRomero-wp4ij It has nothing to do with grammar.
@sandipansarkar9211 3 years ago
Great explanation. Need to watch again.
@khushnoodabbas7084 5 years ago
In the case of multiclass classification, do we need to make sure that every mini-batch contains examples of all the classes?
@naveenmirada2443 5 years ago
Did you get an answer?
@khushnoodabbas7084 5 years ago
Naveen Mirada no, not yet.
@questforprogramming 5 years ago
@@khushnoodabbas7084 I have searched a lot for multi-class gradient descent regression on KZbin, but in vain. Everyone makes videos with a single independent variable!
@khushnoodabbas7084 5 years ago
@@questforprogramming I have found a solution to this problem. You can do it in two ways: 1) implement the same binary classification model for the multiclass problem using the one-vs-all method; see this lecture and the related ones for the explanation (kzbin.info/www/bejne/kKfEdn98q5p8pq8). 2) The second way is to use softmax, which is faster than one-vs-all and is also the standard approach. The only problem you might face when implementing softmax is calculating the derivative at the final layer during backpropagation; after a long search I found the solution in a blog post (wordpress@example.com), but unfortunately I am unable to find the blog again. You can try.
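For the final-layer derivative mentioned above, here is a minimal NumPy sketch of the usual softmax-plus-cross-entropy setup (an assumed setup for illustration, not the blog's code): with one-hot labels Y and output activations A, the gradient with respect to the final-layer pre-activation simplifies to (A - Y) / m.

```python
import numpy as np

def softmax(Z):
    # Subtract the column-wise max for numerical stability.
    Z_shifted = Z - Z.max(axis=0, keepdims=True)
    expZ = np.exp(Z_shifted)
    return expZ / expZ.sum(axis=0, keepdims=True)

def cross_entropy_cost(A, Y):
    # A and Y have shape (num_classes, m); Y is one-hot.
    m = Y.shape[1]
    return -np.sum(Y * np.log(A + 1e-12)) / m

def output_layer_gradient(A, Y):
    # For softmax + cross-entropy, dJ/dZ at the final layer is simply (A - Y) / m.
    m = Y.shape[1]
    return (A - Y) / m

# Tiny check on a random 3-class batch of 5 examples.
Z = np.random.randn(3, 5)
Y = np.eye(3)[:, np.random.randint(0, 3, 5)]  # one-hot labels, shape (3, 5)
A = softmax(Z)
print(cross_entropy_cost(A, Y), output_layer_gradient(A, Y).shape)
```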
@khushnoodabbas7084 5 years ago
@@questforprogramming As for the mini-batch problem, I implemented it by considering every example in each mini-batch...
@rahulagrawal8179 5 years ago
But what if the training set size is not a power of 2? What will be the size of the mini-batches in mini-batch GD then?
@kkori_tuikim 1 month ago
You're my god!!!!!!!!
@fadoobaba 10 months ago
Last tip is really the only reason to use mini batch
@yongjiewang9686 3 years ago
This is called: Nice!
@ahmedb2559 1 year ago
thank you !
@haneulkim4902 2 years ago
Amazing! When using batch GD (mini-batch size = m), it converges, unlike when the mini-batch size < m?
@rafipatel5020 2 years ago
Nope, it will converge just like regular GD. Since the size = m, it is no different from regular GD.
@muratcan__22 6 years ago
nice explanation
@bluefanta7668 1 year ago
In the case of mini-batch GD, does one epoch equal one mini-batch or one pass over the whole dataset?
@huaizhiwu 1 year ago
One epoch means going through the whole dataset once in both batch GD and mini-batch GD. The difference is just that there's only one gradient update per epoch in batch GD, but many updates per epoch in mini-batch GD (one update per mini-batch).
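A quick worked example of that difference, using hypothetical numbers (not taken from the video):

```python
m = 2_000_000            # hypothetical training set size
mini_batch_size = 1_000  # hypothetical mini-batch size

# Mini-batch GD: one gradient update per mini-batch (the last one may be smaller).
updates_per_epoch_minibatch = -(-m // mini_batch_size)  # ceiling division -> 2000
# Batch GD: the whole training set is one batch, so one update per epoch.
updates_per_epoch_batch = 1

print(updates_per_epoch_minibatch, updates_per_epoch_batch)  # 2000 1
```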
@Rocklee46v 4 years ago
In batch GD, we pass the entire dataset to the neural net, do the forward propagation, calculate the cost function, do the backpropagation based on the cost function, and perform the same operations for a few more epochs. Can anyone help me understand the order of operations in mini-batch gradient descent? Comparing it with batch gradient descent would be highly appreciated.
@AvinashSingh-bk8kg 3 years ago
Batch gradient descent: suppose you have 10K records. All 10K records are considered at once in the forward propagation, then we perform backward propagation on the same 10K records. This takes time because we are dealing with a large amount of data, so gradient descent is slow: each step towards the minimum is taken only after the entire dataset has been processed. In mini-batch GD, on the other hand, we split the 10K records into small batches. Suppose we split them into batches of 1K, so we have 10 such batches. We pick the first batch of 1K records, do the forward propagation on it, then the backward propagation, and take a gradient descent step. So we did not wait for the model to go through all 10K records; instead we do a gradient descent step for every batch. After the gradient step on this batch we update the weights, then feed the next batch and repeat.
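A minimal runnable sketch of that order of operations, using logistic regression as a stand-in model (an illustrative assumption; these are not the course's helper functions):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_minibatch_gd(X, Y, learning_rate=0.1, mini_batch_size=64, num_epochs=10):
    """Logistic regression trained with mini-batch GD.
    X: (n_x, m) features, Y: (1, m) labels in {0, 1}."""
    n_x, m = X.shape
    w, b = np.zeros((n_x, 1)), 0.0
    for epoch in range(num_epochs):
        # Shuffle once per epoch so each mini-batch is a random slice of the data.
        perm = np.random.permutation(m)
        X_shuf, Y_shuf = X[:, perm], Y[:, perm]
        for start in range(0, m, mini_batch_size):
            Xb = X_shuf[:, start:start + mini_batch_size]  # last batch may be smaller
            Yb = Y_shuf[:, start:start + mini_batch_size]
            mb = Xb.shape[1]
            # Forward propagation and cost on this mini-batch only.
            A = sigmoid(w.T @ Xb + b)
            cost = -np.sum(Yb * np.log(A + 1e-12) + (1 - Yb) * np.log(1 - A + 1e-12)) / mb
            # (cost could be logged here; plotted per mini-batch it looks noisy.)
            # Backward propagation, then one gradient step per mini-batch.
            dZ = A - Yb
            dw = Xb @ dZ.T / mb
            db = np.sum(dZ) / mb
            w -= learning_rate * dw
            b -= learning_rate * db
    return w, b

# Batch GD is just the special case mini_batch_size = m: one update per epoch.
w, b = train_minibatch_gd(np.random.randn(20, 5_000), np.random.randint(0, 2, (1, 5_000)))
```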
@banipreetsinghraheja8529 6 years ago
So, we can consider that in mini-batch and stochastic GD we keep the number of epochs low, or else the computational cost would be the same as batch gradient descent, right?
@4abhishekagarwal 6 years ago
I think it's more about the computational cost arising from keeping a large (n x m) matrix together (in your RAM) and passing it through the network in each iteration. Batch gradient descent suffers from this sort of computational complexity and takes a long time per iteration. However, the number of gradient descent steps you take to arrive at the minimum should be fewer when implementing batch gradient descent than in the stochastic/mini-batch case. If you don't run enough epochs with mini-batch/stochastic GD, you might not zero in on the local minimum.
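To make the memory point concrete with hypothetical numbers (not taken from the comment or the video): a full-batch (n x m) float64 matrix needs n * m * 8 bytes, while a mini-batch only touches a small slice of that at a time.

```python
n_x = 10_000           # hypothetical number of input features
m = 5_000_000          # hypothetical number of training examples
bytes_per_float64 = 8

full_batch_bytes = n_x * m * bytes_per_float64      # the whole (n_x, m) matrix
mini_batch_bytes = n_x * 512 * bytes_per_float64    # one 512-example mini-batch

print(f"full batch: {full_batch_bytes / 1e9:.0f} GB")  # 400 GB
print(f"mini-batch: {mini_batch_bytes / 1e6:.0f} MB")  # 41 MB
```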
@nischalsimha9995 6 years ago
In mini-batch gradient descent, after completing all the mini-batches, do we redo the whole process again and again till we converge, as we do in batch GD, or is it just one pass?
@Ditoekacahya 6 years ago
@@nischalsimha9995 generally yes, you need to loop through several epochs (one complete round of mini-batches each) until you reach your specified cost function value
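A tiny sketch of that outer loop; run_one_epoch is a hypothetical stand-in that would do one complete round of mini-batches and return the latest cost (here it just simulates a shrinking cost so the snippet runs on its own):

```python
def run_one_epoch(state):
    """Hypothetical stand-in for one complete round of mini-batches.
    Here it only simulates a cost that shrinks each epoch."""
    state["cost"] *= 0.8
    return state["cost"]

state = {"cost": 1.0}
target_cost = 0.05
max_epochs = 100

for epoch in range(max_epochs):
    cost = run_one_epoch(state)  # one epoch = one pass over all mini-batches
    if cost <= target_cost:      # stop once the specified cost value is reached
        print(f"reached target after {epoch + 1} epochs, cost = {cost:.4f}")
        break
```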
@nischalsimha9995 6 years ago
@2:03, "# iterations" is the same as the number of epochs, right?
@ManishKumar-rs8tw 6 years ago
#iterations -> these are done until gradient descent has converged. #epochs -> one epoch is one pass through all the mini-batches.
@nischalsimha9995 6 years ago
@@ManishKumar-rs8tw But an epoch in the case of batch gradient descent is one pass through the whole training set, right?
@ManishKumar-rs8tw 6 years ago
@Rahul Ks, Please ignore my earlier message. I edited the post.