🎯 Key Takeaways for quick navigation:

00:00 🔄 *The basic idea of mini-batch gradient descent*
- Mini-batch gradient descent lets you take gradient descent steps while processing only part of the training set,
- The difference between batch gradient descent and mini-batch gradient descent,
- The cost function J should decrease on every iteration; with mini-batches the decrease can be noisy.

02:12 📊 *Why the mini-batch size matters and how to choose it*
- The factors that determine the mini-batch size and the two extreme cases (the full training set size, and 1),
- The difference between mini-batch gradient descent and stochastic gradient descent, and the pros and cons of each,
- The optimal mini-batch size lies somewhere between 1 and the full training set size.

05:14 💡 *Applying and optimizing mini-batch gradient descent in practice*
- Practical considerations when applying mini-batch gradient descent and the benefits of vectorization,
- Optimizing gradient descent by tuning the learning rate and searching for the best mini-batch size,
- Taking memory limits into account when choosing the mini-batch size, with practical examples.

08:18 🛠️ *Guidelines for choosing the mini-batch size and hyperparameter search*
- Batch gradient descent is recommended for small training sets; for large training sets a mini-batch size between 64 and 512 is recommended,
- Why mini-batch sizes are set to powers of 2, and the importance of choosing a size that fits in memory,
- Tuning hyperparameters such as the mini-batch size through experimentation.

Made with HARPA AI
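As a rough illustration of the partitioning described above, here is a minimal NumPy sketch (not code from the video; the shapes and the shuffling step are assumptions that follow common practice): shuffle the training set, then slice it into mini-batches of a power-of-2 size, with the last mini-batch simply smaller when m is not an exact multiple.

```python
import numpy as np

def make_mini_batches(X, Y, mini_batch_size=64, seed=0):
    """Split (X, Y) into mini-batches. X has shape (n_x, m), Y has shape (1, m).
    mini_batch_size is typically a power of 2 between 64 and 512 that fits in memory."""
    rng = np.random.default_rng(seed)
    m = X.shape[1]
    # Shuffle the examples first (common practice) so each batch is a random slice.
    perm = rng.permutation(m)
    X_shuffled, Y_shuffled = X[:, perm], Y[:, perm]
    mini_batches = []
    for start in range(0, m, mini_batch_size):
        end = start + mini_batch_size
        # The last mini-batch is simply smaller when m % mini_batch_size != 0.
        mini_batches.append((X_shuffled[:, start:end], Y_shuffled[:, start:end]))
    return mini_batches

# Example: m = 1000 with size 64 gives 15 batches of 64 plus one final batch of 40.
batches = make_mini_batches(np.random.randn(5, 1000), np.random.randint(0, 2, (1, 1000)))
print(len(batches), batches[-1][0].shape)  # 16 (5, 40)
```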
@adsuabeakufea 11 months ago
really great explanations here
@klausdupont6335 7 years ago
I suppose there is an error in the title: the "Dexcent" should be "Descent".
@IgorAherne 7 years ago
:D
@JoseRomero-wp4ij 5 years ago
What are you? The grammar-nazi of AI?
@jorjiang1 5 years ago
@@JoseRomero-wp4ij It has nothing to do with grammar.
@sandipansarkar9211 3 years ago
Great explanation. Need to watch again.
@khushnoodabbas7084 5 years ago
In the case of multiclass classification, do we need to make sure that every mini-batch contains examples of all the classes?
@naveenmirada2443 5 years ago
Did you get an answer?
@khushnoodabbas7084 5 years ago
Naveen Mirada no, not yet.
@questforprogramming 5 years ago
@@khushnoodabbas7084 I have searched a lot for multi-class gradient descent regression on KZbin, but in vain. Everyone makes videos with a single independent variable!
@khushnoodabbas7084 5 years ago
@@questforprogramming I have found a solution to this problem. You can do it in two ways: 1) implement the same binary classification model for the multiclass problem using the one-vs-all method; see this lecture and the related ones for the explanation (kzbin.info/www/bejne/kKfEdn98q5p8pq8). 2) The second way is to use softmax, which is faster than one-vs-all and is also the standard approach. The only problem you might face when implementing softmax is calculating the derivative at the final layer during backpropagation; after a long search I found the solution in a blog post (wordpress@example.com), but unfortunately I am unable to find the blog again. You can try.
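For the final-layer derivative mentioned above, here is a minimal NumPy sketch of the usual softmax-plus-cross-entropy setup (an assumed setup for illustration, not the blog's code): with one-hot labels Y and output activations A, the gradient with respect to the final-layer pre-activation simplifies to (A - Y) / m.

```python
import numpy as np

def softmax(Z):
    # Subtract the column-wise max for numerical stability.
    Z_shifted = Z - Z.max(axis=0, keepdims=True)
    expZ = np.exp(Z_shifted)
    return expZ / expZ.sum(axis=0, keepdims=True)

def cross_entropy_cost(A, Y):
    # A and Y have shape (num_classes, m); Y is one-hot.
    m = Y.shape[1]
    return -np.sum(Y * np.log(A + 1e-12)) / m

def output_layer_gradient(A, Y):
    # For softmax + cross-entropy, dJ/dZ at the final layer is simply (A - Y) / m.
    m = Y.shape[1]
    return (A - Y) / m

# Tiny check on a random 3-class batch of 5 examples.
Z = np.random.randn(3, 5)
Y = np.eye(3)[:, np.random.randint(0, 3, 5)]  # one-hot labels, shape (3, 5)
A = softmax(Z)
print(cross_entropy_cost(A, Y), output_layer_gradient(A, Y).shape)
```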
@khushnoodabbas7084 5 years ago
@@questforprogramming As for the mini-batch problem, I implemented it by considering every example in each mini-batch...
@rahulagrawal8179 5 years ago
But what if the training set size is not a power of 2? What will be the size of the mini-batches in mini-batch GD then?
@kkori_tuikim 1 month ago
You're my god!!!!!!!!
@fadoobaba 10 months ago
Last tip is really the only reason to use mini batch
@yongjiewang9686 3 years ago
This is called: Nice!
@ahmedb2559 1 year ago
thank you !
@haneulkim4902 2 years ago
Amazing! When using batch GD (mini-batch size = m), it converges, unlike when the mini-batch size < m?
@rafipatel5020 2 years ago
Nope, it will converge just like regular GD. Since the size = m, it is no different from regular GD.
@muratcan__22 6 years ago
nice explanation
@bluefanta7668 1 year ago
In the case of mini-batch GD, does one epoch equal one mini-batch or one pass over the whole dataset?
@huaizhiwu 1 year ago
One epoch means going through the whole dataset once in both batch GD and mini-batch GD. The difference is just that there's only one gradient update per epoch in batch GD, but many updates per epoch in mini-batch GD (one update per mini-batch).
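A quick worked example of that difference, using hypothetical numbers (not taken from the video):

```python
m = 2_000_000            # hypothetical training set size
mini_batch_size = 1_000  # hypothetical mini-batch size

# Mini-batch GD: one gradient update per mini-batch (the last one may be smaller).
updates_per_epoch_minibatch = -(-m // mini_batch_size)  # ceiling division -> 2000
# Batch GD: the whole training set is one batch, so one update per epoch.
updates_per_epoch_batch = 1

print(updates_per_epoch_minibatch, updates_per_epoch_batch)  # 2000 1
```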
@Rocklee46v 4 years ago
In batch GD, we pass the entire dataset to the neural net, do the forward propagation, calculate the cost function, do the backpropagation based on the cost function, and perform the same operations for a few more epochs. Can anyone help me understand the order of operations in mini-batch gradient descent? Comparing it with batch gradient descent would be highly appreciated.
@AvinashSingh-bk8kg 3 years ago
Batch gradient descent: suppose you have 10K records. All 10K records are considered at once in the forward propagation, then we perform backward propagation on the same 10K records. This takes time because we are dealing with a large amount of data, so gradient descent is slow: each step towards the minimum is taken only after the entire dataset has been processed. In mini-batch GD, on the other hand, we split the 10K records into small batches. Suppose we split them into batches of 1K, so we have 10 such batches. We pick the first batch of 1K records, do the forward propagation on it, then the backward propagation, and take a gradient descent step. So we did not wait for the model to go through all 10K records; instead we do a gradient descent step for every batch. After the gradient step on this batch we update the weights, then feed the next batch and repeat.
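A minimal runnable sketch of that order of operations, using logistic regression as a stand-in model (an illustrative assumption; these are not the course's helper functions):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_minibatch_gd(X, Y, learning_rate=0.1, mini_batch_size=64, num_epochs=10):
    """Logistic regression trained with mini-batch GD.
    X: (n_x, m) features, Y: (1, m) labels in {0, 1}."""
    n_x, m = X.shape
    w, b = np.zeros((n_x, 1)), 0.0
    for epoch in range(num_epochs):
        # Shuffle once per epoch so each mini-batch is a random slice of the data.
        perm = np.random.permutation(m)
        X_shuf, Y_shuf = X[:, perm], Y[:, perm]
        for start in range(0, m, mini_batch_size):
            Xb = X_shuf[:, start:start + mini_batch_size]  # last batch may be smaller
            Yb = Y_shuf[:, start:start + mini_batch_size]
            mb = Xb.shape[1]
            # Forward propagation and cost on this mini-batch only.
            A = sigmoid(w.T @ Xb + b)
            cost = -np.sum(Yb * np.log(A + 1e-12) + (1 - Yb) * np.log(1 - A + 1e-12)) / mb
            # (cost could be logged here; plotted per mini-batch it looks noisy.)
            # Backward propagation, then one gradient step per mini-batch.
            dZ = A - Yb
            dw = Xb @ dZ.T / mb
            db = np.sum(dZ) / mb
            w -= learning_rate * dw
            b -= learning_rate * db
    return w, b

# Batch GD is just the special case mini_batch_size = m: one update per epoch.
w, b = train_minibatch_gd(np.random.randn(20, 5_000), np.random.randint(0, 2, (1, 5_000)))
```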
@banipreetsinghraheja8529 6 years ago
So, we can consider that in mini-batch and stochastic GD we keep the number of epochs low, or else the computational cost would be the same as batch gradient descent, right?
@4abhishekagarwal 6 years ago
I think it's more about the computational cost arising from keeping a large (n x m) matrix together (in your RAM) and passing it through the network in each iteration. Batch gradient descent suffers from this sort of computational complexity and takes a long time per iteration. However, the number of gradient descent steps you take to arrive at the minimum should be fewer when implementing batch gradient descent than in the stochastic/mini-batch case. If you don't run enough epochs with mini-batch/stochastic GD, you might not zero in on the local minimum.
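To make the memory point concrete with hypothetical numbers (not taken from the comment or the video): a full-batch (n x m) float64 matrix needs n * m * 8 bytes, while a mini-batch only touches a small slice of that at a time.

```python
n_x = 10_000           # hypothetical number of input features
m = 5_000_000          # hypothetical number of training examples
bytes_per_float64 = 8

full_batch_bytes = n_x * m * bytes_per_float64      # the whole (n_x, m) matrix
mini_batch_bytes = n_x * 512 * bytes_per_float64    # one 512-example mini-batch

print(f"full batch: {full_batch_bytes / 1e9:.0f} GB")  # 400 GB
print(f"mini-batch: {mini_batch_bytes / 1e6:.0f} MB")  # 41 MB
```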
@nischalsimha9995 6 years ago
In mini-batch gradient descent, after completing all the mini-batches, do we redo the whole process again and again till we converge, as we do in batch GD, or is it just one pass?
@Ditoekacahya 6 years ago
@@nischalsimha9995 generally yes, you need to loop through several epochs (one complete round of mini-batches each) until you reach your specified cost function value
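A tiny sketch of that outer loop; run_one_epoch is a hypothetical stand-in that would do one complete round of mini-batches and return the latest cost (here it just simulates a shrinking cost so the snippet runs on its own):

```python
def run_one_epoch(state):
    """Hypothetical stand-in for one complete round of mini-batches.
    Here it only simulates a cost that shrinks each epoch."""
    state["cost"] *= 0.8
    return state["cost"]

state = {"cost": 1.0}
target_cost = 0.05
max_epochs = 100

for epoch in range(max_epochs):
    cost = run_one_epoch(state)  # one epoch = one pass over all mini-batches
    if cost <= target_cost:      # stop once the specified cost value is reached
        print(f"reached target after {epoch + 1} epochs, cost = {cost:.4f}")
        break
```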
@nischalsimha9995 6 years ago
@2:03, "# iterations" is the same as the number of epochs, right?
@ManishKumar-rs8tw 6 years ago
#iterations -> these are done until gradient descent has converged. #epochs -> one epoch is one pass through all the mini-batches.
@nischalsimha9995 6 years ago
@@ManishKumar-rs8tw But an epoch in the case of batch gradient descent is one pass through the whole training set, right?
@ManishKumar-rs8tw 6 years ago
@Rahul Ks, Please ignore my earlier message. I edited the post.