What will be the dimension of WL? Let the last-layer weight matrix WL, for convenience, be called just W, and let the output vector of the last hidden layer be h = [h1, h2, ..., hn]. The output should be a k×1 vector; call it [L1, L2, ..., Lk]. Then, if we don't include the bias for the last layer yet, the first element of the output layer should be L1 = W11*h1 + W12*h2 + ... + W1n*hn, and the last element should be Lk = Wk1*h1 + Wk2*h2 + ... + Wkn*hn. This makes the W matrix of dimension k×n, and therefore when a k×n matrix multiplies an n×1 vector, we get the output vector of dimension k×1. Therefore, instead of grad(WL), shouldn't it be grad(transpose of WL)?
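[Editor's note: a quick shape check may help here. The sketch below is not from the video; the sizes k and n, the squared-error cost, and all variable names are assumptions chosen for illustration. Under the common convention that a gradient has the same shape as the parameter it differentiates, grad(W) comes out k×n, the same shape as W itself, so no transpose is needed.]

```python
import numpy as np

k, n = 3, 5  # k output neurons, n neurons in the last hidden layer (illustrative sizes)

W = np.random.randn(k, n)  # last-layer weights, shape (k, n) as argued in the comment
h = np.random.randn(n, 1)  # output of the last hidden layer, shape (n, 1)

L = W @ h                  # output vector: (k, n) @ (n, 1) -> (k, 1)
print(L.shape)             # (3, 1)

# Assuming a squared-error cost C = 0.5 * ||L - y||^2 (an illustrative choice):
# dC/dL = (L - y), shape (k, 1), and since dC/dW_ij = (dC/dL_i) * h_j,
# the full gradient is dC/dL @ h^T, with shape (k, n) -- the same shape as W.
y = np.random.randn(k, 1)  # hypothetical target vector
dC_dL = L - y
grad_W = dC_dL @ h.T
print(grad_W.shape)        # (3, 5), matching W.shape entry for entry
```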
@anuragdathatreya1598 5 years ago
In that "nasty" matrix, why is the 3rd column from the right repeated?