Hello Professor Wang. A one-to-one model has fixed input and output lengths. An RNN can be used for many-to-one, but in this video the RNN's input length is still fixed. So my question is: can the training samples here (movie review analysis) be sequences that have NOT been aligned, i.e., with variable input length (just the original length of each review) and a fixed output of 1 (positive or negative)? Looking forward to your reply, thank you.
@ShusenWang
During training the sequences need to be aligned so they all have the same length. At inference time the length can be arbitrary.
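(A minimal sketch of what that alignment can look like in Keras; the toy review indices and maxlen=500 are only assumptions for illustration.)

```python
from tensorflow.keras.preprocessing.sequence import pad_sequences

# Tokenized reviews (integer word indices) naturally have different lengths.
raw_reviews = [[12, 5, 308], [7, 44, 91, 2, 18, 3]]

# Training: pad/truncate every review to the same length, e.g. 500 words.
x_train = pad_sequences(raw_reviews, maxlen=500)
print(x_train.shape)  # (2, 500) -- all sequences are now aligned
```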
@leejack5209
Professor, is y 0 or 1, or something else? I didn't see how y is labeled for pos and neg.
@xinyuanwang3805
Great videos, fully support you!!!
@xinliu4785
Explained so well!
@yuefang1030
Professor Wang, how should I understand what you said about using cross validation to find suitable dimension values? I didn't quite get it. Thanks!
@ShusenWang
Classification with an RNN has an error rate. Choose the hyperparameters (including the dimensions of x and h) that give the lowest error rate.
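(A rough sketch of that kind of search, assuming the Keras IMDB dataset and a hypothetical grid of candidate dimensions; a simple validation_split stands in for full cross validation.)

```python
from tensorflow.keras import layers, models
from tensorflow.keras.datasets import imdb
from tensorflow.keras.preprocessing.sequence import pad_sequences

# IMDB movie reviews, restricted to the 10000 most frequent words.
(x_train, y_train), _ = imdb.load_data(num_words=10000)
x_train = pad_sequences(x_train, maxlen=500)   # align every review to 500 words

best_acc, best_dims = 0.0, None
for embedding_dim in [16, 32, 64]:        # candidate dimensions of x
    for state_dim in [16, 32, 64]:        # candidate dimensions of h
        model = models.Sequential([
            layers.Embedding(10000, embedding_dim, input_length=500),
            layers.SimpleRNN(state_dim),
            layers.Dense(1, activation='sigmoid'),
        ])
        model.compile(optimizer='rmsprop', loss='binary_crossentropy',
                      metrics=['accuracy'])
        hist = model.fit(x_train, y_train, epochs=3, batch_size=32,
                         validation_split=0.2, verbose=0)
        acc = max(hist.history['val_accuracy'])
        if acc > best_acc:
            best_acc, best_dims = acc, (embedding_dim, state_dim)

print('best (embedding_dim, state_dim):', best_dims, 'val acc:', best_acc)
```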
@gaokaizhang
Such a great explanation!
@DED_Search
6:17 I'm a bit confused here. My understanding of the example is: when the values in A are greater than 1, A^100 becomes very large and back propagation suffers from exploding gradients; conversely, when they are smaller than 1 the gradients vanish, which is why the tanh activation is applied to h? But I feel this only helps with gradient explosion, because tanh doesn't really do anything to values near 0... Second question: when training an RNN I find that the loss curve regularly shows cliffs as it decreases. Is that related to this example?
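(A quick numeric check of the A^100 intuition in this question, using a scalar stand-in for the state matrix; purely illustrative.)

```python
# Repeatedly multiplying by a recurrent weight over 100 time steps:
for a in [0.9, 1.0, 1.1]:
    print(a, a ** 100)
# 0.9 -> ~2.7e-05 (vanishes), 1.0 -> 1.0 (stable), 1.1 -> ~1.4e+04 (explodes)
```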
0:22 From an NLP perspective, RNNs are not comparable to Transformer models when the training dataset is large enough; however, RNNs are still useful for small datasets.
0:33 Limitations of the earlier models: (a) FC nets and conv nets are one-to-one models (the paragraph is processed as a whole and there is one output); (b) this is contrary to human behavior, which accumulates the text word by word rather than aggregating the paragraph as a whole; (c) the input/output sizes are fixed.
2:31 RNN introduction. x_t: word embedding of the t-th word. A: the parameter matrix of the RNN (NOTE: the parameter count is shape(h) * (shape(h) + shape(x)) + shape(bias), and there is ONLY ONE such parameter matrix A, no matter how long the sequence is). h_t: state summarizing all previous words.
4:56 Simple RNN. tanh acts as a normalization: it squashes every entry of the new state into (-1, 1), so the state neither blows up nor shrinks away over many time steps.
7:22 Simple RNN parameter dimensions. For h_t = tanh(A · [h_{t-1}; x_t]) (concatenation of h_{t-1} and x_t), rows of A = shape(h), cols of A = shape(h) + shape(x).
8:08 Structure of the RNN for the case study. Word embedding: maps each word to an embedding vector x. A: its input is the word embedding and its output is the state h_i.
10:00 Explanation of the RNN parameters (Keras). embedding_dim=32: the shape of the word vector x is 32. word_num=500: each movie review is truncated to at most 500 words. state_dim=32: the shape of the state h is 32. return_sequences=False: the RNN outputs only the very last state h_t and discards all previous states h_0 ... h_{t-1}.
12:01 How to count the RNN parameters. 2080 = 32*(32+32) + 32 = shape(h)*(shape(h)+shape(x)) + shape(bias).
13:48 To return all previous states (return_sequences=True), the output is the whole sequence [h_1, ..., h_t], which is flattened into one long vector and fed through the sigmoid to get the final output.
16:17 Simple RNN disadvantage: good at short-term dependence, BAD at long-term dependence. h_100 is almost irrelevant to x_1: \frac{\partial h_{100}}{\partial x_{1}} is near zero, which means changing x_1 will almost not change h_100. (LSTM has a longer memory than the simple RNN, though LSTM still has its own issues.)
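(A minimal sketch of the model these notes describe; the layer sizes follow the notes, while the vocabulary size of 10000 is an assumption.)

```python
from tensorflow.keras import layers, models

vocabulary = 10000      # assumed vocabulary size
embedding_dim = 32      # shape of word vector x
word_num = 500          # each review cut/padded to at most 500 words
state_dim = 32          # shape of state h

model = models.Sequential([
    layers.Embedding(vocabulary, embedding_dim, input_length=word_num),
    # return_sequences=False: only the last state h_t is kept
    layers.SimpleRNN(state_dim, return_sequences=False),
    layers.Dense(1, activation='sigmoid'),
])
model.summary()
# SimpleRNN parameter count: 32*(32+32) + 32 = 2080
#   = shape(h)*(shape(h)+shape(x)) + shape(bias)
```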