Thanks for explaining the length normalization of beam search! I was already wondering at the end of your last video what would happen if in some branches you predict an EOS token.
@vinitabaniwal685 3 years ago
Thank you so much for this explanation!!
@AbhishekKumar-wf6io 1 year ago
Voice of the century :D
@sandipansarkar9211 3 years ago
Nice explanation.
@trexmidnite 3 years ago
Sexiest voice in AI
@luck3949 6 years ago
I guess beam width could be made dynamic. For example, if the NN says the next letter is Z with p = 1, then we can safely take width = 1, and if at some step the NN has no idea what should come next, then it's better to use a bigger width. Right?
@zhifengyang1850 6 years ago
Even in the extreme case where one word's probability is equal to 1, you still need the full fixed beam width. Suppose the beam width is 3 and the RNN outputs A, B, C at time step 1, with probabilities 0.1, 0.2, 0.3 respectively. When we feed A as the input at time step 2, we get the output word Z with probability 1, and we get other outputs when we feed in the other words. But after the RNN's outputs at time step 2, you still need to compare all continuations of the 3 words A, B, and C, because a sequence score is the product of probabilities along the whole path, not just the last step.
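A minimal sketch of this point (the tokens, probabilities, and the step-2 distributions are illustrative, not from the video): even though A's continuation Z has conditional probability 1, its sequence score 0.1 × 1.0 = 0.1 still has to compete against the continuations of B and C, so all three hypotheses must be expanded.

```python
import math

# Toy next-token distributions (illustrative only).
step1 = {"A": 0.1, "B": 0.2, "C": 0.3}
step2 = {
    "A": {"Z": 1.0},             # after A, Z is certain
    "B": {"X": 0.9, "Y": 0.1},
    "C": {"X": 0.5, "Y": 0.5},
}

beam_width = 3

# Step 1: keep the top-k first tokens, scored in log space.
beam = sorted(step1.items(), key=lambda kv: kv[1], reverse=True)[:beam_width]
beam = [([tok], math.log(p)) for tok, p in beam]

# Step 2: expand EVERY hypothesis on the beam, then keep the global top-k.
candidates = []
for seq, logp in beam:
    for tok, p in step2[seq[0]].items():
        candidates.append((seq + [tok], logp + math.log(p)))
candidates.sort(key=lambda c: c[1], reverse=True)
beam = candidates[:beam_width]

for seq, logp in beam:
    print("".join(seq), round(math.exp(logp), 3))
# A->Z scores only 0.1 * 1.0 = 0.1 and falls off the beam,
# beaten by B->X (0.18), C->X (0.15), and C->Y (0.15).
```

The point of the sketch: collapsing the beam to width 1 just because one hypothesis has a deterministic next step would have locked in A->Z, which is not among the top 3 sequences overall.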
@luck3949 6 years ago
Zhifeng Yang, thank you, now I see it. I did a little googling and found that there actually are some papers about dynamic beam width and dynamic pruning; it improves the speed of the search by approximately 10% with the same quality.