Why can you directly change x_{t-1} to x_{t+1} at 29:21?
@gabrielmongaras 2 days ago
It depends on whether you do a forward step or an inversion step in the opposite direction using the formulas above. Both come from x_t, though.
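Roughly, in the standard deterministic (η = 0) DDIM notation with \bar\alpha_t and a noise prediction \epsilon_\theta (a sketch, not necessarily the exact symbols used in the video), the two updates are:

x_{t-1} = \sqrt{\bar\alpha_{t-1}} \left( \frac{x_t - \sqrt{1-\bar\alpha_t}\,\epsilon_\theta(x_t, t)}{\sqrt{\bar\alpha_t}} \right) + \sqrt{1-\bar\alpha_{t-1}}\,\epsilon_\theta(x_t, t)

x_{t+1} = \sqrt{\bar\alpha_{t+1}} \left( \frac{x_t - \sqrt{1-\bar\alpha_t}\,\epsilon_\theta(x_t, t)}{\sqrt{\bar\alpha_t}} \right) + \sqrt{1-\bar\alpha_{t+1}}\,\epsilon_\theta(x_t, t)

Both are evaluated from the same x_t and the same \epsilon_\theta(x_t, t); the first one denoises a step, while the second is the inversion step that moves back toward the noise.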
@ml-ok3xq 2 months ago
Congrats on writing a paper! I notice that another recent paper from NVIDIA (nGPT) uses unit vectors for attention, where the dot product is naturally equal to the cosine similarity since the lengths are one. Are these two works related to each other in any way?
@gabrielmongaras 2 months ago
Thanks!! I only read through the nGPT paper briefly, but I think nGPT was trying to make softmax attention/transformers more expressive and efficient by changing a few things. They normalize before applying the softmax function, making the logits a cosine similarity between -1 and 1. However, they keep the softmax operation, which forces the model to stay quadratic in complexity. The paper I worked on removes the softmax function, which allows the attention mechanism to be rewritten as an RNN that is linear in complexity.
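Roughly, the difference looks like this in a toy NumPy sketch (my own illustration, not the exact formulation from either paper; phi is just a placeholder feature map):

```python
import numpy as np

def softmax_attention(Q, K, V):
    # Softmax attention needs the full (T x T) score matrix -> quadratic in sequence length T.
    scores = Q @ K.T / np.sqrt(Q.shape[-1])
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V

def linear_attention(Q, K, V, phi=lambda x: np.maximum(x, 0) + 1.0):
    # Without the softmax, phi(q_t) . (sum_s phi(k_s) v_s^T) can be carried as a
    # running (d x d) state, so the whole sequence is processed in linear time (RNN form).
    T, d = Q.shape
    S = np.zeros((d, d))   # running sum of phi(k_s) v_s^T (causal)
    z = np.zeros(d)        # running sum of phi(k_s) for normalization
    out = np.zeros_like(V)
    for t in range(T):
        S += np.outer(phi(K[t]), V[t])
        z += phi(K[t])
        out[t] = (phi(Q[t]) @ S) / (phi(Q[t]) @ z + 1e-6)
    return out

# toy usage
T, d = 8, 4
rng = np.random.default_rng(0)
Q, K, V = rng.normal(size=(3, T, d))
print(softmax_attention(Q, K, V).shape, linear_attention(Q, K, V).shape)
```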
@陈兆伟-s5w 4 months ago
How is the equality in DDPM established at 17:49?
@gabrielmongaras 4 months ago
Looks like I forgot to write out the square root over the first term. As for the inner term that got turned into a fraction, I just multiplied sqrt{1-a_t} by the fraction sqrt{1-a_t}/sqrt{1-a_t}.
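Spelled out, that step is just:

\sqrt{1-a_t}\,\epsilon \;=\; \sqrt{1-a_t}\cdot\frac{\sqrt{1-a_t}}{\sqrt{1-a_t}}\,\epsilon \;=\; \frac{1-a_t}{\sqrt{1-a_t}}\,\epsilon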