Nicely done and very helpful! Thank you!! FYI, the stress is on the first syllable of "INference", not the second ("inFERence").
@yanaitalk (a month ago)
Copy that! Thank you😊
@johndong4754 (2 months ago)
I've been learning about LLMs over the past few months, but I haven't gone into too much depth. Your videos seem very detailed and technical. Which one(s) would you recommend starting with?
@yanaitalk (2 months ago)
There are excellent courses from DeepLearning.AI on Coursera. To go even deeper, I recommend reading the technical papers directly, which gives you a fuller understanding.
@HeywardLiu (a month ago)
1. Roofline model
2. Transformer architecture > bottleneck of attention > FlashAttention
3. LLM inference can be divided into a prefill stage (compute-bound) and a decode stage (memory-bound)
4. LLM serving: PagedAttention, RadixAttention

If you want to optimize inference performance, this review paper is awesome: "LLM Inference Unveiled: Survey and Roofline Model Insights"
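The prefill/decode split in that list follows directly from the roofline model: a kernel is memory-bound when its arithmetic intensity (FLOPs per byte moved) falls below the hardware's compute-to-bandwidth ratio. Here's a minimal sketch of that classification; the hardware numbers and the per-stage FLOP/byte counts are illustrative placeholders, not real measurements.

```python
# Roofline-model sketch: classify a workload as compute- or memory-bound.
# PEAK_FLOPS and PEAK_BW are hypothetical accelerator specs for illustration.
PEAK_FLOPS = 312e12   # peak compute throughput, FLOP/s
PEAK_BW = 2.0e12      # peak memory bandwidth, bytes/s
MACHINE_BALANCE = PEAK_FLOPS / PEAK_BW  # ridge point, FLOP per byte

def bound(flops: float, bytes_moved: float) -> str:
    """Arithmetic intensity below the machine balance => memory-bound."""
    intensity = flops / bytes_moved
    return "compute-bound" if intensity >= MACHINE_BALANCE else "memory-bound"

# Prefill: big batched matmuls reuse each loaded weight across many tokens,
# so intensity is high (numbers here are made up for the example).
print(bound(flops=1e12, bytes_moved=1e9))   # prints "compute-bound"

# Decode: one token per step, so the weights are re-read for little compute.
print(bound(flops=1e9, bytes_moved=1e9))    # prints "memory-bound"
```

This is why optimizations differ by stage: prefill benefits from faster math (e.g. FlashAttention reducing memory traffic), while decode benefits from anything that cuts bytes moved per token, like KV-cache management (PagedAttention) or quantization.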