How is Beam Search Really Implemented?

10,691 views

Efficient NLP

A day ago

Beam search. You've probably heard of it, but there are surprising tricks to make it work in practice.
In this video, I walk through how beam search is implemented in the Hugging Face transformers library. Find out what clever data structures are used to enhance the performance of this decoding technique!
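Before diving into the library code, the core loop is worth sketching: at each step, every beam is extended by every vocabulary token, and only the highest-scoring continuations survive. Below is a minimal pure-Python sketch of one such step with toy scores; it illustrates the idea only and is not the Hugging Face implementation.

```python
import math

def beam_search_step(beams, next_log_probs, num_beams):
    """Expand each beam with every vocabulary token, then keep the
    num_beams highest-scoring candidates overall.

    beams: list of (token_ids, cumulative_log_prob)
    next_log_probs: next_log_probs[i][v] = log P(token v | beams[i])
    """
    candidates = []
    for i, (tokens, score) in enumerate(beams):
        for v, lp in enumerate(next_log_probs[i]):
            candidates.append((tokens + [v], score + lp))
    # Keep the best num_beams continuations across all beams; a beam from
    # the previous step may contribute several survivors, or none.
    candidates.sort(key=lambda c: c[1], reverse=True)
    return candidates[:num_beams]

# Toy example: 2 beams, vocabulary of 3 tokens.
beams = [([5], math.log(0.6)), ([7], math.log(0.4))]
next_log_probs = [
    [math.log(0.7), math.log(0.2), math.log(0.1)],  # continuations of [5]
    [math.log(0.1), math.log(0.8), math.log(0.1)],  # continuations of [7]
]
best = beam_search_step(beams, next_log_probs, num_beams=2)
# Best candidate is [5, 0] with probability 0.6 * 0.7 = 0.42
```

In a real implementation this ranking is done on the GPU with a single top-k over the flattened (num_beams x vocab) score matrix rather than a Python sort.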
0:00 - Introduction
1:11 - Review of Beam Search
2:56 - Hugging Face Code Walkthrough
Resources about beam search:
huggingface.co/blog/how-to-ge...
Hugging Face source code:
github.com/huggingface/transf...
#BeamSearch #nlp #huggingface #ai #machinelearning #deeplearning

Comments: 13
@kevon217
@kevon217 11 months ago
Very well explained!
@amritpandey6964
@amritpandey6964 10 months ago
Nicely explained!
@kushagrabhushan
@kushagrabhushan 10 months ago
Hey, great video! I just wanted to ask: what are you using as a debugger to get the intermediate values of the variables? It looks very interesting...
@EfficientNLP
@EfficientNLP 10 months ago
I used PyCharm for this video, but most modern IDEs should have a similar feature.
@kushagrabhushan
@kushagrabhushan 10 months ago
@@EfficientNLP thank you so much!
@feixyzliu5432
@feixyzliu5432 4 months ago
It seems no KV cache is used in the implementation. How can beam search be made compatible with the KV cache to make it more efficient?
@EfficientNLP
@EfficientNLP 4 months ago
I didn't mention it in this video, but the KV cache is supported in the Hugging Face implementation (it is turned on by default): it is the use_cache parameter.
@feixyzliu5432
@feixyzliu5432 4 months ago
I just read the Hugging Face transformers implementation. Sure, it does support the KV cache; however, beam search in transformers is implemented by simply expanding the batch size. I'm sure this is not that efficient, especially for memory, since nothing is reused here: even the KV cache for the prompts from the prefill phase is not reused. Do you know of any implementation that is more mature or optimized? Thanks a lot! @@EfficientNLP
@arjunkoneru5461
@arjunkoneru5461 4 months ago
You can pass your custom past_key_values by doing a forward pass once and loading it in generate. @@feixyzliu5432
@ameynaik2743
@ameynaik2743 10 months ago
At one point in the video the beam length is 2; later it is 3; then it appears to be 6? Why do we take the top 6 (num_beams * 2) as mentioned in the video? Also, with 'boy' as input, 'and' and 'who' had the highest probability (you chose the top 2), but with 'dog' as input only 'who', i.e. the top 1, was chosen? Are you picking the top 3 across outputs with the inputs 'boy', 'dog', and 'woman'?
@EfficientNLP
@EfficientNLP 10 months ago
In the code example, the beam size is 3, but the batch size is 2. That's why it appears we have 6 sequences at a time, and this illustrates how beam search is combined with batching. About your question about taking the top 3: We are taking the top 3 beams overall, and they may correspond to any beams from the previous iteration (it's not necessarily a 1-to-1 correspondence). So we might use 2 candidates from the beam ending with "boy", 1 from the beam ending in "dog", and 0 from the beam ending in "woman". Hope this clarifies things!
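On the num_beams * 2 question: the Hugging Face implementation selects twice as many candidates as beams so that, even if some of the top candidates end with the EOS token (and become finished hypotheses rather than active beams), enough unfinished candidates remain to keep all num_beams slots populated. A simplified pure-Python sketch of that selection, with a toy EOS id and toy scores:

```python
EOS = 0  # toy end-of-sequence token id for this sketch

def select_candidates(scores_flat, num_beams, vocab_size):
    """Pick the top 2*num_beams entries from the flattened
    (num_beams * vocab_size) score table, then keep the first num_beams
    that do NOT end with EOS as next step's active beams. The extra
    num_beams slots guarantee enough survivors even if up to num_beams
    of the top candidates finish this step."""
    top = sorted(range(len(scores_flat)), key=lambda i: scores_flat[i],
                 reverse=True)[: 2 * num_beams]
    next_beams = []
    for idx in top:
        beam_idx, token = divmod(idx, vocab_size)  # unflatten the index
        if token == EOS:
            continue  # finished hypothesis; stored separately in practice
        next_beams.append((beam_idx, token))
        if len(next_beams) == num_beams:
            break
    return next_beams

# 2 beams, vocabulary of 3 tokens; the single best candidate is EOS,
# so it is set aside and the next-best non-EOS candidates fill the beams.
scores_flat = [0.9, 0.5, 0.1,   # beam 0: tokens 0 (EOS), 1, 2
               0.2, 0.7, 0.3]   # beam 1: tokens 0 (EOS), 1, 2
chosen = select_candidates(scores_flat, num_beams=2, vocab_size=3)
# chosen -> [(1, 1), (0, 1)]: beam 1 continues with token 1,
# and beam 0 continues with token 1
```

In the real code this is a single top-k over a tensor, and finished hypotheses go into a separate container that tracks the best completed sequences.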
@piotr780
@piotr780 4 months ago
What IDE is this?
@EfficientNLP
@EfficientNLP 4 months ago
This is PyCharm, but VS Code has similar debugging functionality.