How is Beam Search Really Implemented?

10,691 views

Efficient NLP

A day ago

Beam search. You've probably heard of it, but there are surprising tricks to make it work in practice.
In this video, I walk through how beam search is implemented in the Hugging Face transformers library. Find out what clever data structures are used to enhance the performance of this decoding technique!
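Before diving into the library code, the core loop is worth sketching: at each step, every beam is extended by every vocabulary token, and only the highest-scoring continuations survive. Below is a minimal pure-Python sketch of one such step with toy scores; it illustrates the idea only and is not the Hugging Face implementation.

```python
import math

def beam_search_step(beams, next_log_probs, num_beams):
    """Expand each beam with every vocabulary token, then keep the
    num_beams highest-scoring candidates overall.

    beams: list of (token_ids, cumulative_log_prob)
    next_log_probs: next_log_probs[i][v] = log P(token v | beams[i])
    """
    candidates = []
    for i, (tokens, score) in enumerate(beams):
        for v, lp in enumerate(next_log_probs[i]):
            candidates.append((tokens + [v], score + lp))
    # Keep the best num_beams continuations across all beams; a beam from
    # the previous step may contribute several survivors, or none.
    candidates.sort(key=lambda c: c[1], reverse=True)
    return candidates[:num_beams]

# Toy example: 2 beams, vocabulary of 3 tokens.
beams = [([5], math.log(0.6)), ([7], math.log(0.4))]
next_log_probs = [
    [math.log(0.7), math.log(0.2), math.log(0.1)],  # continuations of [5]
    [math.log(0.1), math.log(0.8), math.log(0.1)],  # continuations of [7]
]
best = beam_search_step(beams, next_log_probs, num_beams=2)
# Best candidate is [5, 0] with probability 0.6 * 0.7 = 0.42
```

In a real implementation this ranking is done on the GPU with a single top-k over the flattened (num_beams x vocab) score matrix rather than a Python sort.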
0:00 - Introduction
1:11 - Review of Beam Search
2:56 - Hugging Face Code Walkthrough
Resources about beam search:
huggingface.co/blog/how-to-ge...
Hugging Face source code:
github.com/huggingface/transf...
#BeamSearch #nlp #huggingface #ai #machinelearning #deeplearning

Comments: 13
@kevon217
@kevon217 11 months ago
Very well explained!
@amritpandey6964
@amritpandey6964 10 months ago
Nicely explained!
@kushagrabhushan
@kushagrabhushan 10 months ago
Hey, great video! I just wanted to ask: what are you using as a debugger to get the intermediate values of the variables? It looks very interesting...
@EfficientNLP
@EfficientNLP 10 months ago
I used PyCharm for this video, but most modern IDEs should have a similar feature.
@kushagrabhushan
@kushagrabhushan 10 months ago
@@EfficientNLP thank you so much!
@feixyzliu5432
@feixyzliu5432 4 months ago
It seems no KV cache is used in the implementation. How can beam search be made compatible with the KV cache to make it more efficient?
@EfficientNLP
@EfficientNLP 4 months ago
I didn't mention it in this video, but the KV cache is supported in the Hugging Face implementation (it is turned on by default): it is the use_cache parameter.
@feixyzliu5432
@feixyzliu5432 4 months ago
I just read the Hugging Face transformers implementation. Sure, it does support the KV cache; however, beam search in transformers is implemented by simply expanding the batch size. I'm sure this is not that efficient, especially for memory, since nothing is reused here: even the KV cache for the prompts from the prefill phase is not reused. Do you know of any implementation that is more mature or optimized? Thanks a lot! @@EfficientNLP
@arjunkoneru5461
@arjunkoneru5461 4 months ago
You can pass your custom past_key_values by doing a forward pass once and loading it in generate. @@feixyzliu5432
@ameynaik2743
@ameynaik2743 10 months ago
At one point in the video the beam length is 2; later it is 3; then it appears to be 6? Why do we take the top 6 (num_beams * 2) as mentioned in the video? Also, with 'boy' as input, 'and' and 'who' had the highest probability (you chose the top 2), but with 'dog' as input only 'who', i.e. the top 1, was chosen? Are you picking the top 3 across outputs with the inputs 'boy', 'dog', and 'woman'?
@EfficientNLP
@EfficientNLP 10 months ago
In the code example, the beam size is 3, but the batch size is 2. That's why it appears we have 6 sequences at a time, and this illustrates how beam search is combined with batching. About your question about taking the top 3: We are taking the top 3 beams overall, and they may correspond to any beams from the previous iteration (it's not necessarily a 1-to-1 correspondence). So we might use 2 candidates from the beam ending with "boy", 1 from the beam ending in "dog", and 0 from the beam ending in "woman". Hope this clarifies things!
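On the num_beams * 2 question: the Hugging Face implementation selects twice as many candidates as beams so that, even if some of the top candidates end with the EOS token (and become finished hypotheses rather than active beams), enough unfinished candidates remain to keep all num_beams slots populated. A simplified pure-Python sketch of that selection, with a toy EOS id and toy scores:

```python
EOS = 0  # toy end-of-sequence token id for this sketch

def select_candidates(scores_flat, num_beams, vocab_size):
    """Pick the top 2*num_beams entries from the flattened
    (num_beams * vocab_size) score table, then keep the first num_beams
    that do NOT end with EOS as next step's active beams. The extra
    num_beams slots guarantee enough survivors even if up to num_beams
    of the top candidates finish this step."""
    top = sorted(range(len(scores_flat)), key=lambda i: scores_flat[i],
                 reverse=True)[: 2 * num_beams]
    next_beams = []
    for idx in top:
        beam_idx, token = divmod(idx, vocab_size)  # unflatten the index
        if token == EOS:
            continue  # finished hypothesis; stored separately in practice
        next_beams.append((beam_idx, token))
        if len(next_beams) == num_beams:
            break
    return next_beams

# 2 beams, vocabulary of 3 tokens; the single best candidate is EOS,
# so it is set aside and the next-best non-EOS candidates fill the beams.
scores_flat = [0.9, 0.5, 0.1,   # beam 0: tokens 0 (EOS), 1, 2
               0.2, 0.7, 0.3]   # beam 1: tokens 0 (EOS), 1, 2
chosen = select_candidates(scores_flat, num_beams=2, vocab_size=3)
# chosen -> [(1, 1), (0, 1)]: beam 1 continues with token 1,
# and beam 0 continues with token 1
```

In the real code this is a single top-k over a tensor, and finished hypotheses go into a separate container that tracks the best completed sequences.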
@piotr780
@piotr780 4 months ago
What IDE is this?
@EfficientNLP
@EfficientNLP 4 months ago
This is PyCharm, but VS Code has similar debugging functionality.