Subword Tokenization: Byte Pair Encoding

Views: 19,447

Abhishek Thakur

Days ago

Comments: 34
@maryma9971 1 year ago
This is amazing! Thanks for your clear explanation of BPE, it really saved my day. It is great that you also mention how to use BPE in other languages. Thank you!
@abdulkadirguner1282 1 year ago
For byte-pair and any encoding that includes character-level encoding, it is really hard to get because, well, the characters are already in the vocab.
@mass13982 4 years ago
Best explanation of BPE
@tong-minglim1800 1 year ago
Excellent explanation 👌
@soumyasarkar4100 3 years ago
Very lucid presentation!
@jjjery932 3 years ago
Thanks, buddy, for this informative interpretation of BPE and BBPE.
@陳翰儒-d5m 3 years ago
Man, you saved me a lot, thank you very much.
@rajputjay9856 4 years ago
Oh damn, finally the next part is here, thanks sir ❤️
@dhamija80 4 years ago
This was very informative, thanks so much for sharing this knowledge!!
@JoukoSalonen 3 years ago
Very good, hands-on and practical intro! Thank you, learned a lot!
@mass13982 3 years ago
Amazing content. Love your clear explanation and review of the original papers. Thank you Abhishek
@ramchalamkr1 3 years ago
Very very informative and explains the concept beautifully. Keep up with such content :)
@abhishekkrthakur 3 years ago
Thank you! Will do!
@kochasaito4628 4 years ago
Best interpretation of BPE, better than wiki's, I think.
@fingerstyledojo 1 year ago
Very good explanation, thank you
@ashishbhatnagar9590 4 years ago
Excellent video Sir. Thanks a lot.
@harshitsethi9698 4 years ago
Thanks for the easy explanation!
@viveksalunkhe7293 3 years ago
Informative content. Sir, can you please continue this playlist and, if possible, pick any Kaggle-based NLP challenge and show the implementation of each and every step in approaching the problem?
@shirishbajpai9486 2 years ago
Hi Abhishek! Thanks for the video. One question that always keeps bugging me is misspelled words. How do we handle them? Are they taken care of automatically in the case of complex DL models where contextual information is used?
@MakerBen 4 years ago
Super helpful!!
@Khushpich 4 years ago
Thanks for the video. As a suggestion, I think your videos can generally be condensed to shorter lengths. This one could probably fit in 5-10 mins. Hope you don't take this the wrong way, as I really appreciate the work you put into the channel, and hope you can grow it to larger audiences.
@ilyasaroui7745 4 years ago
Really good video on your part, Abhishek. What are you using to record the video like this? And what are you using for writing? It is for me to give a presentation. Thank you.
@abhishekkrthakur 4 years ago
Thanks. OBS and iPad.
@jayeetaputatunda 4 years ago
Thanks for the great summary! Seems like byte-pair encoding could be used as a tokenization technique to build a word2vec model, since word2vec doesn't handle OOV words well? Does that make sense?
@keshavkasat9465 3 years ago
When is the next video coming? We're waiting!
@abhishekkrthakur 3 years ago
Soon :) Sorry, got busy with loads of stuff :(
@keshavkasat9465 3 years ago
@abhishekkrthakur Cool! Excited for the next one!
@gregarityNow 2 years ago
Great video, thanks! And although it is basically irrelevant to the quality of your video, Schrank has a c ;)
@rounhi 4 years ago
Thanks! Is it possible to see a video on WordPiece tokenization, please?
@abhishekkrthakur 4 years ago
Did you see this video fully?
@rounhi 4 years ago
Yes, of course. I mean a specific video on WordPiece tokenization with some examples. Thank you.
@abhishekkrthakur 4 years ago
@rounhi Okay, I'll see. But there is not much difference.
@VjayVenugopal 4 years ago
Could you make videos on LSTM (RNN)? 🙏🏻 Maybe a simple name generator using a character-level LSTM.
@hakunamatata-qu7ft 4 years ago
Awesome