How did Large Language Models (LLMs) become so good at capturing the essence of text?
Key idea: Embeddings
Mark my words - embeddings and context-dependent embeddings are the key reason why transformers work so well!
We'll go through a historical overview of how embeddings are derived, starting from Bag of Words, moving through word2vec and Transformers, and ending with some of my more recent experiments on context-dependent embeddings and multiple abstraction spaces! (See the sketch below for a quick feel of the difference between sparse and dense embeddings.)
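
A minimal sketch (my own illustration, not code from the video) contrasting a sparse Bag-of-Words vector with a dense, averaged word-embedding vector, and comparing sentences with cosine similarity. The word vectors here are random stand-ins for learned embeddings such as word2vec.

```python
import numpy as np

sentences = ["the cat sat on the mat", "the dog sat on the mat"]

# Bag of Words: one count per vocabulary word; word order is ignored.
vocab = sorted({w for s in sentences for w in s.split()})
def bag_of_words(sentence):
    counts = np.zeros(len(vocab))
    for w in sentence.split():
        counts[vocab.index(w)] += 1
    return counts

# Dense embeddings: each word maps to a continuous vector (random here,
# learned in practice); a simple sentence embedding is the mean of its
# word vectors.
rng = np.random.default_rng(0)
word_vectors = {w: rng.normal(size=8) for w in vocab}
def sentence_embedding(sentence):
    return np.mean([word_vectors[w] for w in sentence.split()], axis=0)

def cosine_similarity(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

print(cosine_similarity(bag_of_words(sentences[0]), bag_of_words(sentences[1])))
print(cosine_similarity(sentence_embedding(sentences[0]), sentence_embedding(sentences[1])))
```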
~~~~
Part 2 here: • Embeddings Walkthrough...
My slides: github.com/tanchongmin/strict...
Bag of words: www.researchgate.net/publicat...
Word2vec: courses.cs.washington.edu/cou...
word2vec paper: arxiv.org/abs/1301.3781
Transformer paper: arxiv.org/abs/1706.03762
Vision Transformer paper: arxiv.org/abs/2010.11929
Memorising Transformer paper (for that nice token prediction visualisation): arxiv.org/abs/2203.08913
Text and Code Embeddings by Contrastive Pre-training (OpenAI embeddings paper): arxiv.org/abs/2201.10005
~~~~
0:00 Introduction
1:30 Bag of words
5:55 Continuous vectors for embedding
8:42 word2vec
20:24 Next-token prediction
24:43 Transformer embeddings
43:17 Comparison: Image token embeddings
51:52 Recap on Transformer embeddings
55:47 Cosine Similarity
59:42 Sentence Embeddings
1:07:39 Why Contrastive Learning is Bad
1:11:11 Mismatch between next-token prediction and sentence meaning embedding prediction
1:19:18 Insight: Multiple Abstraction Space Prediction for Embeddings
1:27:03 Discussion
~~~~~~~~~~~~
AI and ML enthusiast. Likes to think about the essence behind breakthroughs in AI and explain them in a simple and relatable way. Also, I am an avid game creator.
Discord: / discord
LinkedIn: / chong-min-tan-94652288
Online AI blog: delvingintotech.wordpress.com/
Twitter: / johntanchongmin
Try out my games here: simmer.io/@chongmin