How did Large Language Models (LLMs) become so good at capturing the essence of text?
Key idea: Embeddings
Mark my words - embeddings and context-dependent embeddings are the key reason why transformers work so well!
We'll go through a historical overview of how embeddings are derived, starting from Bag of Words, moving through word2vec and Transformers, and ending with some of my more recent experiments on context-dependent embeddings and multiple abstraction spaces! (See the sketch below for a quick feel of the difference between sparse and dense embeddings.)
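
A minimal sketch (my own illustration, not code from the video) contrasting a sparse Bag-of-Words vector with a dense, averaged word-embedding vector, and comparing sentences with cosine similarity. The word vectors here are random stand-ins for learned embeddings such as word2vec.

```python
import numpy as np

sentences = ["the cat sat on the mat", "the dog sat on the mat"]

# Bag of Words: one count per vocabulary word; word order is ignored.
vocab = sorted({w for s in sentences for w in s.split()})
def bag_of_words(sentence):
    counts = np.zeros(len(vocab))
    for w in sentence.split():
        counts[vocab.index(w)] += 1
    return counts

# Dense embeddings: each word maps to a continuous vector (random here,
# learned in practice); a simple sentence embedding is the mean of its
# word vectors.
rng = np.random.default_rng(0)
word_vectors = {w: rng.normal(size=8) for w in vocab}
def sentence_embedding(sentence):
    return np.mean([word_vectors[w] for w in sentence.split()], axis=0)

def cosine_similarity(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

print(cosine_similarity(bag_of_words(sentences[0]), bag_of_words(sentences[1])))
print(cosine_similarity(sentence_embedding(sentences[0]), sentence_embedding(sentences[1])))
```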
~~~~
Part 2 here: • Embeddings Walkthrough...
My slides: github.com/tanchongmin/strict...
Bag of words: www.researchgate.net/publicat...
Word2vec: courses.cs.washington.edu/cou...
word2vec paper: arxiv.org/abs/1301.3781
Transformer paper: arxiv.org/abs/1706.03762
Vision Transformer paper: arxiv.org/abs/2010.11929
Memorising Transformer paper (for that nice token prediction visualisation): arxiv.org/abs/2203.08913
Text and Code Embeddings by Contrastive Pre-training (OpenAI embeddings paper): arxiv.org/abs/2201.10005
~~~~
0:00 Introduction
1:30 Bag of words
5:55 Continuous vectors for embedding
8:42 word2vec
20:24 Next-token prediction
24:43 Transformer embeddings
43:17 Comparison: Image token embeddings
51:52 Recap on Transformer embeddings
55:47 Cosine Similarity
59:42 Sentence Embeddings
1:07:39 Why Contrastive Learning is Bad
1:11:11 Mismatch between next-token prediction and sentence meaning embedding prediction
1:19:18 Insight: Multiple Abstraction Space Prediction for Embeddings
1:27:03 Discussion
~~~~~~~~~~~~
AI and ML enthusiast. Likes to think about the essence behind breakthroughs in AI and explain them in a simple and relatable way. Also, I am an avid game creator.
Discord: / discord
LinkedIn: / chong-min-tan-94652288
Online AI blog: delvingintotech.wordpress.com/
Twitter: / johntanchongmin
Try out my games here: simmer.io/@chongmin