
Embeddings Walkthrough (Part 2): Context-Dependent Embeddings, Shifting Embedding Space

523 views

John Tan Chong Min

1 day ago

We'll talk about how to make the Transformer's next-token objective more in line with a sentence-meaning objective.
- Joint query and key similarity retrieval, e.g. Cohere Reranker
- Shifting embedding space via generating hypothetical documents or via hinting, e.g. Hypothetical Document Embeddings (HyDE), Recitation Augmented Language Models
- My experiments to change context for embeddings: pre-pending context, appending context, modifying the text chunk by context
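The context-modification experiment in the last bullet can be sketched as below. Note that `embed` here is a deliberately toy bag-of-words stand-in, not the actual embedding model used in the video (a real system would call an embedding API such as OpenAI's), and the context strings are purely illustrative.

```python
import math
import re
from collections import Counter

def embed(text: str) -> Counter:
    # Toy bag-of-words "embedding" -- a stand-in for a real embedding
    # model, used only so the example runs without API access.
    return Counter(re.findall(r"\w+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(v * b.get(t, 0) for t, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def contextual_embed(chunk: str, context: str) -> Counter:
    # Pre-pend a context string so the same chunk gets a different
    # embedding for each context (the "multiple embeddings" idea).
    return embed(f"{context}: {chunk}")

chunk = "The bank raised its rates"
query = embed("banking interest rates")

plain = embed(chunk)
finance = contextual_embed(chunk, "Context: banking and finance")
river = contextual_embed(chunk, "Context: rivers and geography")

# The finance-context embedding moves closer to a finance query.
assert cosine(query, finance) > cosine(query, plain)
assert cosine(query, finance) > cosine(query, river)
```

The same chunk thus yields multiple embeddings, one per context, and retrieval can pick the context-appropriate one.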
~~~
Part 1: • Embeddings Walkthrough...
Slides: github.com/tanchongmin/strict...
Jupyter Notebook for my experiments: github.com/tanchongmin/strict...
OpenAI Sentence Embedding Paper: arxiv.org/abs/2201.10005
Cohere Reranker: docs.cohere.com/docs/reranking
HyDE: arxiv.org/abs/2212.10496
Recitation Augmented Language Models: arxiv.org/abs/2210.01296
~~~
0:00 Issues with Sentence Embeddings
15:30 How Retrieval Augmented Generation (RAG) is typically done
17:29 Joint Query and Key Processing without External Embeddings
36:16 Hypothetical Document Embeddings (HyDE)
41:55 Guiding LLM by Hinting
50:00 My idea: Context-Dependent Embeddings
53:16 Key idea: Multiple embeddings by context modification
1:05:50 Discussion
~~~
AI and ML enthusiast. Likes to think about the essence behind AI breakthroughs and explain it in a simple and relatable way. Also an avid game creator.
Discord: / discord
LinkedIn: / chong-min-tan-94652288
Online AI blog: delvingintotech.wordpress.com/
Twitter: / johntanchongmin
Try out my games here: simmer.io/@chongmin

Comments: 9
@snehotoshbanerjee1938 5 months ago
John, BTW, you are a great teacher!! Your teaching method is great, especially the "Questions to Ponder" at the end :)
@leonlysak4927 5 months ago
Ben Goertzel discusses something more "old school" like Kernel PCA being the best way they've found to create node embeddings
@johntanchongmin 5 months ago
53:16 - 1:05:50 The most important idea: multiple embeddings by changing text based on context!
@johntanchongmin 5 months ago
Code for my experiments: github.com/tanchongmin/strictjson/blob/main/Experiments/Context-Dependent-Embeddings.ipynb
@johntanchongmin 5 months ago
Part 1 here: kzbin.info/www/bejne/nYe9o6yuf7eXibs
@stalinsampras 5 months ago
Dude, love your videos. Please do continue these exploration (coding) concept videos. They go well together with your academic paper explanation/summarization videos
@snehotoshbanerjee1938 5 months ago
John, one question... Does the Cohere rerank algorithm use embeddings behind the scenes for semantic search and ranking? I guess embeddings are necessary for semantic search. What I am confused about is "embed each sentence and compare" vs "put both document and query into the algorithm". Do these two approaches differ in creating two embeddings vs one single embedding space?
@johntanchongmin 5 months ago
I believe the Cohere rerank model takes in both query and document, and outputs a score. This means embeddings for query/document need not be generated, as it is a full end-to-end system. The normal embedding method works at a sentence level and lets us compare arbitrary sentences by cosine similarity. With the Cohere reranker, you need to redo this comparison for every different query and key you have.
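The distinction in this reply can be sketched in code: the embedding method computes one vector per text and compares by cosine similarity (embeddings are reusable across queries), while a reranker-style model scores each (query, document) pair jointly, so no reusable embedding exists. `joint_score` below is a trivial lexical stand-in for Cohere's rerank model, shown only to illustrate the two calling patterns.

```python
import math
import re

def tokens(text):
    return re.findall(r"\w+", text.lower())

def embed(text):
    # Bi-encoder stand-in: each text gets its own vector, computed once.
    vec = {}
    for t in tokens(text):
        vec[t] = vec.get(t, 0) + 1
    return vec

def cosine(a, b):
    dot = sum(v * b.get(t, 0) for t, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def joint_score(query, document):
    # Reranker stand-in: query and document are processed together and
    # produce a score directly -- no reusable embedding comes out.
    q, d = set(tokens(query)), set(tokens(document))
    return len(q & d) / len(q | d)

docs = ["transformers use self attention",
        "rivers erode their banks over time"]
query = "how does attention work in transformers"

# Embedding path: embed documents once, reuse them for every new query.
doc_vecs = [embed(d) for d in docs]
bi_best = max(range(len(docs)), key=lambda i: cosine(embed(query), doc_vecs[i]))

# Reranker path: one joint_score call per (query, document) pair, per query.
re_best = max(range(len(docs)), key=lambda i: joint_score(query, docs[i]))

assert bi_best == re_best == 0
```

Both paths rank the same document first here; the trade-off is that the reranker's per-pair scoring is typically more accurate but cannot be precomputed.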
@snehotoshbanerjee1938 5 months ago
Ok. Thank u John!