It feels like the shorter your video is, the more informative it is 😅. You not only explain what an embedding is, but also how it can differ based on the problem statement, all in less than 5 minutes.
@ritvikmath 1 year ago
Thanks! I’m trying to make shorter videos and learning that it can actually be more challenging than making a longer one
@johannestafelmaier616 1 year ago
I'd say quality > quantity. Time is valuable, and that is probably one reason why short-form videos are becoming so successful. I'd also say making shorter educational videos forces you to cut away everything that is not important, which should leave you with a clearer picture of the essence of the concept.
@xspydazx 1 year ago
In reality: make a base model, highly tuned, and use it as your starting point for new models. Preserve your base at all costs; online versions are often polluted.
@SierraSombrero 1 year ago
I've never commented on any of your videos before but thought it was time to do so after this one. Thank you so much for all the great work! For me you're the best at explaining data science and ML concepts on YouTube. I also love how broad your range of topics is. I feel like I've used your content to understand concepts in NLP and general data science, but also RL and Bayesian approaches to deep learning. Your real-life examples and intuitive explanations are really strong. Keep it up!
@ritvikmath 1 year ago
Hey I really really appreciate the kind words and would absolutely love more comments and feedback in the future
@jfndfiunskj5299 1 year ago
Dude, your videos are so damn mind-opening.
@ritvikmath 1 year ago
Thanks!
@Pure_Science_and_Technology 11 months ago
In a RAG-based Q&A system, the efficiency of query processing and the quality of the results are paramount. One key challenge is the system's ability to handle vague or context-lacking user queries, which often leads to inaccurate results. To address this, we've implemented a fine-tuned LLM to reformat and enrich user queries with contextual information, ensuring more relevant results from the vector database. However, this adds complexity, latency, and cost, especially in systems without high-end GPUs, so improving algorithmic efficiency is crucial. Integrating techniques like LoRA into the LLM can streamline the process, allowing it to handle both context-aware query reformulation and vector search; this could significantly reduce the need for separate embedding models, enhancing system responsiveness and user experience. Also, incorporating a feedback mechanism for continuous learning is vital: it would enable the system to adapt and improve over time based on user interactions, leading to progressively more accurate and reliable results. Such a system not only becomes more efficient but also more attuned to the evolving needs and patterns of its users.
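As a rough illustration of the rewrite-then-retrieve step in such a pipeline (a sketch under assumptions, not the system described above): `call_llm` and `embed` are hypothetical placeholders for the fine-tuned LLM and the embedding model, and the documents are made up.

```python
import numpy as np

def call_llm(prompt: str) -> str:
    # Hypothetical placeholder: a real system would call the fine-tuned LLM here.
    return "care and feeding instructions for a pet parrot"

def embed(text: str) -> np.ndarray:
    # Hypothetical placeholder: a real system would use a trained embedding model.
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    v = rng.normal(size=384)
    return v / np.linalg.norm(v)

def retrieve(query: str, docs: list[str], doc_vectors: np.ndarray, k: int = 2):
    """Enrich a vague query with the LLM, embed it, and return the top-k docs by cosine similarity."""
    rewritten = call_llm(f"Rewrite this search query with more context: {query}")
    q = embed(rewritten)
    scores = doc_vectors @ q                      # all vectors are unit-norm, so this is cosine similarity
    top = np.argsort(scores)[::-1][:k]
    return [(docs[i], float(scores[i])) for i in top]

docs = ["how to feed a parrot", "kale salad recipe", "training your parrot to talk"]
doc_vectors = np.stack([embed(d) for d in docs])
print(retrieve("parrot care", docs, doc_vectors))
```

In a real deployment, the rewritten query, the retrieved documents, and user feedback on them could be logged to drive the continuous-learning loop mentioned above.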
@shirleyhu5446 4 months ago
This is enlightening. It conveys how embeddings work in an intuitive way.
@gordongoodwin6279 11 months ago
this is a fantastic video. I found myself confused as to why NNs needed an embedding layer each time and why we didn't just import some universal embedding dictionary. This made that super simple! Parrots and carrots and kales and whales and cocks and rocks!
@baharrezaei5637 10 months ago
best explanation I have seen of embeddings by far, Thanks 🌻
@polikalepotuaileva6006 4 months ago
Excellent video. Thanks for taking the time to share.
@andreamorim6635 5 months ago
Thanks for the explanation! Really easy to understand after watching this video!! keep up the good work
@ritvikmath 5 months ago
Glad it helped!
@lechx32 1 year ago
Would love to see more about embeddings
@ritvikmath 1 year ago
Noted! Thanks for the feedback
@nüchtern_betrachtet 1 year ago
I would like to point out an important distinction: The *concepts* described by the symbols in context of other symbols can have vastly different embeddings. The *symbols* themselves however need absolute/fixed embeddings. If you use multiple symbols in a sequence, like words in a sentence, you can use all the other symbols in order to give each other context. So the raw input embeddings are always the same. In that case, I would argue that the initial "common misconception" is actually accurate. Using a model like a transformer allows you to input a sequence of (fixed) symbol-embeddings and end up with contextualized embeddings in place of those symbols. The transformer then iteratively applies *transformations* on those embedding vectors depending on the *context* . The symbol "parrot" always starts as the same fixed embedding vector, no matter in which context it appears. But depending on the context, the repeated transformations done by the transformer eventually *map* that vector to another vector close to "parrot" if the context is a poem, or yet another vector close to "kale" if the context is a cooking recipe. This is why word2vec back then just was not enough. It only computed something similar to those input embeddings and then stopped there without doing those transformations.
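A small sketch of that distinction using Hugging Face's `transformers` and BERT (an assumption; neither the video nor the comment names a specific model). The word "bank" stands in for "parrot" only because it is a single token in BERT's vocabulary and is classically context-dependent; the point is the same: the input embedding row is identical in both sentences, while the contextualized output vectors differ.

```python
import torch
from transformers import AutoModel, AutoTokenizer

tok = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

def static_and_contextual(sentence: str, word: str):
    enc = tok(sentence, return_tensors="pt")
    idx = tok.convert_ids_to_tokens(enc["input_ids"][0]).index(word)
    with torch.no_grad():
        static = model.get_input_embeddings()(enc["input_ids"])[0, idx]  # fixed lookup, context-free
        contextual = model(**enc).last_hidden_state[0, idx]              # after the transformer's layers
    return static, contextual

s1, c1 = static_and_contextual("i deposited cash at the bank", "bank")
s2, c2 = static_and_contextual("we had a picnic on the river bank", "bank")

cos = torch.nn.functional.cosine_similarity
print("input embeddings identical:", torch.allclose(s1, s2))       # True: same row of the embedding table
print("contextual cosine similarity:", cos(c1, c2, dim=0).item())  # < 1: context has pulled them apart
```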
@adaoraenemuo4228 1 year ago
Love love your videos! Very clear with meaningful examples!
@randoff7916 1 year ago
When the sample size is large, does the embedding for individual words start to converge?
@JoseWaihiga 3 months ago
Thank you for a great explanation!
@zeroheisenburg3480 1 year ago
One thing I don't understand is why embeddings learned through deep learning, with non-linearities in between, can be compared using linear metrics such as the commonly used cosine similarity. I can't find a good discussion anywhere.
@SierraSombrero 1 year ago
The deep learning models are trained using non-linearities to capture non-linear relationships in the data. Hence, the function (= model architecture) you use to learn the embeddings has non-linearities. When we train a deep learning model to obtain an embedding, most of the time we have an embedding layer as the first layer in the model. We then train the model using a specific objective (goal) that is suitable for obtaining word embeddings. After having trained the model, we just take the embedding layer out of the full model and discard the rest. You can imagine the embedding layer as a matrix of size (vocab_size x embedding_dimension); each word/token in the vocabulary is represented by a vector with as many numbers as the embedding dimension. The matrix (embedding layer) itself has no non-linearities, it's just a matrix. Therefore, the vectors that represent the tokens can be compared with each other using linear metrics, as you said above. Hope it helps :)
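A minimal PyTorch sketch of the setup described above (the vocabulary, dimensions, and classifier head are invented for illustration): the embedding layer is the first block, training would tune it along with the rest, and afterwards only that lookup table is kept and its rows compared with cosine similarity.

```python
import torch
import torch.nn as nn

vocab = {"parrot": 0, "kale": 1, "carrot": 2, "whale": 3}
embed_dim = 8

model = nn.Sequential(
    nn.Embedding(len(vocab), embed_dim),  # the (vocab_size x embedding_dimension) matrix, no non-linearity
    nn.Linear(embed_dim, 16),
    nn.ReLU(),                            # non-linearities live in the later layers, not in the embedding
    nn.Linear(16, 2),
)

# ... train `model` on some suitable objective here, then discard everything but the first layer ...

embedding_layer = model[0]                # keep only the lookup table
parrot = embedding_layer(torch.tensor([vocab["parrot"]]))
kale = embedding_layer(torch.tensor([vocab["kale"]]))
print(torch.nn.functional.cosine_similarity(parrot, kale, dim=1).item())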
@zeroheisenburg3480 1 year ago
@@SierraSombrero Appreciate the response. But I think there are some critical issues lingering. 1. The input is a matrix. It goes through linear -> non-linear -> linear transformations. Back-propagation has to go through the same steps when updating the embedding layer's weights, so it carries non-linear information over to the embedding layer, thus breaking the linear properties, right? 2. By "the matrix (embedding layer) itself has no non-linearities", do you mean I can extract any weights before the activation unit in a neuron and use them as an embedding?
@SierraSombrero 1 year ago
@@zeroheisenburg3480 I'll try to answer as best as I can. I'm not sure I'll be able to answer question 1 satisfactorily, though :) I'll start with question 2 because I can explain it better.

2. An embedding layer is not the same as a linear layer. It does not represent neurons and does not output activations (but rather representations). In a linear layer you have an input x that you multiply with a weight w and then you add a bias b. (I don't know of any case where weights have been used as embeddings.) An embedding layer can usually only be the first layer in a network. You don't multiply an input x with a weight w here. Instead you have a number of input classes in the form of integers (that represent e.g. words) that you can feed your model (the number of integers is your vocab size). Each of these input integers is mapped to one row in your embedding layer (vocab_size x embed_dim). You can imagine it like a table where you look up which embedding belongs to which word. Once you have looked up the embedding for your current word, you use it as input to the next layer in your model. Before training, the embeddings are random, and the embedding layer is updated during training using backprop just like every other layer (though differently, because it is a different mathematical operation than a linear layer). After training, the embedding layer has been changed so that every one of your input words now has a meaningful representation in the embedding space (if your training was successful). Now you can take the lookup table (embedding layer) out of your model, feed it a word, and it will give you the meaningful embedding belonging to that word. I suggest you check out the difference between the Linear and Embedding layers in PyTorch :) Make sure to understand what kinds of inputs you feed them and what you get as outputs.
pytorch.org/docs/stable/generated/torch.nn.Linear.html
pytorch.org/docs/stable/generated/torch.nn.Embedding.html
Maybe also try to find a good explanation of how the first static embeddings were trained (CBOW, Skip-gram). I think this should give you the intuition.

1. It's true that non-linear operations also take place during training via backpropagation. However, since you're discarding all non-linear parts of the model and only keeping the embedding layer, it is definitely possible in practice to apply linear operations to it. If there are theoretical mathematical issues lingering in the background, then I'm certainly the wrong person to answer your question. But since it works so well in practice, I would personally not worry too much about it :)
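To make the Linear-vs-Embedding distinction above concrete, a tiny PyTorch sketch (sizes and token ids are arbitrary): the embedding layer is an index lookup into its weight table, while the linear layer multiplies a float input by a weight matrix and adds a bias.

```python
import torch
import torch.nn as nn

emb = nn.Embedding(num_embeddings=10, embedding_dim=4)  # lookup table: 10 rows, each of size 4
lin = nn.Linear(in_features=4, out_features=3)          # y = x @ W.T + b

word_ids = torch.tensor([2, 7])      # integer token ids go in
looked_up = emb(word_ids)            # shape (2, 4): rows 2 and 7 of the table
activations = lin(looked_up)         # shape (2, 3): a linear transform of those rows

# The lookup is literally a row selection from the weight matrix:
print(torch.allclose(looked_up, emb.weight[word_ids]))  # True
```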
@blairt8101 1 month ago
I got a good grade on my ML exam because of you!
@turkial-harbi2919 1 year ago
Simply beautiful 🙏
@ritvikmath 1 year ago
Thanks!
@Rafael-xu7jh 4 months ago
What's the difference between embeddings and correspondence analysis for predictions, considering that both provide coordinates in an n-dimensional space based on the characteristics of categorical variables? If I can use embeddings for categorical variables in predictive models, why can't I use correspondence analysis?
@pratik.patil87 3 months ago
This is amazing. Thank you for doing this video. It drives home a very important point. Can we fine-tune already available state-of-the-art embeddings to our own specific world of context? Also, I would really like to know, at least conceptually, how some of these popular embeddings are created, like SBERT, RoBERTa, etc.
@BlayneOliver 8 months ago
How do you introduce categorical embeddings into a seq2seq model which works on sequence input_layers?
@MindLaboratory 1 year ago
I'm working on embeddings for a very particular application inside a game. Lots of natural language, but also lots of game-specific language. I started by downloading GloVe, finding each word that appears both in my vocabulary and in GloVe, copying that vector into my model for the matching word, and using a random vector for words that do not appear in GloVe. Then I run an update function using a random sample of sentences each loop. Does this sound viable?
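A rough sketch of that initialization scheme, under assumptions (the GloVe file path, dimensions, and toy vocabulary are invented; this is not the commenter's actual code): copy pretrained vectors where available, random-initialize the rest, and keep the whole layer trainable so the per-loop updates can keep adjusting it.

```python
import numpy as np
import torch
import torch.nn as nn

def load_glove(path):
    """Parse a GloVe text file into a {word: vector} dict."""
    vectors = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            parts = line.rstrip().split(" ")
            vectors[parts[0]] = np.asarray(parts[1:], dtype=np.float32)
    return vectors

vocab = ["parrot", "kale", "manaburst"]        # game-specific words likely won't be in GloVe
glove = load_glove("glove.6B.100d.txt")        # hypothetical local file
dim = 100

weights = np.random.normal(scale=0.1, size=(len(vocab), dim)).astype(np.float32)
for i, word in enumerate(vocab):
    if word in glove:
        weights[i] = glove[word]               # copy the pretrained vector where one exists

# freeze=False keeps the layer trainable, so the sentence-sampling update loop can
# keep adjusting both the GloVe-initialized rows and the randomly initialized ones.
embedding = nn.Embedding.from_pretrained(torch.from_numpy(weights), freeze=False)
```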
@Tonkuz 7 months ago
What happens to the embeddings created for one LLM if I change the LLM?
@mojekonto9287 6 months ago
Nothing. At least in the context of a RAG system, where you use the embeddings to search through a vector database to retrieve context for the LLM.
@jordiaguilar3640 1 year ago
great teaching.
@zilaleizaldin1834 26 days ago
you are amazing!
@user-wr4yl7tx3w 1 year ago
are there other options besides embeddings?
@garyboy7135 1 year ago
Maybe some topics around word2vec and other popular embedding methods. And how embeddings can extend beyond text.
@xspydazx 1 year ago
Really, if you train your embedding model with entity lists and topic-themed sentences, i.e. highly classified, entity-rich data paired with its associated topic or entity, then you will build the right model. This model should form your base model; when performing tasks you then fine-tune it on the customized corpus so that it also updates the vocabulary from your new corpus, re-assigning the terms closer together. To optimize, it would be necessary to retrain for a set of epochs (without overfitting the new data), because the pretrained model contains the data you want underneath, while the new model is polluted with the new data corpus. Hence, keeping a base model unchanged gives your projects a jumpstart. Tuning these models with new entity lists and topic lists updates the new knowledge in the model, and you can even clean and prune the vocabulary of unwanted stop words, offensive words, and misassigned words. So a base model is your starting point. If you train a fresh model only on the new corpus, it will produce the results shown here: it will essentially not be fit for any purpose except the one it was trained for.
@Septumsempra8818 1 year ago
Time series embedding? And encoders?
@ritvikmath 1 year ago
We’re getting there 😂
@micahdelaurentis6551 1 year ago
that was a fantastic example
@ritvikmath 1 year ago
Thanks!
@cirostrizzi3760 1 year ago
Great video, very informative and clear. Can someone tell me the names of some modern embedding models (e.g. OpenAI's) and maybe give me some sources to search and learn more about them?
@EeshanMishra 1 year ago
Google sends me to you when I am working on Llama embeddings :)
@tomoki-v6o 1 year ago
Embeddings are good tools for statistical discovery; they hold the spirit of the statistics of how information is organized.