Very excited it's for Flash too! This'll help a lot at work for certain features!
@deeplearning7097 · 2 months ago
Brilliant Sam, as always, thank you very much.
@SwapperTheFirst · 2 months ago
I can immediately see how to use this for fairly cheap similarity search. Assuming you have 1M strings to match, you can put all of them into the context window and then ask the model each time to find a similar string. It will be quite slow, but with caching (storing tokens in RAM or on SSD) it won't be expensive. This doesn't scale to 1B strings, though, and a RAG approach isn't feasible either.

Sam, maybe you have some advice on how to solve similarity search at scale? At smaller scale you can solve this character-wise, using rapidfuzz or dedupe. But how do you solve it at scale? This business problem is known as "entity matching" or "fuzzy entity matching". For example, you want to match "Microsoft corp" to "Microsoft corporation" to "MSFT". You also want to cluster similar strings under the same unique umbrella ("Microsoft corporation" in this example). You could use regular vector search, but clustering is the problem: how do you "shuffle" through 1B rows to create a reliable index and then keep it updateable in real time?
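For the small-scale, character-wise case mentioned above, here is a minimal sketch using only the standard library's difflib (a stand-in for rapidfuzz; the names and threshold are made up for illustration):

```python
from difflib import SequenceMatcher

def best_match(query: str, candidates: list[str]) -> tuple[str, float]:
    """Return the candidate most similar to `query` by character-level ratio."""
    scored = [(c, SequenceMatcher(None, query.lower(), c.lower()).ratio())
              for c in candidates]
    return max(scored, key=lambda pair: pair[1])

names = ["Microsoft Corporation", "Microsoft Corp", "Apple Inc", "MSFT"]
match, score = best_match("microsoft corp.", names)
```

Note that this illustrates exactly the limitation described above: character similarity links "Microsoft corp." to "Microsoft Corp", but it can never link "MSFT" to "Microsoft", and it is O(n) per query, so it won't reach 1B rows.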
@samwitteveenai · 2 months ago
Very interesting comment. AFAIK the most commonly used models for entity matching (balancing accuracy, cost and efficiency) are encoder-based BERT/RoBERTa-style models. LLMs can certainly do it; they just end up being slow. It would be interesting to see if you could do it with a long-context model, perhaps storing the list of entities in the prompt and only having it give the new ones as output. The challenge is it would still be way too slow for anything real-time. This is an interesting challenge; let me think about it a bit more and look for a dataset to test on.
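One standard trick for making entity matching tractable at scale (regardless of whether the comparator is an encoder model or a fuzzy matcher) is blocking: only compare records that share a cheap key, so you never score all pairs. A hypothetical stdlib sketch, with a deliberately naive key:

```python
from collections import defaultdict

def blocking_key(name: str) -> str:
    # Hypothetical cheap key: first 4 alphanumeric chars, lowercased.
    alnum = "".join(ch for ch in name.lower() if ch.isalnum())
    return alnum[:4]

def build_blocks(names: list[str]) -> dict[str, list[str]]:
    blocks = defaultdict(list)
    for n in names:
        blocks[blocking_key(n)].append(n)
    return blocks

names = ["Microsoft Corp", "Microsoft Corporation", "Micro Focus", "Apple Inc"]
blocks = build_blocks(names)
# Candidates for a query are only the entries sharing its block:
candidates = blocks[blocking_key("Microsoft corp.")]
```

The expensive comparator (encoder embedding, LLM call, rapidfuzz score) then runs only within each block, which is how tools like dedupe keep pair counts manageable; a real key would use normalization or phonetic/ngram hashing rather than a fixed prefix.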
@JayanaKalansuriya · 2 months ago
Hi, if possible I would love to get some advice from you guys! There's a requirement where I have a master catalog of over 5000 products, with images and product text, and I want to build a similarity-matching solution: if I upload an image and there's a similar product in the master catalog, I want to find it. Basically we need to do image similarity mapping, and the most similar catalog image to the input image should be shown. How can I work on this? Any advice would be much appreciated!
@SwapperTheFirst · 2 months ago
@JayanaKalansuriya at a scale of 5K or so you don't need anything complex. Just grab a multimodal (visual is enough) embedding model and then set up a vector search. It's a regular RAG app with a small twist: you create embeddings for the images as well, and you serve images in the RAG response. Or use a managed solution from Google Cloud.
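To make the vector-search step concrete: once each catalog image has an embedding, matching is just nearest-neighbor by cosine similarity. A pure-Python sketch (the vectors below are tiny made-up placeholders; in practice they would come from whatever multimodal embedding model you pick, and at 5K items a brute-force scan like this is fine):

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def nearest(query_vec: list[float], catalog: dict[str, list[float]]) -> tuple[str, float]:
    """catalog maps product_id -> image embedding. Returns (best_id, similarity)."""
    return max(((pid, cosine(query_vec, vec)) for pid, vec in catalog.items()),
               key=lambda pair: pair[1])

# Hypothetical 3-d embeddings standing in for real model output.
catalog = {"red-shoe": [0.9, 0.1, 0.0], "blue-shirt": [0.1, 0.9, 0.2]}
best_id, sim = nearest([0.88, 0.15, 0.05], catalog)
```

A vector database only becomes necessary when brute force gets too slow or you need real-time updates.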
@SwapperTheFirst · 2 months ago
@samwitteveenai Thanks a lot, Sam. You're right about BERT/RoBERTa; these are used in spaCy's "transformer" model for named-entity extraction. And spaCy always uses the latest and greatest, unlike NLTK, which is very conservative and carries lots of legacy code for compatibility.
@eddiehaug · 2 months ago
@JayanaKalansuriya Not sure what your specific use case is, but you might want to take a look at Google's Recommendations AI, and there's a Ranking API as well.
@danangjeffry · 2 months ago
Very useful and easy to understand. Thank you!
@rluijk · 2 months ago
Thanks! Nice explainer. Will integrate this part in my setup!
@GamingClubGermany · 1 month ago
First off, thanks a lot for the video! But why is your voice/the audio so wobbly? Are you using a TTS model or something like that? Update: okay, I don't know what you use, but it's pretty awesome! Do you mind sharing info on what voice "thing" you use?
@leslysandra · 23 days ago
thank you for sharing :D
@gen_ai_explorer · 2 months ago
How will this benefit us? We can store the information in a vector DB and use only the relevant chunks at a time, right? How does Google's caching help here?
@matty-oz6yd · 2 months ago
Any idea how this works internally? I am trying to work out whether to use an index of relevant context or the context-cache feature. The details seem to be a closely guarded secret, which means the only way to decide between the two is to test both. The use cases seem very similar.

Option 1: give Google a bunch of context, hope that it's good, and then run queries against it.
Option 2: index my context and add information as needed using RAG.

The RAG approach would use more tokens, but at least I know how it works, so I can set my expectations. The Google approach would be cheaper, but I don't know how the context has been processed, so I can't intentionally format my data for optimal performance.
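The "cheaper" part of the trade-off is easy to put numbers on. A back-of-envelope sketch, where every price is a hypothetical placeholder (not Google's actual rates) and storage fees are ignored:

```python
def monthly_cost(queries: int, context_tokens: int, query_tokens: int,
                 price_per_mtok: float, cached_price_per_mtok: float,
                 use_cache: bool) -> float:
    """Rough token cost: with caching, the big shared context is billed at the
    (hypothetical) discounted cached-token rate instead of the full rate."""
    ctx_rate = cached_price_per_mtok if use_cache else price_per_mtok
    per_query = (context_tokens * ctx_rate + query_tokens * price_per_mtok) / 1_000_000
    return queries * per_query

# Hypothetical workload: 1M-token context, 1k-token queries, 1000 queries/month.
full = monthly_cost(1000, 1_000_000, 1_000, price_per_mtok=0.35,
                    cached_price_per_mtok=0.0875, use_cache=False)
cached = monthly_cost(1000, 1_000_000, 1_000, price_per_mtok=0.35,
                      cached_price_per_mtok=0.0875, use_cache=True)
```

With these made-up numbers the cached setup is roughly a quarter of the cost, because the huge context dominates the bill; a RAG comparison would instead charge full price on a much smaller retrieved slice per query.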
@Maisonier · 2 months ago
I'd love to know how to do this with Open WebUI and a local model on a single GPU. Do we need to use FAISS, or which RAG stack?
@IdPreferNot1 · 2 months ago
A great video example of this would be demonstrating processing a repo or API docs to help with programming against libraries that have changed significantly since the cutoff date. I still can't believe that GPT-4o can't get the endpoints and structure right for OpenAI's own current API when you ask it to build code that works with it.
@KishanLal-s4k · 1 month ago
Is there a way to update the content of the cache? I can see it's limited to a TTL, but I'm unable to update the actual content of the cache.
@gemini_537 · 2 months ago
This is super useful! ❤
@ylazerson · 2 months ago
great video - thanks!
@miriamramstudio3982 · 2 months ago
Very useful. Thanks
@RD-learning-today · 2 months ago
How do you use it in Vertex AI?
@johnrperry5897 · 1 month ago
Wait what is cayche
@SrikanthCSE-mi9jm · 2 months ago
How do I use it with LangChain?
@samwitteveenai · 2 months ago
I am not sure if they support this yet or not. My guess is the google-langchain and vertex-langchain packages will have to add support for it.
@darshank8748 · 2 months ago
Great video!
@RD-learning-today · 2 months ago
Can I use it with Vertex AI?
@TheRcfrias · 2 months ago
I thought this video was about caching client-side to avoid passing around huge payloads for function calling and so on 😢
@guanjwcn · 2 months ago
Thank you, Sam!! Does Llama 3 have this too?
@samwitteveenai · 2 months ago
I think if you serve it with vLLM you could do it; vLLM has prefix caching.
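For intuition on what prefix caching buys, here is a toy simulation (not vLLM's actual implementation; the "KV state" string is a stand-in for the expensive attention-key/value computation that real engines reuse across requests sharing a prompt prefix):

```python
class ToyPrefixCache:
    """Toy illustration of prefix caching: expensive work on a shared
    prefix is done once, then reused across requests."""
    def __init__(self):
        self.cache = {}
        self.computations = 0

    def _process(self, text: str) -> str:
        self.computations += 1          # stands in for expensive KV computation
        return f"<kv:{len(text)}>"      # fake "KV state"

    def run(self, prefix: str, suffix: str) -> str:
        if prefix not in self.cache:
            self.cache[prefix] = self._process(prefix)
        return self.cache[prefix] + self._process(suffix)

engine = ToyPrefixCache()
doc = "a huge shared document " * 100
engine.run(doc, "question one")
engine.run(doc, "question two")
# Two runs cost three computations, not four: the prefix was processed once.
```

If memory serves, vLLM exposes the real thing via an `enable_prefix_caching` option on its engine, but check the current docs for the exact flag name.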
@WillJohnston-wg9ew · 2 months ago
Any thoughts on how this would apply to real-time video? I am trying to create something that does real-time video sentiment analysis.
@samwitteveenai · 2 months ago
Real time probably wouldn't work just yet. There are some hacks/techniques they use to do it in real time for Project Astra etc., but I am not sure when that will be available to us externally.
@MistikBBQ · 2 months ago
Any way of doing this locally with something like Ollama? This would actually be amazing to use in some local/edge cases.
@samwitteveenai · 2 months ago
You could do it with a local model via vLLM. AFAIK it's currently not possible in Ollama, but they could certainly add it.
@MistikBBQ · 2 months ago
@samwitteveenai Thanks a lot for the reply!
@mrnakomoto7241 · 2 months ago
Aussie accent but living in the USA, what's going on?
@samwitteveenai · 2 months ago
actually back living in Singapore for the time being 😀
@mrnakomoto7241 · 2 months ago
@samwitteveenai Out of curiosity, do you own a house in every country you go to?
@micbab-vg2mu · 2 months ago
Great:)
@ahmaddajani3639 · 2 months ago
Why use context like this instead of using a vector store and chunking the content?
@SwapperTheFirst · 2 months ago
You will not get this with a vector store and RAG. Here you have all the content available to the model.
@eddiehaug · 2 months ago
Because depending on the use case, you may want to use one technique vs the other. Adding all the info as context to an LLM is not the same as using RAG, where your results may vary greatly depending on the chunk size, the ranking engine, etc.
@ahmaddajani3639 · 2 months ago
@eddiehaug Yes, correct, it depends on the use case. But if you want to save money in the case of question answering, RAG is better.
@eddiehaug · 2 months ago
@ahmaddajani3639 Yes, agreed 👍
@vicovico · 2 months ago
What's going on with the pronunciation of "cached"?
@ScottVanKirk · 2 months ago
It is pronounced Kash. The e is vestigial like our appendix😁
@jamiek2039 · 2 months ago
😂
@matthewwalker7063 · 2 months ago
Engagement baiting
@samwitteveenai · 2 months ago
lol I was waiting for someone to say something 😀
@ariganeri · 2 months ago
It's what Aussies call English.
@otty4000 · 2 months ago
Functionally, isn't this quite similar to NotebookLM?
@samwitteveenai · 2 months ago
No, this is more than just uploading the docs/video etc.; it's having a lot of the values precomputed in the model.