Converting a LangChain App from OpenAI to OpenSource

15,771 views

Sam Witteveen

A day ago

Comments: 69
@julian-fricker • a year ago
This is exactly why I'm learning langchain and creating tools with data I don't care about for now. I know one day I'll flick a switch and have the ability to do all of this locally with open source tools and not worry about the security of my real data. This is the way!
@vhater2006 • a year ago
Good Luck on Privacy ;)
@robxmccarthy • a year ago
Thank you so much for doing all of this work! Would be really interesting to compare the larger models. If GPT-3.5-Turbo is based on a 176B-parameter model, it's going to be very difficult for a 13B model to stack up. 13B models seem more appropriate for fine-tuning, where the limited parameter count can be focused on specific contexts and domains - such as these texts and a QA structure for answering questions over the text. The example QA instructions and labels could be generated using OpenAI to ask questions over the text, as in your first example. This is all very expensive and time-consuming though... so I think you'd really need a real-world business use case to justify the experimentation and development time required.
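A minimal sketch of the QA-generation idea in this comment, assuming the openai package of that era; the prompt and chunk list are illustrative, not from the video:

```python
# Sketch: use OpenAI to draft QA pairs over text chunks as fine-tuning data.
import openai

chunks = ["<a passage from the book>", "<another passage>"]  # your text chunks
qa_pairs = []
for chunk in chunks:
    resp = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=[{
            "role": "user",
            "content": f"Write one question and its answer based only on this text:\n\n{chunk}",
        }],
    )
    qa_pairs.append(resp["choices"][0]["message"]["content"])
```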
@Rems766 • a year ago
Mate, you're doing all the work I had planned, for me. Thanks a lot.
@jarekmor • a year ago
Unique content and format. Practical examples. Something amazing! Don't stop making new videos.
@tensiondriven • a year ago
This might be trivial, but I'd love a video on the difference between running a notebook vs. a CLI vs. an API. All the demos use notebooks, but to make this useful we need APIs and CLIs!
@theh1ve • a year ago
I'd like to see this too. I want my model inference running on one networked machine and a GUI running on another, with API calls between them.
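For anyone wanting the notebook-to-API jump these two comments describe, a minimal sketch; the model name and endpoint are illustrative assumptions, not from the video:

```python
# serve.py -- wrap a local HF pipeline in an HTTP API so a GUI on another
# machine can call it. Run with: uvicorn serve:app --host 0.0.0.0 --port 8000
from fastapi import FastAPI
from pydantic import BaseModel
from transformers import pipeline

app = FastAPI()
llm = pipeline("text-generation", model="gpt2")  # example; swap in your model

class GenerateRequest(BaseModel):
    prompt: str
    max_new_tokens: int = 128

@app.post("/generate")
def generate(req: GenerateRequest):
    out = llm(req.prompt, max_new_tokens=req.max_new_tokens)
    return {"text": out[0]["generated_text"]}
```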
@georgep.8478 • a year ago
This is great. Please follow up on fine-tuning a smaller model on the text and EPUB.
@PleaseOpenSourceAI
@PleaseOpenSourceAI Жыл бұрын
Great job, but these HF models are really large - even 7B ones take more than 12Gb of memory, so can't really run them on local cuda core. I'm almost at the point of beginning to try to figure out how to use GPTQ models for these purposes). It's been a month already and seems like no one is doing it for some reason. Do you know if there is some big obvious roadblock on this path?
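On the memory point, one commonly used workaround (an assumption here, not something shown in the video) is 8-bit loading via bitsandbytes, which roughly halves the fp16 footprint:

```python
# Sketch: load a 7B model in 8-bit so it fits in ~8-9 GB instead of ~14 GB.
# Requires: pip install transformers accelerate bitsandbytes
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "tiiuae/falcon-7b"  # example checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",   # place layers across GPU/CPU automatically
    load_in_8bit=True,   # ~1 byte per weight instead of 2
)
```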
@clray123 • a year ago
I think it will get interesting when people start tuning these open-source models with QLoRA and some carefully designed task-specific datasets. If you browse through the chat-based datasets these models are pretrained with, there's a lot of crap in there, so no wonder the outputs are not amazing. I believe the jury is still out on to what extent a smaller fine-tuned model could outperform a large general one on a highly specialized task. Although, based on the benchmarks of the Guanaco model family, it seems that raw model size also matters a lot.
@pubgkiller2903 • a year ago
The biggest drawback is that QLoRA models take a long time to generate the answer from the context.
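For readers who want to try the QLoRA tuning clray123 describes, a minimal setup sketch with PEFT and bitsandbytes; the base model and hyperparameters are illustrative assumptions (the slow inference the reply mentions comes from the 4-bit dequantization at generation time):

```python
# Sketch: QLoRA-style setup -- 4-bit base weights + trainable LoRA adapters.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

bnb = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",            # NormalFloat4, per the QLoRA paper
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(
    "openlm-research/open_llama_7b",      # example base model
    quantization_config=bnb, device_map="auto",
)
model = prepare_model_for_kbit_training(model)
lora = LoraConfig(r=16, lora_alpha=32, lora_dropout=0.05,
                  target_modules=["q_proj", "v_proj"], task_type="CAUSAL_LM")
model = get_peft_model(model, lora)       # only adapter weights are trainable
```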
@tejaswi1995 • a year ago
The video I was waiting for most on your channel 🔥
@thewimo8298 • a year ago
Thank you Sam! Appreciate the guide with the non-OpenAI LLMs!
@DaTruAndi • a year ago
Can you look into using the quantized models (GPTQ 4-bit or GGML q4_1), for example with LangChain?
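A hedged sketch of one way to do this with a GGML model through LangChain's LlamaCpp wrapper; the model path is an example. Since llama.cpp runs on CPU, this also covers the on-premises-without-a-GPU question further down the thread:

```python
# Sketch: run a GGML-quantized model in LangChain via llama-cpp-python.
# Requires: pip install llama-cpp-python
from langchain.llms import LlamaCpp

llm = LlamaCpp(
    model_path="./models/wizard-vicuna-13b.ggmlv3.q4_1.bin",  # example path
    n_ctx=2048,       # context window
    temperature=0.1,
)
print(llm("Q: What does 4-bit quantization trade off? A:"))
```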
@acortis • a year ago
This was very helpful! Thanks so much for doing these videos. May I suggest that you do a video on the things that are needed to fine-tune some of the LLMs with a specific goal in mind? Not sure that this is something that can be done on Colab, but knowing what the steps and the required resources are might be very helpful. Thanks again!
@samwitteveenai • a year ago
I will certainly make some more fine-tuning vids. Any good examples of what you mean by "having a specific goal in mind"?
@acortis • a year ago
@samwitteveenai I saw your video on fine-tuning with PEFT on the English quotes, and I thought the final result was a bit hit-and-miss. I was wondering what specific type of datasets would be needed for, say, reasoning or data extraction (à la SQuAD v2). Overall, I have the sense that LLMs are trying to train on too much data (why in the world we are trying to get exact arithmetic is beyond me!). I think it would be more efficient if there were a more specific model just dedicated to learning English grammar, and then smaller, topic-specific models. Just my gut feeling.
@samwitteveenai • a year ago
@acortis This is something I am working on a lot. The PEFT task was partly due to me not training it very long; it was just to give people something they could use to learn on. Reasoning is a task that normally requires bigger models, etc., for few-shot tasks. I am currently training models around 3B for very specific types of tasks around ReAct and PAL. I totally agree about the arithmetic; what I am interested in, though, is models that can do the PAL tasks. I have a video on that from about 2 months ago. I will make some more fine-tuning content. I want to show QLoRA and some other cool stuff in PEFT as well.
@fv4466 • a year ago
As a newcomer, your discussion on the differences among models and prompt tuning is extremely helpful. Your video pins down the shortcomings of current retrieval-augmented language modeling. It is very informative. Is there any good way to digest the HTML raw? Is it always better to convert the HTML pages to text and follow the process described in your video? Are there any tools you recommend?
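On the HTML question, a minimal sketch (not from the video) of loading raw HTML straight into LangChain documents, so a separate text-conversion step isn't needed; the filename is an example:

```python
# Sketch: load raw HTML into LangChain documents (tags stripped) and split.
# Requires: pip install beautifulsoup4
from langchain.document_loaders import BSHTMLLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter

docs = BSHTMLLoader("page.html").load()     # example file
chunks = RecursiveCharacterTextSplitter(
    chunk_size=1000, chunk_overlap=200
).split_documents(docs)
```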
@reinerheiner1148 • a year ago
I've really wondered how open-source models would perform with LangChain vs. GPT-3.5-Turbo, so thanks for making this video. I suspected that the open-source models would probably not perform as well, but I did not think it would be that bad. Could you maybe provide us with a list of the LLMs you tried that didn't work out, so we can cross them off our list of models to try with LangChain? In any case, thanks for making this notebook; it'll make it so much easier for me to mess around with open-source models and LangChain!
@yousif_12312 • a year ago
Is it optimal to pass the user query to the retriever directly? Wouldn't asking the language model to decide what to search for (like using a tool) be better? Also, if 3 chunks in 1 doc were found, I wonder if it's better to order them sequentially as they show up in the doc.
@ygshkmr123 • a year ago
Hey Sam, do you have any idea how I can reduce inference time on an open-source LLM?
@samwitteveenai • a year ago
Multiple GPUs, quantization, Flash Attention, and other hacks. I am thinking about doing a video about this. Any particular model you are using?
@ЕгорГуторов-р7я • a year ago
Thank you for such content. Is there any possibility of doing the same without a cloud-native platform and GPU, i.e. if I want to launch something similar on-premises with a CPU?
@rudy9546 • a year ago
Top-tier content
@creativeuser9086 • a year ago
Could you please point me to a video you've done about how the embedding model works? Specifically, I want to know how it transforms a whole chunk of data (a paragraph) into one embedding vector (instead of multiple vectors per token)?
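In short, the model still embeds every token and then pools the token vectors into one (mean pooling here; some models use the [CLS] token instead). A minimal sketch with sentence-transformers; the model name is an example:

```python
# Sketch: one fixed-size vector per input string, regardless of length.
# Requires: pip install sentence-transformers
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")
vec = model.encode("A whole paragraph of text goes in here as one string.")
print(vec.shape)  # (384,) -- token embeddings were mean-pooled into one vector
```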
@henkhbit5748 • a year ago
Great video, love the comparison with open source. Would be nice if you can show how to fine-tune an open-source model, a small one, with your own instruct dataset. BTW: how do you add new embeddings to an existing Chroma DB? db.add(...)?
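On the Chroma question, a hedged sketch of what this looks like with LangChain's wrapper; the directory and model name are examples:

```python
# Sketch: reopen a persisted LangChain Chroma store and append new texts.
from langchain.vectorstores import Chroma
from langchain.embeddings import HuggingFaceEmbeddings

emb = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")
db = Chroma(persist_directory="./chroma_db", embedding_function=emb)
db.add_texts(["A new passage to index."], metadatas=[{"source": "new.txt"}])
db.persist()  # write the updated index back to disk
```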
@user-wr4yl7tx3w • a year ago
Which LLMs are the instruct embeddings compatible with? Is it a common standard?
@samwitteveenai • a year ago
It will work with any LLM you use for the conversational part. Embedding models are independent of the conversational LLM; they are for retrieval.
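A minimal sketch of that separation: the embedding model handles retrieval, and any LLM handles generation. Here `llm` is assumed to be whatever LangChain LLM you already have, and the Instructor model name is an example:

```python
# Sketch: embeddings (retrieval) and the chat LLM (generation) are swappable.
from langchain.embeddings import HuggingFaceInstructEmbeddings
from langchain.vectorstores import Chroma
from langchain.chains import RetrievalQA

emb = HuggingFaceInstructEmbeddings(model_name="hkunlp/instructor-large")
db = Chroma(persist_directory="./chroma_db", embedding_function=emb)
qa = RetrievalQA.from_chain_type(llm=llm, retriever=db.as_retriever())
# qa.run("What does the book say about X?")
```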
@Borbby • a year ago
Hello, thank you for the great work! I'm confused about the tokenizer and the LLM: should they use the same model, like at 11:00 in the video, or can I use another model? Is there any difference between them?
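A short sketch of the usual rule (the checkpoint name is an example): load both from the same checkpoint, because each model is trained against its own vocabulary, and mixing tokenizers produces wrong token ids:

```python
# Sketch: tokenizer and model must come from the same checkpoint (or family).
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "openlm-research/open_llama_7b"           # example checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_id)  # same id as the model
model = AutoModelForCausalLM.from_pretrained(model_id)
```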
@creativeuser9086 • a year ago
Fine-tuning is hard, but RLHF is what takes a model to the next level, on par with the top commercial models. Wanna try doing it?
@samwitteveenai • a year ago
RLHF isn't the panacea that most people make it out to be. I have tried it for some things. I will make a video about it at some point.
@creativeuser9086 • a year ago
@samwitteveenai I guess RLHF is hard to implement and is still in research territory.
@darshitmehta3768 • a year ago
Hello Sam, thank you for this amazing video. I am also facing the same issue with open-source models as in the video: they answer from their own knowledge if the data is not present in the PDF or Chroma DB. Do you have any idea how we can achieve OpenAI-like behaviour with open source, and which model we could use for that?
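One common mitigation (a sketch of a general technique, not something from the video) is to constrain the QA prompt so the model is told to refuse when the context lacks the answer; weaker open models follow this less reliably than GPT-3.5, but it helps:

```python
# Sketch: a grounding prompt that tells the model to refuse out-of-context answers.
from langchain.prompts import PromptTemplate

qa_prompt = PromptTemplate.from_template(
    "Answer using ONLY the context below. If the answer is not in the "
    "context, reply exactly: I don't know.\n\n"
    "Context:\n{context}\n\nQuestion: {question}\nAnswer:"
)
# Pass it in with:
# RetrievalQA.from_chain_type(..., chain_type_kwargs={"prompt": qa_prompt})
```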
@creativeuser9086 • a year ago
Can you try it with Falcon-40B?
@pranjuls-dt1sp • a year ago
Excellent stuff!! 🔥🔥 Just curious to know: is there a way to extract unstructured information like invoice data, receipt labels, medical bill descriptions, etc. using open-source LLMs? Like using LangChain + Wizard/Vicuna to perform such NLP tasks?
@samwitteveenai • a year ago
You can try the Unstructured package or something like an open-source OCR model.
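A minimal sketch of the Unstructured suggestion (the filename is an example): partition() auto-detects the file type and returns text elements you can then hand to an LLM for field extraction:

```python
# Sketch: pull the text out of an invoice PDF with Unstructured.
# Requires: pip install unstructured
from unstructured.partition.auto import partition

elements = partition(filename="invoice.pdf")   # example file
text = "\n".join(el.text for el in elements)   # feed this to your LLM prompt
```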
@DaTruAndi • a year ago
Wouldn't it make more sense to chunk tokenized sequences instead of the untokenized text? You don't know the length of the tokenization of each chunk, but maybe you should. Also, handling of special sequences, like ### Assistant - would they be represented as special tokens? If so, handling them in the token space, e.g. as additional stop tokens for the next answer, may make sense.
@samwitteveenai • a year ago
Yes, but honestly most of the time it doesn't matter that much. Chunking on tokens is a perfectly valid way to do it, but here I was trying to keep it simple. You can use fancier ways for things like interviews. I have one project with a set of docs that are financial interviews, where I took the time to write a custom splitter for question/answer chunks, and it certainly helps. Another challenge with custom open-source models is the different tokenizers - e.g. the original LLaMA models have a 32k-vocab tokenizer, but the fully open-source ones use 50k+. We want to make the indexes once but test them on multiple models, so in cases like this token indexing doesn't always help much. Often the key thing is to have a good overlap size, and that should be tested.
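For anyone who does want token-based chunking, a hedged sketch using LangChain's tokenizer-aware splitter; the checkpoint name is an example and `long_text` stands in for your document string:

```python
# Sketch: size chunks by the tokenizer's token count instead of characters.
from transformers import AutoTokenizer
from langchain.text_splitter import CharacterTextSplitter

tok = AutoTokenizer.from_pretrained("openlm-research/open_llama_7b")  # example
splitter = CharacterTextSplitter.from_huggingface_tokenizer(
    tok, chunk_size=512, chunk_overlap=64   # counted in tokens, not chars
)
chunks = splitter.split_text(long_text)     # long_text: your document string
```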
@bobchelios9961 • a year ago
I would love some information on the RAG models you mentioned near the end.
@vhater2006 • a year ago
Hello, thank you for sharing. So if I want to use LangChain and HF, I just open a pipeline - I finally get it. Why not use big models from HF in your example, a 40B or 65B, to get "better" results?
@samwitteveenai • a year ago
Mostly because people won't have the GPUs to serve them. Also, HF doesn't serve most of the big models for free on their API.
@rakeshpurohit3190 • a year ago
Will this be able to give insights into the given doc, like writing pattern, tone, language, etc.?
@samwitteveenai • a year ago
It will pick those up from the docs, and you can also set them in the prompts.
@dhruvilshah9881 • a year ago
Hi, Sam. Thank you for all the videos - I have been with you from the first video and have learned so much from these tutorials. Can you create a video on fine-tuning LLaMA/Alpaca/Vertex AI (text-bison) or any other feasible LLM for retrieval purposes? Retrieval purposes could be: 1) asking something about private data (in GBs/TBs) in a local repository; 2) extracting some specific information from the local data.
@samwitteveenai • a year ago
Thanks for being around from the start :D. I want to get back into showing more fine-tuning, especially now that the truly open LLaMA models are out. I try to show things people can run in Colab, so I probably won't do TBs of data. Do you have any suggested datasets I could use?
@cdgaeteM • a year ago
Thanks, Sam; your channel is great! I have developed a couple of APIs. Gorilla seems to be very interesting. I would love to hear your opinion through a video. Best!
@samwitteveenai • a year ago
Yes, Gorilla does seem interesting. I read the abstract a few days ago and need to go back and check it out properly. Thanks for reminding me!
@HimanshuSingh-ov5gw • a year ago
How much time would this e5 embedding model take to embed large files, or a larger number of files, like 1,500 text files?
@samwitteveenai • a year ago
1,500 isn't that large; on a decent GPU you're probably looking at tens of minutes max, and probably a lot less depending on each file's length. Of course, once indexed, just save the embeddings to reuse in the future.
@HimanshuSingh-ov5gw • a year ago
@samwitteveenai Thanks! BTW, your videos are very helpful!
@adriangabriel3219 • a year ago
What dataset would you use for fine-tuning?
@samwitteveenai • a year ago
Depends on the task. Mostly I use internal datasets for fine-tuning.
@123arskas • a year ago
Hey Sam, awesome work. I wanted to ask you something:
1. Suppose we have a lot of call transcripts from multiple agents.
2. I want to summarize the transcripts of a month (let's say January).
3. The call transcripts can number from 5 to 600 in a month for a single agent.
4. I want to use the GPT-3.5 models, not the other GPT models.
How would I use LangChain to deal with that much data using async programming? I want the number of tokens and the number of requests to the OpenAI API to stay below the recommended limits so nothing crashes. Any place where I can learn to do this sort of task?
@samwitteveenai • a year ago
Take a look at the summarization vids I made, especially the map_reduce stuff; that does lots of small summaries, which you can then turn into summaries of summaries, etc.
@123arskas • a year ago
@samwitteveenai Thank you
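A minimal sketch of the map_reduce approach Sam points to; chunk sizes are illustrative and `transcript_text` is a hypothetical raw-transcript string:

```python
# Sketch: map_reduce summarization -- summarize each chunk (map), then
# summarize the summaries together (reduce).
from langchain.chains.summarize import load_summarize_chain
from langchain.chat_models import ChatOpenAI
from langchain.text_splitter import RecursiveCharacterTextSplitter

llm = ChatOpenAI(model_name="gpt-3.5-turbo", temperature=0)
docs = RecursiveCharacterTextSplitter(
    chunk_size=3000, chunk_overlap=200
).create_documents([transcript_text])  # transcript_text: one raw transcript
chain = load_summarize_chain(llm, chain_type="map_reduce")
summary = chain.run(docs)
```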
@user-wr4yl7tx3w • a year ago
Have you tried the Falcon LLM model?
@samwitteveenai • a year ago
Yes, Falcon-7B was the original model I wanted to make the video with, but it didn't work well.
@kumargaurav2170 • a year ago
The kind of understanding of what the user is exactly looking for is currently best delivered by the OpenAI & PaLM APIs, amid all the hype.
@samwitteveenai • a year ago
Totally agree. Lots of people are looking for open-source models, and they can work for certain uses, but GPT-3/4, PaLM Bison/Unicorn, and Claude are the ones that work best for this kind of thing.
@alexdantart • a year ago
Please tell me your Colab environment... even on Colab Pro I get:
OutOfMemoryError: CUDA out of memory. Tried to allocate 288.00 MiB (GPU 0; 15.77 GiB total capacity; 14.08 GiB already allocated; 100.12 MiB free; 14.41 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
@samwitteveenai • a year ago
I usually use an A100. You will need Colab Pro+ to run it on Colab.
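If an A100 isn't available, a hedged sketch of two things that can help on a 16 GB GPU: the allocator hint the error message itself suggests, plus 8-bit loading (the model name is an example; a 13B model may still not fit):

```python
# Sketch: OOM mitigations for a 16 GB T4/V100.
import os
# Must be set before torch initializes CUDA (i.e. before importing transformers).
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:128"

from transformers import AutoModelForCausalLM
model = AutoModelForCausalLM.from_pretrained(
    "openlm-research/open_llama_7b",  # example checkpoint
    device_map="auto", load_in_8bit=True,
)
```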
@andrijanmoldovan • a year ago
Would this work with the "TheBloke/guanaco-33B-GPTQ" 4-bit GPTQ model for GPU inference (or another GPTQ model)?
@samwitteveenai • a year ago
Possibly, but it would need different loading code, etc.
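A sketch of what that different loading code might look like with the auto-gptq package; this is an assumption on how GPTQ checkpoints are typically loaded, not something shown in the video:

```python
# Sketch: load a 4-bit GPTQ checkpoint and generate with it.
# Requires: pip install auto-gptq
from transformers import AutoTokenizer
from auto_gptq import AutoGPTQForCausalLM

model_id = "TheBloke/guanaco-33B-GPTQ"
tokenizer = AutoTokenizer.from_pretrained(model_id, use_fast=True)
model = AutoGPTQForCausalLM.from_quantized(
    model_id, device="cuda:0", use_safetensors=True
)
inputs = tokenizer("Hello, my name is", return_tensors="pt").to("cuda:0")
out = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(out[0]))
```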
@pubgkiller2903 • a year ago
Thanks, Sam, it's great. Would you please implement the same concept with Falcon?
@samwitteveenai • a year ago
I did try to do the video with Falcon-7B, but the outputs weren't that good at all.
@pubgkiller2903 • a year ago
@samwitteveenai One question: can these big models like Falcon, StableVicuna, etc. work on a Windows laptop in a Jupyter Notebook, or do they require a Unix system?
@fv4466 • a year ago
@samwitteveenai Wow! I thought it was highly praised.