Build your own Q&A KnowledgeBot using GPT-Index & LangChain

Build your own Q&A KnowledgeBot using GPT-Index & LangChain - Document to Chatbot

Рет қаралды 34,331

Күн бұрын

Пікірлер: 181

@1littlecoder Жыл бұрын

Timestamps: 00:00 Introduction to Q&A KnowledgeBot 02:58 OpenAI Q&A Bot Demo 04:11 Full Code Explanation 05:27 OpenAI API Key Secret Token 07:21 Input Document Text File for the ChatBot 08:15 Constructing GPT Vector Index 14:00 Constructing Ask Bot Python unction 16:40 Running KnowledgeBot Q&A 18:21 Summary and Closure

@prkssngowthamdora Жыл бұрын

--->index = construct_index("/content/") RetryError: RetryError[

@samatech8853 Жыл бұрын

Please, is it possible reduce to the cost of request using gpt-index

@maertscisum Жыл бұрын

@1littlecoder Can you elaborate on how to do fine-tune for the incremental index, and how much tokens will be consumed? Also, is it possible to do the index fine-tune with the free ChatGPT UI?

@jersainpasaran1931 Жыл бұрын

fine tuning will be the future of employment.Thank you! you always surprise us with new news and your brilliant contributions.

@agcodes Жыл бұрын

Wdym by fine tuning, explain

@Davidkiania Жыл бұрын

This is hands down the best tutorial on leveraging existing conversational AI I have seen. Thank you very much.

@sevarakenjaeva7138 Жыл бұрын

congratulations, this was the only video I have seen where the code actually worked. thank you for being so precise

@sea_tu Жыл бұрын

I'm glad I found your KZbin channel when studying alone. Thanks to you, I am accumulating a lot of knowledge and experience.😄

@1littlecoder Жыл бұрын

Thanks so much that's very kind of you!

@CristiVladZ Жыл бұрын

Token cost optimization (especially when building a chatbot and having to account for all the prompts) is definitely the subject for a video! I'm struggling with this myself.

@abhisheksarkar4519 Жыл бұрын

Once again! Towards the end of the weekend, I found it to be little worthy just because of this. Thanks again Sir!

@1littlecoder Жыл бұрын

That's so so kind of you Abhishek. Thank you :)

@rajeshkannanmj Жыл бұрын

One of the best videos from you bro. I needed this. Actually, this is going to be future of Education system.

@1littlecoder Жыл бұрын

I appreciate that bro! Thank you!

@isaiahsgametube2321 Жыл бұрын

thats whats up! good way to start the morning

@1littlecoder Жыл бұрын

Morning Morning :)

@maertscisum Жыл бұрын

Can you explain more detail on how to do fine tune, especially with incremental index. It will also help to share how the tokens being consumed and how to optimise it.

@halkkoi1538 Жыл бұрын

You always have great idea and content

@santhanagopalan239 Жыл бұрын

Very well explained

@1littlecoder Жыл бұрын

Glad it was helpful!

@pleabargain Жыл бұрын

17:45 Feedback: It would have been a more impressive demo if you had shown a reference statement _in the book_ being pulled out by the algo. I was able to reproduce your results. I found that the algo would pick up specific references to certain individuals in the book BUT, the algo would also attribute certain phrases/things to other individuals as well. In other words, the algo is good but user beware! Thank you for posting!

@1littlecoder Жыл бұрын

Thank you. That's a great suggestion.

@PizzaLord Жыл бұрын

With the questions that you asked it at the end how do you know that the responses came from the data contained within the book and not a boiler plate response from Ada? It seems to prove that the learning went in successfully you would need to ask the bot something that is contained within the book that Ada previously knew nothing about.

@1littlecoder Жыл бұрын

Valid point. I assumed it was from the book. But I'll try to add some technical book and see.

@sup5356 Жыл бұрын

love it. Huge time saver

@1littlecoder Жыл бұрын

Glad you liked it!

@gitasuputra8371 Жыл бұрын

what is the maximum size to be fed into gpt index?

@KunjaBihariKrishna Жыл бұрын

I had surprisingly good results. There are some incorrect answers so I wouldn't expose it to our users. But the most interesting issue I encounter is that the bot will seemingly reference things that weren't mentioned, as though its reply is an excerpt from a documentation. Kinda makes sense, seeing as it's using our documentation. But it will say something like "as shown above" or "as outlined above". That string of words isn't exactly in the txt files. For other issues I was able to use the workspace search function in vscode to remove or replace certain strings in the txt and index.json I need to learn more about how this works so I can improve it. Once I have a full grasp I could potentially create a massive index by using our tech support history

@TheAstroengineer Жыл бұрын

Wonderful content. I followed this step to build my vector index with 33 input files. Now, I wanted to update my vector index with 10 more files. What should I do without running again for all the existing files?

@BMarck8 Жыл бұрын

I am also trying to do the same thing, have you found a way?

@satyagurucharan4455 Жыл бұрын

Can you help me with the steps, What to do when the content is web based. The QnA chatbot has to answer the question based on the content present in the given website. We are using llama2 7b and it is not giving accurate answers to the questions asked, the answers has to be from the website, but sometimes it gives additional information that is not part of the website. How would we fine tune and train, use RAG or what are the different API that can be called or trained API's. It would be helpful if you can share some suggestions or links where i can find the information.

@kashifimteyaz8931 Жыл бұрын

Can you pls make the extended video embedding the UI

@Analyse_US Жыл бұрын

Excellent tutorial.. Thankyou!

@1littlecoder Жыл бұрын

Glad it was helpful!

@AllexFerreira Жыл бұрын

Did I miss something I did not see you indexing the file any help?

@nganson673 Жыл бұрын

Are there any updated version ? some API are removed

@jeffm4941 Жыл бұрын

I cannot find the link to the co-lab site where the code is posted. The links above are to github. Can someone point me in the right direction?

@1littlecoder Жыл бұрын

This should help to import the GitHub in Colab kzbin.info/www/bejne/j2SWeICjebeMrdE

@drramasubramaniam6724 Жыл бұрын

Very helpful video. Thank you I was able to successfully run the code.

@marketingengagementhacks1659 Жыл бұрын

I don't understand what the purpose of the collab is. Is this a way for me to fine tune a model and then I can use that finetuned model in another context, like in openAI playground?

@vishalaiml1649 Жыл бұрын

@littlecoder, I have used your ipynb and tried purchasing OPEN AI API KEY, but it throws error in the last step - construct_index("/content") Error - > You can set your API key in code using 'openai.api_key = ', or you can set the environment variable OPENAI_API_KEY=. what should i do, i generated keys 2 or 3 times, it does not work

@AlgorithmicEchoes Жыл бұрын

I think this is one of the easiest to follow videos especially for beginners like me. Thanks for making it. Also, I have a request. Can you add how to host a chatbot using a web interface for others to access? Thanks!

@1littlecoder Жыл бұрын

Thank you, I'll try to create a ChatBOT UI next time

@pleabargain Жыл бұрын

10:00 I'm looking forward to when we will have LIBRARIES of LLMs. One for X. One for Z. e.g. LLM for cancer research. LLM for engineering. LLM for Python. etc.

@TRSTNGLRD Жыл бұрын

This is awesome - are you able to grow this series and show us how we can prevent hallucinations, improve complex results, and ask complex questions? I've built this bot but it is rarely good at giving well-rounded responses

@1littlecoder Жыл бұрын

I used ada here just to show the demo. Did you try with Ada model or Davinci ?

@saitej4808 Жыл бұрын

End to end also can be done by langchain right? it has data loader and vector index modules..

@madie_verse Жыл бұрын

can i use the model on my company'e technical documentation?

@klammer75 Жыл бұрын

What if I put three separate docs in pinecone for example, can you just pull the one doc out and query against that particular doc, or does it query from the db with all the docs and doesn’t know which doc in particular I’m talking about? If I ask who’s the author and there’s 20 papers in there would it give me a random sampling or all the authors? Thanks and these are great videos!

@nitincsawant Жыл бұрын

Those are some points like token optimization, saving the index to local are very helpful. Other question I may have is what type of the inputs it can deal with? I have couple of HTML files with tables in it. Can I pass it on directly to this LL model as a knowledge base? or does it need any kind of conversion to txt?

@shreyojitdas9333 Ай бұрын

sir do we have to buy credits for open api key it show rate limiter

@reneeliu1648 Жыл бұрын

Thank you for your video! Curious to know where does LangChain come into play in the code? It imported OpenAI but the code seems to only use the modules from Gpt index. I might be missing things here!

@AIwithAniket Жыл бұрын

Great video as usual! Love your background light and video quality. Which cam do you use?

@1littlecoder Жыл бұрын

Hey, I use iPad Camera :P and Background light is Philips RGB from Amazon

@arturko Жыл бұрын

great video! I've learned so much. Please follow-up with a UI video.

@1littlecoder Жыл бұрын

Will do, Thanks!

@superchiku Жыл бұрын

Question is how do you make this conversational and remember previous context?

@souvickdas5564 Жыл бұрын

How to merge multiple index files?

@AquaGymAthlete Жыл бұрын

hi great video, can we also add a conversation buffer memory to give our bot some context of the past conversation.

@anuu1582 Жыл бұрын

Thank you for the Tutorial, Great explanation. However when I try to execute this function "construct_index("/content/")" , getting this error "ValueError: chunk_overlap_ratio must be a float between 0. and 1." Could you please help advise here ?

@ManishSharma-gf4bw Жыл бұрын

Can I integrate this in Angular application? I have a chatbot in angular , so I would like to integrate this logic over there. Please reply

@fmarquesbh Жыл бұрын

Is the knowledge base saved in chatgpt forever or whenever I submit a new question do I need to submit the knowledge base? Is it create a model based on the knowledge base on the gpt chat side?

@jamespower2670 Жыл бұрын

Do many or even any websites contain a .txt file that could be crawled and would work for training as you described in the video?

@sujitable Жыл бұрын

trying this with some books but I get this error: TypeError: can only concatenate str (not "NoneType") to str any help is appreciated!

@Red-fu3gb Жыл бұрын

It is an excellent tutorial, thanks! But I ran into a problem. when I use my personal document specifying that SVA and DVA should be conducted every quarter(SVA and DVA is a kind of test), and ask the bot how often I should conduct NVA(another kind of test), the answer should be I don't know from the bot, but the bot answers every quarter. How to fine tune the model to answer I don't know when it can't find the answer in my personal document?

@skachoml Жыл бұрын

For some reason i am getting "AssertionError: The batch size should not be larger than 2048." when passing my own document. Maybe its too large and i dont know how to limit the batch size.

@SaintMeTube Жыл бұрын

Same error. Tried multiple settings (2048, 1024, 512) for max_input_size. No joy. 🙁

@Priyanka-js8zl Жыл бұрын

Thankyou for this detailed tutorial, so you asked 2 question from it, can you please share how much openai credits does it used in this complete operation? that would be really helpful

@TraveleroftheSoul7674 11 ай бұрын

I am building a chatbot. I have scrapped data from three different websites. But the tricky thing is I don't want to allow the chatbot to answer all query of a user if the user have not the same domain. Like if a user come from 1-website and asking questions about 1-website it's ok to answer the user but if the user asking questions about 2-website and 3-website then I don't want the bot to answer the query. Now how do I achieve this? In which formate should I feed the data into model so it will filter out to answer the user based on the user domain and won't provide details of other website.? I am not allow to build different bots. I have to build one bot only. Please , provide the details if any one work on this type bot.

@gowthamdora6146 Жыл бұрын

Instead of open ai paid model can we use free model from hugging face ?

@1littlecoder Жыл бұрын

This uses model from Hugging Face - kzbin.info/www/bejne/b4XbdoSHrttsmac

@KunjaBihariKrishna Жыл бұрын

EDIT: Nevermind, I slapped some gradio UI on it. I do have an ISSUE though: My Index.json only contains the first document in my docs folder. So I think your code assumes there is only one txt file. I was trying to follow along in VScode, but I got stuck at the part where you were asking a question. I suppose that google collab is running that somewhere. But I'm not sure how to do this from VScode. I can run the code, and it will give me some feedback about token usage for embedding. So it did process my txt files

@1littlecoder Жыл бұрын

so when you use the Directory Reader, does it still contain only one document in the index?

@KunjaBihariKrishna Жыл бұрын

@@1littlecoder It seems to contain more now. But I can see that not everything is in there. Then again, I don't really know how it's supposed to work. But if I use CTRL+F to look up some words that are in the txt files I provided, many of them are not in the index.json When I used the debugger, I could see that the documents variable did list all the txt filenames though For example: The word "South Africa" can be found over 40 times in the txt files. But it is not in the json, and when I ask the bot information it says it doesn't know. I don't understand exactly how the index file is being constructed. So I'm not sure if/what is going 'wrong'. Perhaps it is something to do with the token settings, and my txt amount is exceeding a limit? I'm not sure if these gpt_index or llm libraries are configured to handle token-limits in some way I have another script I made to summarize long meetings, and it involves counting the token amount of the full transcript, then cutting it in pieces with an overlap, to finally feed these pieces to chatgpt one by one, concatenating the responses, etc, etc But I'm not going to integrate that into this, unless I know for sure that this is the problem OBSERVATION: I don't know if something different happens when running with debugger. Or maybe the index is loading more data each time I restart? I'm sure it was not running while I added the data containing "South africa".. But after restarting the bot again, it now does contain that data" I wish I knew what is happening exactly

@lyricsdepicted5628 Жыл бұрын

If you ask the the third or the fourth question in the inference phase - does it take the former questions and answers as context or does it always start anew?

@1littlecoder Жыл бұрын

This doesn't have memory. It's all new. In the future we will try to implement memory to hold context

@ajitpawar3063 Жыл бұрын

Great video, can you please help in providing, how to build a UI interface over it, so that I can share with others?

@alizhadigerov9599 Жыл бұрын

after split of the document (the book), how many k's does it send to the LLM as a part of a prompt?

@aayushsmarten Жыл бұрын

Do GPT-Index (llamma-index) and Langchain provide exactly same functionalities? Or they both serve some different purpose? Can I use one of them or are they both for different tasks?

@codersaurabh Жыл бұрын

Nice one but are responses generative or it is just picking up lines from book

@1littlecoder Жыл бұрын

I checked a few and I could trace it back to the book. Not sure about everything.

@drramasubramaniam6724 Жыл бұрын

If im right temperature is set to 0 in the code. Meaning it will only generate extractive summary. If you set temperature close to 1, it will generate abstractive summary. But I feel we should be cautious on using abstractive, unless we have a very large corpus.

@RustuYucel Жыл бұрын

Is there anyway to use opensource free tokenization ? Costs are huge for simple test purposes even at the end via trial and error

@sathiyaviradhanj2125 Жыл бұрын

Bro, Can you clarify, When we use Openai api key and passing our company information such as company documents and question, Will not exposing to OpenAI ? How to handle it ?

@learnwithvichu Жыл бұрын

Hi Nanba. Is it possible to use bloom model instead of this model?

@1littlecoder Жыл бұрын

This tutorial uses FLAN - kzbin.info/www/bejne/b4XbdoSHrttsmac

@bevijay Жыл бұрын

Is it feasible to develop a knowledge-bot capable of processing a substantial volume of confidential documents, while ensuring that the model neither copies nor utilizes the data? Additionally, the data should remain within the customer data-center without any transmission outside of the said location.

@1littlecoder Жыл бұрын

This tutorial is for your request. This doesn't send any data outside expect the machine in which the model is trained - kzbin.info/www/bejne/b4XbdoSHrttsmac In this case, it's Google Colab

@bevijay Жыл бұрын

@@1littlecoder Thank you! How the FLAN model compares with LLaMA / Alpaca? I assume all three can run on our own GPUs.

@MicheleCaldarone Жыл бұрын

Thanks for your good explanation! I have an important question about the incremental fine tuning with new documents. You mention there is no need to train all again, but we can train new data over the previous trained data, but how? Have I only adds new .txt in doc folder and it understand if there's new data? Or have i do something different? Thanks so much!

@1littlecoder Жыл бұрын

Hey M, Just to be clear, I used the word fine-tuning loosely, My apologies. Here we are just building an index so no fine-tuning is happening to be honest.

@fahadrazzaq711 Жыл бұрын

@@1littlecoder what difference in building an index and fine-tuning?

@mauriciovalencia7300 Жыл бұрын

incredible job

@wingcao1924 Жыл бұрын

How can I adjust the robot to provide more output when answering questions? Every time I ask a question, I only receive incomplete answers.🙏

@wingcao1924 Жыл бұрын

When the program's output is in English, the returned answer is complete. When it's in Chinese, it's incomplete. Is it possible to adjust the code to solve this?

@jaedee6223 Жыл бұрын

Do you have any tips on generating your own Q&A document? I initially tried ingesting a document that had "Q: how to do X A: Do this by doing Y". I then saw it sometimes didn't answer it properly, so I changed the answer to "To do X, you must do Y". I then noticed that if I slightly reworded the question, but kept the same answer (in case people ask it differently), that it would give that answer on completely unrelated things. So I had to remove all the duplicate answers. I then noticed it would rarely answer with the "A: " in the reply, so I just got rid of all the questions. But now it won't answer some of the questions, and makes up completely false data, including snippets of various answers. So big hallucinations here too - not sure how to solve hallucinations.

@ketankhamgaonkar4344 Жыл бұрын

How can I embed this into a webpage or use as an api ?

@satpalsinghrathore2665 Жыл бұрын

Can we use this with ChatGPT in any way?

@krishnakompalli2606 Жыл бұрын

Great tutorial! I can see the response is more like a statement. How can we improve it to a conversational level? (I mean the response should be a question back to the user) Is it purely dependant on how the fine tuning is done?

@razexamvs8756 Жыл бұрын

am facing this issue 'GPTVectorStoreIndex' object has no attribute 'save_to_disk' pls help

@1littlecoder Жыл бұрын

Are you using llama index or gpt index?

@sevarakenjaeva7138 Жыл бұрын

Your colab with llama_index instead of gpt_index fails; did you test this again?

@1littlecoder Жыл бұрын

Is it the same error ?

@sevarakenjaeva7138 Жыл бұрын

@@1littlecoder good day, it works now; thank you for taking time to reply

@neeharika12 Жыл бұрын

can you convert this to voice

@QuangTuyenNguyen-zj7cz Жыл бұрын

Is there a fee to use the api key?

@mmcuser Жыл бұрын

I think that you are not including the promp_ helper after defining it

@1littlecoder Жыл бұрын

Thanks for the comment, can you elaborate a bit please?

@mmcuser Жыл бұрын

@@1littlecoderThe prompt helper config is not being applied since you are not including that variable anywhere! def construct_index(directory_path): prompt_helper = PromptHelper(max_input_size, num_outputs, max_chunk_overlap, chunk_size_limit=chunk_size_limit) llm_predictor = LLMPredictor(llm=OpenAI(temperature=0, model_name="text-davinci-002", max_tokens=num_outputs)) documents = SimpleDirectoryReader(directory_path).load_data() service_context = ServiceContext.from_defaults(llm_predictor=llm_predictor) index = GPTSimpleVectorIndex.from_documents(documents, service_context=service_context) index.save_to_disk('index.json') This is what I've done: I've included it in the service_context def construct_index(directory_path): max_input_size = 1000 .... model = "text-ada-001" llm_predictor = LLMPredictor(llm=OpenAI(temperature=0.4, model_name=model, max_tokens=num_outputs,)) prompt_helper = PromptHelper(max_input_size, num_outputs, max_chunk_overlap, chunk_size_limit=chunk_size_limit) documents = SimpleDirectoryReader(directory_path).load_data() service_context = ServiceContext.from_defaults(llm_predictor=llm_predictor, prompt_helper=prompt_helper) index = GPTSimpleVectorIndex.from_documents(documents, service_context=service_context) index.save_to_disk(f'static/docs/{cheap_model}.json') return index

@1littlecoder Жыл бұрын

Thanks very much, Very kind of you to share the code!

@younginnovatorscenterofint8986 Жыл бұрын

Hello, this was interesting. I am currently developing a chatbot with llama index model_name="text-ada-001 or davinci-003. So, based on thousands of documents (external data), the user will ask questions, and the chatbot must respond. When I tried it with just one document, the model performed well, but when I added another, the performance dropped. Could you please advise on a possible solution to this? thank you in advance

@seankhaila1061 Жыл бұрын

I am having an error on here index = construct_index("/content/"), I keep running into this

@seankhaila1061 Жыл бұрын

I am following along in collab and it doesnt seem like the .json file is being saved. Any ideas?

@huijiang7386 Жыл бұрын

@@seankhaila1061 it was saved to somewhere in colab, not your local drive. You can mount your google drive to colab and then it can be saved to your google drive. Do you still have error on index = construct_index("/content/")? I have the problem on this, ValueError: Only one of documents or index_struct can be provided. Do you have any idea? thanks!

@jameslay6505 Жыл бұрын

GPTSimpleVectorIndex doesn't seem to accept a parameter named "llm_predictor". What's going on, here?

@1littlecoder Жыл бұрын

are you using llama-index or gpt-index?

@jameslay6505 Жыл бұрын

@@1littlecoder gpt-index

@jameslay6505 Жыл бұрын

@@1littlecoder Ahh, I switched to llama-index and now it's working. Thank you!

@1littlecoder Жыл бұрын

@@jameslay6505 Great They recently went through this name change and rebranding and it's broken a lot of things. Glad you fixed the problem!

@isaachighlander8605 Жыл бұрын

@@1littlecoder this didn't fix my problem and I'm having the same one. could I be doing something wrong?

@dongcai6203 Жыл бұрын

Can you share your repo?

@mrmgflynn Жыл бұрын

Hi, I'm new to coding, so love the fact you provided a Colab notebook to use. Unfortunately I get an error: TypeError: __init__() got an unexpected keyword argument 'llm_predictor' when I run the index = construct_index("/content/") line. Do you have any clues about what I'm doing wrong?

@1littlecoder Жыл бұрын

Thanks Mark. It looks like something has changed in the code I need to check, give me ~24 hrs to check it and get back

@waelabou946 Жыл бұрын

@@1littlecoder Good morning ...first of all thanks for sharing this and the great effort .... i get the same error..

@1littlecoder Жыл бұрын

I fixed it, please check with the colab again

@gpligor Жыл бұрын

@@1littlecoder things change so fast! thanks for keeping this up to date

@szymonrachut7320 Жыл бұрын

I've got TypeError: can only concatenate str (not "NoneType") to str can you help me?

@1littlecoder Жыл бұрын

That's strange. Did you try with a different input file?

@szymonrachut7320 Жыл бұрын

@@1littlecoder what coding did you use?

@jorgerios4091 Жыл бұрын

Awesome, Is it possible to make a free version using GPT NEO?

@1littlecoder Жыл бұрын

It is. I'm trying to work on a video :)

@1littlecoder Жыл бұрын

This uses FLAN - kzbin.info/www/bejne/b4XbdoSHrttsmac

@jorgerios4091 Жыл бұрын

@@1littlecoder that is awesome! and the video is great, thank you very much.

@aiortairaan5458 Жыл бұрын

how do I make it run on my local machine? The program simply fails to run?

@1littlecoder Жыл бұрын

What error are you getting

@aiortairaan5458 Жыл бұрын

@@1littlecoder SyntaxError: (unicode error) 'unicodeescape' codec can't decode bytes in position 2-3: truncated \UXXXXXXXX escape, is the error I am getting. It is on the index = construct_index(directory name) line. Please help. If its possible can I email you my issue with photos attached to better understand the scenario?

@1littlecoder Жыл бұрын

That's strange. Send it to me 1littlecoder at gmail dot com

@aiortairaan5458 Жыл бұрын

@@1littlecoder sending it to you right away. Btw thanks a lot for understanding.

@concretec0w Жыл бұрын

That was AWESOME - THANK YOU!!! I've been after something like this for a while. But..... I'm not a fan of using Google stuff. Is there a way to covert the google colab into a code I can run on my machine? I'd love to see more AI content from all content creators without reliance on Google, I hope you agree and understand why! I get it though, OpenAI would still be getting the content I feed it via local project (i think?) I'm running Linux and have python, conda, etc. installed and can use that profficiently, I've just not touched colab yet so am wondering how I could go about converting colabs to local projects Hope you decide to teach this, but otherwise no worries - not everyone is as paranoid as me about privacy :)

@1littlecoder Жыл бұрын

Thank you for your perspective. I appreciate. Technically this code should work the same on local machine if you follow the instruction of changing the input file, directory path. In general I use Colab, it's easier to share with people but I got your point. That's definitely a valid concern relying on one platform especially Google.

@concretec0w Жыл бұрын

@@1littlecoder Thanks bud! I'll see if I can get that working :) Btw, do you know how much of the data we 'feed' in goes to OpenAI? I'm guessing all of it?

@1littlecoder Жыл бұрын

@@concretec0w This tutorial shows how to use open source models - kzbin.info/www/bejne/b4XbdoSHrttsmac but might not be as effective as OpenAI

@DanieldaCosta-j7l Жыл бұрын

please update your colob; it is out of date now; does not work anymore; i thought you should know

@LilyLi-vl6lv Жыл бұрын

Wouldn't this be insanely expensive with GPT3-davinci for long term use?

@LilyLi-vl6lv Жыл бұрын

And also what if you always had new data coming in? Then wouldn't you have to fine tune the model every single time, making this a cost increasing task? I believe it's not like something where you can fine tune a BERT model when already trained

@TahuRock Жыл бұрын

LEGENDDDD!

@1littlecoder Жыл бұрын

Thank you, Glad it was helpful

@simplegeektips1490 Жыл бұрын

Thanks a lot for another amazing video! Is there a way to do the same without open ai?

@sankhere5987 Жыл бұрын

Did you find the way?

@1littlecoder Жыл бұрын

@@sankhere5987 kzbin.info/www/bejne/b4XbdoSHrttsmac I just published it

@1littlecoder Жыл бұрын

I just published it to do the same with Hugging Face Models - kzbin.info/www/bejne/b4XbdoSHrttsmac

@lyricsdepicted5628 Жыл бұрын

What is a donation platform for you that doesn't take 50% from the donation that is meant for you?

@1littlecoder Жыл бұрын

Thank you for asking this. Thank you for the other reply as well. I'm glad the efforts are helpful. ko-fi.com/1littlecoder sends me 100% (it's then Paypal charges and Local Bank charges)

@1littlecoder Жыл бұрын

Thank you again :)

@SomuNayakVlogs Жыл бұрын

Hi , really its amazing , can you please reply how can we use file format other than text?

@1littlecoder Жыл бұрын

Technically you need to convert it to text, for example: PDF you can use PyPDF or some library to convert into text and use it with this..

@SomuNayakVlogs Жыл бұрын

@@1littlecoder yes doing now in that way, Thank you so much. Notebook was working fine but last 1 hour getting Internal server error when sending text data to construct_index method for finetune, can you please check that what could be the issue?

@shhossain321 Жыл бұрын

How to make it with bloom?

@1littlecoder Жыл бұрын

I will try to put together something

@1littlecoder Жыл бұрын

This uses FLAN - kzbin.info/www/bejne/b4XbdoSHrttsmac you can change the code for any model in Hugging Face

@drewwellington2496 Жыл бұрын

hi. good video, but you use "text-ada-001" to create vectors. why not use "text-embedding-ada-002"? this is their latest model which is 99% cheaper & can have tokens up to 8k in size?

@1littlecoder Жыл бұрын

Hey great question, We are actually not fine-tuning here. We're just indexing. I'll cover fine-tuning and using that embedding in the future and we'd use that embedding model for that.

@drewwellington2496 Жыл бұрын

@@1littlecoder oooooooooh ok i did not know there were different things (indexing vs fine-tuning/embeddings). i'll keep an eye out for any videos you do in the future about them. maybe a 30 second explainer for people like me who don't really know what they're doing or what the difference is 😃 keep up the great work, thanks again

@bingolio Жыл бұрын

It would be extremely useful to see llama.cpp models doing the same, with NO OpenAI components.

@1littlecoder Жыл бұрын

This is completely opensource without OpenAI. But it's not Llama - kzbin.info/www/bejne/b4XbdoSHrttsmac

@utsavtripathi858 10 ай бұрын

sir, Whatever you are teaching. No single Library is installing in colab SimpleDirectoryReader❌❌❌ gpt_index❌❌❌❌ How we can proceed ahead? please sir make separate video on it

@rajeshkannanmj Жыл бұрын

Can you spend more ideas on how to get factual replies from bot instead of random replies? For example, maybe you can try to embed a quiz pdf and let it learn. Then, you can ask question based on the quiz. Just a thought,

@KalebWyman Жыл бұрын

🤖✨🙏

@mendozaplata Жыл бұрын

pls put some crypto adress so i can send a tip

@PaulFishwick Жыл бұрын

Well done and interesting. I loaded the Colab notebook and in the 2nd cell, I get this error: ImportError Traceback (most recent call last) in () 1 #from gpt_index import SimpleDirectoryReader, GPTListIndex, GPTSimpleVectorIndex, LLMPredictor, PromptHelper 2 from langchain import OpenAI ----> 3 from llama_index import SimpleDirectoryReader, LangchainEmbedding, GPTListIndex,GPTSimpleVectorIndex, PromptHelper 4 from llama_index import LLMPredictor, ServiceContext 5 import sys ImportError: cannot import name 'GPTSimpleVectorIndex' from 'llama_index' (/usr/local/lib/python3.10/dist-packages/llama_index/__init__.py)