LangChain 101: Ask Questions On Your Custom (or Private) Files + Chat GPT

  119,120 views

Greg Kamradt (Data Indy)


1 day ago

Comments: 372
@adityalakkad8795 · 1 year ago
I would love to see a video where I can build same thing with open source alternative tech, being a student that would be very much helpful to build some capstone projects.
@DataIndependent · 1 year ago
Sounds good! I'll add that to the list.
@chrisriley6264 · 1 year ago
​@@DataIndependent If you switch to open source, I'm fairly certain you will increase access and grow faster as a content creator. I only use open-source tools and will wait for someone to create an alternative. I tend to seek out creators that use these tools by default. It's good to know about the tech, of course, but if I can't own it I don't use it daily, and I don't usually recommend it unless it's business related.
@konstantinlozev2272 · 1 year ago
​@@chrisriley6264 There is PrivateGPT with ChromaDB and Vicuna LLM. Fully local and free. But it's not too impressive ATM on its own. I think with some tweaking and optimisation, it will become useful.
@hugomejia7826 · 1 year ago
Great series about LangChain, a must see!
@DataIndependent · 1 year ago
Nice thanks!
@liamboyd4676 · 1 year ago
Great video! I would love to see some in-depth walk through of the various chain types and agents, along with examples to help clarify usage. Thank you.
@tsukinome8153 · 1 year ago
An amazing video; I learned a lot thanks to this. LangChain's documentation is very difficult for me to understand, but thanks to your video I feel much more confident to start experimenting with this library.
@DataIndependent · 1 year ago
Awesome, I'm glad it worked out. Let me know what other videos or questions you have
@arigato1901 · 1 year ago
Excellent video, thank you so much! I have concerns about data privacy, though. It would be great to do the same with another model that could run locally, like Alpaca... would it be possible?
@abhishekjayant9733 · 11 months ago
The underrated ninja of the AI space. Amazing content and explanation. VERY USEFUL.
@DataIndependent · 11 months ago
Nice! Thank you Abhishek!
@kuntalpcelebi2251 · 1 year ago
I have got a better answer :D "' In 1960, John McCarthy published a remarkable paper in which he showed how, given a handful of simple operators and a notation for functions, you can build a whole programming language. He called this language Lisp, for "List Processing," because one of his key ideas was to use a simple data structure called a list for both code and data.'"
@JohnMcGuin · 1 year ago
This is such a good combination of accessible and technical. Thanks so much for what you're doing! As a technical person very new to and exploring this space, your content is so, so helpful.
@DataIndependent · 1 year ago
Nice I love it! Thank you!
@danielsommer6655 · 1 year ago
Super helpful! Thanks so much. It looks like OpenAI made a change and deprecated VectorDBQA, suggesting RetrievalQA instead.
@DataIndependent · 1 year ago
Langchain made that change and yep that is what they recommend now!
@danielsommer6655 · 1 year ago
@@DataIndependent Any suggestion for how to update the jupyter notebook? Thanks!
@mlg4035 · 1 year ago
Freaking awesome video, all stuff, no fluff!! This will help me get started quickly, thank you so much!
@sun19851 · 1 year ago
Great video Greg, thank you!! The VectorDBQA method is already deprecated; use the following instead: chat = RetrievalQA.from_llm(llm=OpenAI(), retriever=docsearch.as_retriever())
@fernandofrias8322 · 1 year ago
I asked this to Langchain helper chat AI, this is the answer: RetrievalQA.from_llm and VectorDBQA are both question answering models in the Langchain library. RetrievalQA.from_llm is a chain that uses a large language model (LLM) as the question answering component. It retrieves relevant documents from a vector database (VectorDB) based on the query and then uses the LLM to generate an answer. VectorDBQA, on the other hand, directly retrieves documents from a vector database and uses a default question answering prompt to generate an answer. Both models can be used to answer questions based on retrieved documents, but RetrievalQA.from_llm provides more flexibility in terms of customizing the prompt and using the LLM for generating answers.
@liameiternes5744 · 9 months ago
Thanks Greg, found your video from reddit. Extremely well explained, great work!🤗
@markvosloo8833 · 1 year ago
Amazing video, thank you!
@DataIndependent · 1 year ago
Thank you!
@michaelzumpano7318 · 1 year ago
Excellent demo! Very straightforward programming.
@DataIndependent · 1 year ago
Nice thank you Michael
@tubehelpr · 1 year ago
Exactly what I was looking for - thank you!
@DataIndependent · 1 year ago
Nice! Glad to hear it
@tubehelpr · 1 year ago
@@DataIndependent Having a hell of a time getting the imports and dependencies working correctly...
@DataIndependent · 1 year ago
@@tubehelpr same. It was a pain and didn’t record that part for y’all
@tubehelpr · 1 year ago
@@DataIndependent Can you share some tips? Python version? I can't seem to get past the import nltk line since it throws an error but I have everything installed in my virtual env 🤔
@DataIndependent · 1 year ago
@@tubehelpr what’s the error say?
@yili2419 · 1 year ago
Bad idea: OpenAI will keep your query and response (which include your personal data) in their datacenter, and use it to train their next model.
@CelestialEnlight · 8 months ago
That is the only reason stopping me from going all in on ChatGPT. Is there an alternative that can help me? I don't want to share all my information with ChatGPT, but at the same time I don't have the money to build my own locally hosted GPT.
@coachfrank2808 · 1 year ago
I am highly interested in the implementation of a T5 and different ways of text splitting without losing context. A tour would be greatly appreciated.
@DataIndependent · 1 year ago
Nice! Sounds good and I'll add this to the list.
@JT-Works · 1 year ago
Great video series, keep up the amazing work! Also, you kind of look like Bradley Cooper. I figured that was a decent compliment, so I thought I would toss it your way. Have a great one.
@alx4571 · 1 year ago
Is it possible to use GPT-4 with a private data source? Something proprietary, for example, that won't get shared out to the model?
@DataIndependent · 1 year ago
For that you'll want to use a private local model on your own computer. It would work there.
@dawidanio7053 · 1 year ago
That was great, thanks a lot for your knowledge and for your work. Great job!
@DataIndependent · 1 year ago
Nice, glad it was helpful
@Crowward92 · 1 year ago
Man, this guy is a GOAT.
@nattapongthanngam7216 · 5 months ago
Thanks Greg! Great video on using custom files. Could you share a video about RAGs? I heard there are many types and I'd love to learn which is best for different tasks.
@jgill0 · 1 year ago
Excellent walk through, thanks! If you have any interest in the elasticSearch integration I'd love to see a video on that :)
@DataIndependent · 1 year ago
Nice! Thanks for the comment. I need to explore other document loaders and custom agents before I try out more vectorstores or dbs. I jotted this down on my ideas list.
@黄毅宁-o2w · 1 year ago
That's a great tutorial on using LangChain and GPT to build a QA robot, but I have a question. If you ask specific questions, it can give you relevant answers. But in regular conversations people can't always give full details. For example, if the first question is "what did the president say about Stephen Curry?", the robot can understand and answer. But if the second question is "did he reply?", most of the details are missing, and if that question is sent directly to GPT it probably can't give you a correct answer. What do you think about that?
@yantaosong · 1 year ago
LangChain has buffer and memory storage that can fix this.
@黄毅宁-o2w · 1 year ago
@@yantaosong Thanks for the reply. I figured out that I should generate a standalone question from the context before I make the query. Thank you anyway.
@yantaosong · 1 year ago
kzbin.info/www/bejne/aKnbq5x_jNKUiaM 7:28
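The standalone-question approach described in this thread (condense the chat history plus a vague follow-up into one self-contained question before retrieval) can be sketched in plain Python. This is a hypothetical illustration: the prompt wording, function names, and example conversation are all made up, and the actual LLM call is left out.

```python
# Hypothetical sketch: condense chat history plus a follow-up question into
# a standalone question before running the vector search. In practice you
# would send `prompt` to your LLM and use its output as the search query.

CONDENSE_PROMPT = (
    "Given the conversation below and a follow-up question, rephrase the "
    "follow-up as a standalone question.\n\n"
    "Chat history:\n{history}\n\nFollow-up: {question}\nStandalone question:"
)

def build_condense_prompt(history: list[tuple[str, str]], question: str) -> str:
    # Flatten (human, assistant) turns into a readable transcript.
    lines = [f"Human: {q}\nAssistant: {a}" for q, a in history]
    return CONDENSE_PROMPT.format(history="\n".join(lines), question=question)

prompt = build_condense_prompt(
    [("What did the president say about Stephen Curry?", "He praised him.")],
    "Did he reply?",
)
```

With this in place, "did he reply?" reaches the vector store as something like "Did Stephen Curry reply to the president?", which retrieves far better than the bare follow-up.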
@DeepakSwami-p3r · 1 year ago
Sir, can we implement the same module in a Node.js framework? I tried, but some of the required modules are not available in Node.js.
@JasonMelanconEsq · 1 year ago
I understand how you are using CharacterTextSplitter to chunk the data into 1000 characters. What code could be used to chunk by each page, and also to chunk every time a particular word arises, such as "Question:"? I'm asking because I would like to create embeddings for each question and answer in a deposition. Many thanks for the great videos! This helps even a non-coder understand the overall process. Excellent job!
@DataIndependent · 1 year ago
Nice! That's fun. The splitter takes an argument, "separator", which you can customize, and it'll split your documents on it. In your case you can have it look for "Question:" and split there. It's not ideal and you may need to do some string clean-up, but it'll work.
@esltdo1 · 11 months ago
@@DataIndependent That's what I do. Sometimes you gotta pull out the ol' regex, especially when chunking code; thankfully GPT makes that not an afternoon of regex googling anymore lol
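The regex route mentioned above can be sketched without any dependencies. This is an illustrative stand-in (function name and sample text are invented); LangChain's CharacterTextSplitter with a custom separator covers the same idea with chunk-size merging on top.

```python
import re

def split_on_marker(text: str, marker: str = "Question:") -> list[str]:
    """Split a transcript every time `marker` appears, keeping the marker
    attached to the chunk that follows it."""
    # A lookahead split keeps the marker at the start of each chunk
    # instead of discarding it.
    parts = re.split(f"(?={re.escape(marker)})", text)
    return [p.strip() for p in parts if p.strip()]

deposition = (
    "Question: State your name. Answer: Jane Doe. "
    "Question: Where were you? Answer: At home."
)
chunks = split_on_marker(deposition)  # one chunk per question/answer pair
```

Each resulting chunk is a complete question/answer pair, ready to be embedded individually.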
@coachfrank2808 · 1 year ago
Hi! We are building a research tool for German subsidy guidelines, which are very elaborate and hard to understand for the SME target group. Your video is really helpful. We have an Excel sheet with 1,800 state subsidies waiting to be simplified for more than 1.3 million SMEs making up over 60% of Germany's GNP. If you could tap into the processing of Excel sheets and information extraction with source documentation, my team and I would profit immensely. Keep up the step-by-step explanations. You are doing a phenomenal job teaching.
@DataIndependent · 1 year ago
I'll do a tutorial on that; it shouldn't be too difficult. At a minimum, if you wanted to shoehorn this video's method, you could create a for loop that runs through your Excel doc, turns each cell you care about into a document, and then loads them up using a document loader. But my guess is LangChain has support for this already.
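The for-loop idea above can be sketched with the standard library alone, using a CSV export of the sheet. Everything here is hypothetical: the column names are invented, and the dicts only mimic the shape of document objects (text plus source metadata) that a loader would produce.

```python
import csv
import io

# Hypothetical sketch: turn each spreadsheet row (exported as CSV) into one
# plain-text "document" with source metadata, ready to be embedded.
# Column names below are made up for illustration.
csv_text = """name,description
Digital Jetzt,Grants for SME digitalisation projects.
EXIST,Support for university-based startups.
"""

documents = []
for row in csv.DictReader(io.StringIO(csv_text)):
    documents.append({
        "page_content": f"{row['name']}: {row['description']}",
        "metadata": {"source": row["name"]},  # lets answers cite the row
    })
```

Keeping the row identifier in metadata is what later makes "answer with sources" possible: each retrieved chunk can point back at the subsidy it came from.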
@coachfrank2808 · 1 year ago
@@DataIndependent brilliant! Can't wait for the vid. Do you have LinkedIn?
@DataIndependent · 1 year ago
@@coachfrank2808 Sure do, but I communicate more on Twitter twitter.com/GregKamradt twitter.com/dataindependent www.linkedin.com/in/gregkamradt/
@coachfrank2808 · 1 year ago
@@DataIndependent thank you for sharing! The use of OpenAI involves a lot of costs. Recent developments show that autoregressive LLMs (PaLM 540B) provide promising accuracy. Isn't this the perfect case for LangChain applications? (kzbin.info/www/bejne/joeUg4uCha6Jotk)
@DataIndependent · 1 year ago
@@coachfrank2808 Totally! They make it easy to swap out your models with whatever you want. Agreed that OpenAI is expensive; I'm glad there will be multiple options to drive prices down.
@robertovillegas2220 · 3 months ago
I have a use case I haven't seen anywhere: I created a private GPT that has documents as context. These documents contain criteria for a specific subject. I give it system instructions so that its function is to evaluate a user document, attached as part of the prompt, to see if it complies with the criteria in the context documents, and to give a detailed response with the result of the evaluation and a justification that references the content of the user document and the criteria in the context documents. I want to do that in LangChain, but I don't know how to add a user document as part of the prompt for the RAG. It would be great if you could explain how to approach this implementation. Thank you for the content!! Keep up the good work.
@abhijitbarman · 1 year ago
I am really enjoying your content on LangChain; it's awesome. I was hoping you could create videos around the Vicuna LLM: what it is, and how to fine-tune or train Vicuna on a custom dataset.
@RedCloudServices · 1 year ago
Can you create an example video showing an agent that does computations on a customer table or sheet?
@DataIndependent · 1 year ago
Nice! What's a tactical example of what you mean? I always like real-world use cases.
@RedCloudServices · 1 year ago
@@DataIndependent Let's say you had a sheet, dataframe, or web table of bus trip categories, cities, prices, and departure times. Is it possible to ask a LangChain app for averages, totals, or predictions on any feature? To combine both LLM and basic computational results? Or even show you a bar chart 📊 by executing a Python chart function using text input parameters?
@shanx1243 · 1 year ago
Thanks for making these videos! I'd like to see FAISS in action with this kind of stuff.
@DataIndependent · 1 year ago
Awesome thanks for sharing. I'll see if I can slot in a video for it
@cosmotxt680 · 1 year ago
Thanks, your video was very helpful
@DataIndependent · 1 year ago
Awesome - thanks for the words
@blocksystems202 · 1 year ago
Dude, this is amazing; thanks so much. Can you also do a tutorial on building an application that interfaces with this? Say, uploading a doc into an app, etc.
@DataIndependent · 1 year ago
Nice that sounds fun. Check out my video on building apps which may help kzbin.info/www/bejne/i5DIh2utm7Kejrc I believe Streamlit is easy to work with on files.
@Red-fu3gb · 1 year ago
Really helpful for me, thank you! I also heard about GPT Index, but I don't know the difference between LangChain and GPT Index. Is it possible to see more details about the comparison?
@DataIndependent · 1 year ago
Totally - Let's do a video on this. Thanks for the tip
@wilfredomartel7781 · 1 year ago
What about using semantic search to retrieve relevant docs and Flan-T5 to reason over them?
@DataIndependent · 1 year ago
Nice! I'll do a video on that next
@boratsagdiev6486 · 1 year ago
thanks so much!
@AlmazLab · 1 year ago
Very informative, thanks!
@DataIndependent · 1 year ago
glad it worked out - thank you
@RaoVenu · 1 year ago
If I upload my custom documents to OpenAI, will they be private? I want to ensure they are not available to the public at large and are only accessible through my API key. Can you clarify? Secondly, how long will this data set persist at OpenAI? If I upload my documents, can I query them a month or two later? How about a year? Thanks
@DataIndependent · 1 year ago
Re: "If I upload my custom documents to OpenAI, will it be private?" Assume they're not. You can check their website for policies, but either way you're giving them your data. Also, you don't ever "upload" your documents to OpenAI; you send them a prompt and they return a completion. openai.com/policies/api-data-usage-policies
@tejagunupudi5318 · 1 year ago
Where are these chunks saved in Chroma DB? And how can I inspect the vectors in the DB?
@ishannagk2544 · 1 year ago
The query response: is it summarizing from the docs or picking up an exact phrase?
@DataIndependent · 1 year ago
A bit of both? It's passing the docs to the LLM then coming up with answer for you. It's a bit of summarization but also word for word. Whatever the LLM chooses.
@dipenmandalia · 1 year ago
How private will it be for our private notes, Excel, or Word files?
@DataIndependent · 1 year ago
Anything you send to OpenAI should be assumed not private. Check their privacy policy for more information.
@lipin007 · 1 year ago
How private would the file be after using it with ChatGPT?
@DataIndependent · 1 year ago
Great question - here is where they talk about data privacy: help.openai.com/en/articles/6783457-chatgpt-general-faq
@drewwellington2496 · 1 year ago
A very useful bit of information (and I'm not sure if this is possible with LangChain) would be displaying how many tokens each request is using. This video is awesome, but behind the scenes we have no idea how many tokens/embeddings/queries are being performed, so I can't see any way to keep track of the cost involved in doing this over and over.
@DataIndependent · 1 year ago
Here you go! langchain.readthedocs.io/en/latest/modules/llms/examples/token_usage_tracking.html?highlight=token
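The linked callback reports exact token counts after each call. For a rough up-front estimate, the arithmetic looks like the sketch below. Both numbers in it are assumptions: the ~4-characters-per-token rule of thumb, and the $0.02 per 1K tokens price that applied to the Davinci-class model at the time; check current pricing before relying on either.

```python
# Back-of-envelope cost estimate. The 4-chars-per-token heuristic and the
# $0.02/1K price are assumptions for illustration; use the token-usage
# callback linked above for exact numbers.

PRICE_PER_1K_TOKENS = 0.02  # assumed Davinci-era price, USD

def rough_tokens(text: str) -> int:
    return max(1, len(text) // 4)

def rough_cost(prompt: str, completion: str) -> float:
    total = rough_tokens(prompt) + rough_tokens(completion)
    return total / 1000 * PRICE_PER_1K_TOKENS

# A ~4000-character prompt plus a ~400-character answer:
cost = rough_cost("x" * 4000, "y" * 400)  # ~1100 tokens ≈ $0.022
```

Note that in a retrieval setup the prompt includes every retrieved chunk, so the number of chunks you stuff into the prompt dominates the cost.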
@fullcrum2089 · 1 year ago
This is why I want to make my own module in JS.
@fullcrum2089 · 1 year ago
@@DataIndependent Thanks, this was helpful
@ValerioMaggio-u4r · 1 year ago
I appreciate the enthusiasm in your video. However, I wonder what would happen when LLMs start answering questions about somebody else's private documents based on **your** private documents. In other words, privacy is always quite overlooked when it comes to these systems, and there are indeed examples of exploitation of private data from LLMs. I'd say we should start advocating more and more about this; raising awareness about these privacy concerns is vital to start working on them.
@samatech8853 · 1 year ago
What is the cost like when making requests to the GPT model? Is it expensive or affordable? If it's expensive, how can it be reduced?
@DataIndependent · 1 year ago
Here is pricing for OpenAI: openai.com/pricing The new GPT-3.5 model is 10x cheaper than the Davinci one I've been using.
@user-vc2sc9rq7t · 1 year ago
Thanks for the great tutorial! For multiple documents, can you please advise on how I can retrieve the file name the contextual information comes from?
@DataIndependent · 1 year ago
LangChain has the functionality to give you an answer w/ sources which should help. Check out their documentation.
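The mechanism behind "answer with sources" is simple: carry each chunk's file name in metadata when building the index, then surface it alongside the answer. The toy below illustrates only that bookkeeping; the "retrieval" is a fake keyword match standing in for a real vector search, and all names and data are invented.

```python
# Toy illustration of answer-with-sources bookkeeping. The retrieval step
# is a fake keyword match, not a real embedding search.

chunks = [
    {"text": "McCarthy invented Lisp in 1960.", "source": "pg_essay_1.txt"},
    {"text": "The essay discusses startups.",   "source": "pg_essay_2.txt"},
]

def answer_with_sources(question: str) -> dict:
    words = question.lower().split()
    hits = [c for c in chunks if any(w in c["text"].lower() for w in words)]
    return {
        "answer": " ".join(c["text"] for c in hits),
        "sources": [c["source"] for c in hits],  # file names travel with the text
    }

result = answer_with_sources("Who invented Lisp?")
```

Because the source name was attached at indexing time, it costs nothing to return it at query time; the file names referenced here are placeholders for whatever paths your loader recorded.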
@markp2381 · 1 year ago
It would be cool to see how to load code documentation directly from the web.
@DataIndependent · 1 year ago
Thanks for the note - that would be sweet and I'll run a video on it if a plugin doesn't do that out of the box shortly
@elsenorguerric · 1 year ago
Great video, thanks a lot :)
@megajagatube · 1 year ago
Great video! What if the data in these files cannot leave the premises? Does calling the embedding API take the data off premises?
@DataIndependent · 1 year ago
Yes, because you need to send your raw text to OpenAI to get the embeddings back. If you wanted, you could use a locally hosted embedding engine.
@megajagatube · 1 year ago
@@DataIndependent thanks! Can you point me to some resources on locally hosted embedding engine?
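A common local route is a sentence-embedding model served on your own hardware. As a dependency-free illustration of the interface such an engine has to provide (embed texts into vectors, compare with cosine similarity), here is a hash-based stand-in. It is not semantically meaningful; its only point is that the text never leaves the machine.

```python
import hashlib
import math

# Dependency-free stand-in for a local embedding engine. A real setup would
# use a sentence-embedding model; this hash-bucket version only demonstrates
# the interface: text -> fixed-size vector, compared via cosine similarity.

DIM = 64

def embed(text: str) -> list[float]:
    vec = [0.0] * DIM
    for word in text.lower().split():
        bucket = int(hashlib.md5(word.encode()).hexdigest(), 16) % DIM
        vec[bucket] += 1.0
    return vec

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

sim = cosine(embed("lisp programming language"), embed("lisp language"))
```

Swapping this for a real local model changes only the body of `embed`; the vector store and retrieval code around it stay the same.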
@TRSTNGLRD · 1 year ago
This is awesome. How am I able to use this with a Davinci model for more in-depth responses? Can you do a second video on fine-tuning this system to reduce hallucinations, create more complex responses, and ask it more in-depth questions?
@DataIndependent · 1 year ago
Ya sounds great - What is your use case that you want to run through? It's always better with a real world example
@TRSTNGLRD · 1 year ago
@@DataIndependent YouTube transcripts. I have a course divided into 12 months, adding up to a total of 118 .txt transcript files. I'd like to create a "tutor", one where I can ask questions about the contents of the course if something confuses me. I've made one that does this, but the main issue I've found is structuring the transcript data: the bot cannot interpret raw transcripts all too well, so I realize I may need to reformat them into something like a knowledge graph for each lecture. What would be the best way to structure/format a transcript for this use case? Minimal data should be lost when reformatting so the bot isn't lacking any information that's already been discussed. This has been my biggest issue.
@proudindian3697 · 1 year ago
Thank you so much!
@DataIndependent · 1 year ago
You're welcome!
@LedZeppelinThe · 1 year ago
How can I integrate this with Bubble? The only part I am struggling with is being able to deploy a script like this and have it interact with Bubble via APIs.
@DataIndependent · 1 year ago
I'm not sure, but if you wanted to see how to do it on another platform you could check out my "how to build a webapp" video
@AI.Gadgets · 1 year ago
It also answers questions that are not present in the data fed to it.
@DataIndependent · 11 months ago
You can put "don't respond if you don't see the context in the prompt" which may help
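The grounding instruction suggested above can be sketched as a prompt template. The exact wording is a hypothetical example, not a guaranteed fix; results vary by model, but an explicit refusal instruction plus a fixed refusal string generally reduces off-context answers.

```python
# Hypothetical prompt sketch of the suggestion above: instruct the model to
# refuse when the retrieved context doesn't contain the answer.

GROUNDED_PROMPT = (
    "Answer the question using ONLY the context below. If the answer is not "
    "in the context, reply exactly: \"I don't know based on the provided "
    "documents.\"\n\nContext:\n{context}\n\nQuestion: {question}\nAnswer:"
)

prompt = GROUNDED_PROMPT.format(
    context="McCarthy invented Lisp in 1960.",
    question="When was Lisp invented?",
)
```

A fixed refusal phrase also makes the failure mode easy to detect programmatically: you can string-match the model's reply against it and fall back to another behavior.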
@vidhandhagai672 · 1 year ago
Great video! Is there a way to create a chatbot that smartly uses our data + GPT-3.5's data and gives us a COMBINED answer from both, instead of just our data set or GPT's data set? Let's say your document had details about 'what did McCarthy discover?' but no information on 'when did McCarthy discover the language Lisp?'. In that case, it should still be able to answer by looking up our data set for details related to McCarthy and Lisp, and then using the GPT-3.5 data for when it was discovered, since that's not in our data set.
@HantuMedias · 1 year ago
I got an IndexError: list index out of range when executing this line: docsearch = Chroma.from_documents(texts, embeddings). I tried loading a large PDF file; this could be the culprit. Can you suggest a workaround?
@DataIndependent · 1 year ago
Hm, I haven't run into that one yet. Have you tried it on a small PDF? Is your texts variable a list or a single text file? I believe it should be a list.
@WeryZebra · 1 year ago
I am not able to install chromadb. Can anyone help me?
@DataIndependent · 1 year ago
Take the error you're seeing, copy it into ChatGPT, and see what it says.
@WeryZebra · 1 year ago
@@DataIndependent I already tried every solution from GPT. I'm stuck on an error saying I need to install Microsoft build tools; I installed them, but it's still the same.
@planetcrypton9666 · 1 year ago
I'm facing the same issue. When I run the command "pip install chromadb" I encounter an error when trying to build the "sentencepiece" package, and it states the problem is not with pip but with a subprocess. Many others on coding forums are experiencing this issue with no solution readily available as of yet. Would appreciate any help, and love your videos mate 👍
@WeryZebra · 1 year ago
@@planetcrypton9666 I solved it. Tell me your error; I might be able to help you.
@planetcrypton9666 · 1 year ago
@@WeryZebra The build of sentencepiece failed, so it could not be installed, resulting in ChromaDB not being installed.
@Murcie4S · 1 year ago
Thank you for implementing the feature with the text file. While using the line of code 'qa = VectorDBQA.from_chain_type(llm=OpenAI(), chain_type="stuff", vectorstore=docsearch)', I received feedback indicating that the VectorDBQA module is deprecated. The deprecation warning suggests importing 'from langchain.chains import RetrievalQA', which takes a retriever parameter. However, I encountered an issue when attempting to replace 'docsearch' with this parameter. Can you please advise me on how to properly use the 'RetrievalQA' module?
@DataIndependent · 1 year ago
Check out my latest video on the 7 core concepts of LangChain, specifically the indexes section; I have an example about retrievers in there. If that doesn't help, then I recommend looking at the documentation.
@verdurakh · 1 year ago
I had the same error. I swapped the code with VectorDBQA to the following: RetrievalQA.from_chain_type(llm=OpenAI(temperature=0.5), chain_type="stuff", retriever=docsearch.as_retriever()). The thing that took me time to find was the docsearch.as_retriever function.
@ubaidghante8604 · 1 year ago
Great video man 💙💙🔥
@DataIndependent · 1 year ago
Thanks 🔥
@PabloBossledaLuz · 1 year ago
Thanks for such a great and informative series! Please keep bringing us more content about Langchain. Do you think it's possible to have a chat about the content you input (instead of simply asking a question about it)?
@DataIndependent · 1 year ago
Could you describe more about what you mean? To create a chat bot for the data you input?
@PabloBossledaLuz · 1 year ago
@@DataIndependent Yes, like a chatbot. For example, suppose you input a piece of content about your company or even a book, something GPT doesn't know about. What I'd like to do is not only ask questions about it but also have GPT ask me questions and assess my answers, in a chat-like way. The conversation could be pre-structured, like: "Let's have a 10min chat about the content above. I'd like you to ask me a question about it, then assess my answer, providing the right answer in case mine is not correct. Right after that, please repeat the process, asking me another question and correcting me, till the end of the 10min chat timeframe".
@avidrucker · 1 year ago
This sounds amazing. Can ChatGPT already do this?
@avidrucker · 1 year ago
*for content/topics it is already familiar with
@bakistas20 · 1 year ago
Can you show how to do the same with GPT4All? 🙏 I don't see support for embeddings with the model they use.
@DataIndependent · 1 year ago
Ada-002 does the embeddings, not GPT-4. But yes, I'll do a video on that later.
@nsitkarana · 1 year ago
For me, I had to go with RetrievalQA instead of VectorDBQA (it was marked as deprecated) and accordingly changed the query to 'qa = RetrievalQA.from_chain_type(llm=OpenAI(), chain_type="stuff", retriever=db.as_retriever(search_kwargs={"k": 1}))'. Other than this, it worked well!!
@DataIndependent · 1 year ago
I need to update the code, thank you.
@ivangouvea4195 · 1 year ago
Amazing video! Would be great to see a version with an open-source model such as Alpaca/LLaMA. Does anyone know if that is available/possible?
@reidgajewski6755 · 1 year ago
Help! When installing ChromaDB with pip install chromadb I keep getting this error: "clang: error: the clang compiler does not support '-march=native'; error: command '/usr/bin/clang' failed with exit code 1; note: This error originates from a subprocess, and is likely not a problem with pip. ERROR: Failed building wheel for hnswlib; Failed to build hnswlib; ERROR: Could not build wheels for hnswlib, which is required to install pyproject.toml-based projects". I've been troubleshooting this all day and cannot get it resolved. macOS 13.4, Intel i7.
@DataIndependent · 1 year ago
If Chroma isn't working for you, try FAISS or check out their other vectorstore integrations: langchain.com/integrations.html
@aischool0912 · 1 year ago
@Greg Kamradt Can we pass prompts to the chat model while using a conversational retrieval chain?
@ScriptureFirst · 1 year ago
DEEP THEORY: a recent paper favored a mass of diverse text rather than multiple iterative training epochs, yet new models' papers in Mar 2023 have used multiple denoising splits, maybe 3 or more, and sometimes dropping as little as 15%. QUESTION IS: to really make the LLM the master of the training text, do you see any advantages in an ensemble of splitting methods? Or do you have particular reason to find that a simple 1k split is a good enough one-shot performer? [lastly: God bless you for your patience in answering python101 in the comments section, you truly are a saintly life-coach!]
@DataIndependent · 1 year ago
The best performance I've seen has come from models where in-context learning is taken very seriously and the information retrieval is rigorously done.
@DanielGomez-zk2st · 1 year ago
Hi, great video! I'm wondering if anybody got an error when creating the docsearch: "IndexError: list index out of range". I can't seem to find why this happens; I followed step by step. Any help is greatly appreciated. Thanks!
@ranu9376 · 1 year ago
Cool! Is there any chance we could use other LLMs instead of OpenAI? Or was this designed specifically around OpenAI models?
@DataIndependent · 1 year ago
You could use other LLMs no problem. That's one of the cool parts about LangChain, swapping LLMs is easy
@kirtg1 · 1 year ago
Thanks for the video. Can this be done for a 3000 text document?
@fusseldieb · 1 year ago
Yes
@ramishashahid8853 · 11 months ago
It's a really great video. I want to ask how to increase speed, because it takes a lot of time to reply to a user query.
@rajkiransingh7826 · 1 year ago
Will this work for PDF files?
@ronakshah9561 · 10 months ago
What a great video! Can this be done on web URLs? I have a use case wherein we have several internal Confluence pages and want to ask questions on them. Can they somehow be loaded into a vector DB? Any guidance highly appreciated.
@captainjackrana · 1 year ago
This still seems to upload "private" data to OpenAI during the embedding-creation phase. Is there a way to create the embeddings without having to pass the document data to their APIs?
@chronicfantastic · 1 year ago
I wish there was a better way to visualise the [source_documents] results: if you ask it an unrelated question it gives the right answer but hallucinates the reference points. Still a bit unsure what's going on. Thanks so much for these videos!
@DataIndependent · 1 year ago
Check out the LangChain documentation; they have QA with sources, and that should help.
@kennethleung4487 · 1 year ago
Great video! Any concerns about privacy here? You mentioned using local files, but it seems like there is a chance for OpenAI to have access to the doc text?
@DataIndependent · 1 year ago
OpenAI (or any LLM provider you use) will only have access to the pieces you send over to them. In this example we load up 5-10 essays, but to answer a question we only send 4 smaller chunks over to OpenAI. So if you're worried about OpenAI seeing *any* of your data, then yes, there are privacy concerns. They are no doubt using your data to train more models; beyond that I'm not sure what they are doing with it.
@trunghieumai3895 · 1 year ago
@@DataIndependent thanks for the answer. I also have the same concern :) I am curious whether it is possible to run a local GPT model to perform the same task using LangChain. It would be great if you could share some of your thoughts :) Thank you very much!
@maof77 · 1 year ago
Great feedback! How do you add extra documents to the Chroma store when using the persist_directory? Neither 'add_documents()' nor 'from_documents()' seems to work for me :-(
@DataIndependent · 1 year ago
What's the error you're getting? Here is the documentation (at the bottom) that could help: langchain.readthedocs.io/en/latest/modules/indexes/vectorstore_examples/chroma.html
@hayekianman · 1 year ago
Great video. Is there any need to set the temperature to anything other than zero for such 'search'-like applications? I can see enterprise search being a simple use case, but people would want one authoritative answer or a ranking. In such a case, can it, for example, fall back to Amazon Kendra, which ranks the results instead?
@DataIndependent
@DataIndependent Жыл бұрын
I don't fully understand the question but I would experiment with other temperature values to see what works best for you.
@luiztauffer8513
@luiztauffer8513 Жыл бұрын
Thanks for the video, very informative and didactic! So, if I understood it correctly, this pipeline is doing roughly the following:
1. Get chunks of text and create one embedding representation for each chunk.
2. Perform a semantic (vector) search on the database containing those vectors/text.
3. Append the top X results from the semantic search (which are still the raw text from your dataset) + the question you're asking. The final string with all this content appended is sent to OpenAI's model.
Is that correct? Is it possible to have finer control over some aspects of this pipeline? E.g. the number of X results (or a minimum vector distance) to include in OpenAI's query? Or choose which OpenAI model to use?
@DataIndependent
@DataIndependent Жыл бұрын
Yep, more or less that is correct. Yes, you can control the "N", which is the number of documents returned. When using Pinecone you can get the distance metric back with the N documents and filter out the ones you don't want. You can switch the OpenAI model when you initialize your model in the first place. The default is Davinci, but you can switch it up.
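The filter-by-distance idea can be sketched in plain Python (made-up two-dimensional vectors for illustration; a real store returns the scores for you):

```python
import math

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b)))

def top_n(query_vec, docs, n=4, min_score=0.5):
    """Return up to n documents, dropping any below the similarity cutoff."""
    scored = [(cosine(query_vec, vec), text) for text, vec in docs]
    scored.sort(reverse=True)
    return [(s, t) for s, t in scored[:n] if s >= min_score]

docs = [("about startups", [1.0, 0.1]),
        ("about cooking",  [0.0, 1.0])]
hits = top_n([1.0, 0.0], docs, n=2, min_score=0.5)
print(hits)  # only the startup doc clears the cutoff
```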
@hetthummar9582
@hetthummar9582 Жыл бұрын
Awesome video!! Really enjoyed it. 😃 It would be cool if you can make a video on langchain vs gpt-index.
@DataIndependent
@DataIndependent Жыл бұрын
Great suggestion! Will do.
@markvosloo8833
@markvosloo8833 Жыл бұрын
1) Does the size of your own data (the text files in this case) affect the OpenAI charges? If so, this could be something one should be very aware of... correct? 2) Do the text files end up at OpenAI somewhere? I'm just thinking of private data ending up somewhere unintentionally.
@DataIndependent
@DataIndependent Жыл бұрын
Awesome!
1) The thing that matters is how much data you pass to OpenAI, not necessarily the size of your original document.
2) Data you send to OpenAI does stay with OpenAI for 30 days, and then it's deleted.
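Point 1 can be turned into a back-of-the-envelope estimate. The ~4 characters per token rule of thumb and the $0.02 per 1K tokens price below are rough assumptions for illustration, not exact OpenAI figures:

```python
def estimate_cost(text, price_per_1k_tokens=0.02):
    """Very rough cost estimate: ~4 characters per token for English text."""
    tokens = len(text) / 4
    return tokens / 1000 * price_per_1k_tokens

chunk = "x" * 4000  # a ~1,000-token chunk of context
print(round(estimate_cost(chunk), 4))  # prints 0.02 -- dollars per call for this chunk
```

So what you pay per question scales with the chunks you actually send, not with the size of the whole document collection.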
@Piroco11
@Piroco11 Жыл бұрын
Great video, thanks for all the insights. Had a question: does every question you ask the QA chain send the entire set of documents in the prompt to ChatGPT? I.e. does each question cost as many tokens as the entire set of documents + the question?
@SunnyKumar-r4x2l
@SunnyKumar-r4x2l Жыл бұрын
I want to create a support app that responds from a document (the document could be anything, e.g. programming-language error support), but my agent should also add those queries to the document or database if the answer isn't available in the document. Can you tell me how to do this in LangChain? Also, could you create some more interesting videos on LangChain?
@cryptobeanbag7148
@cryptobeanbag7148 Жыл бұрын
This is what Cathie Wood was talking about.
@ujjwalgupta1318
@ujjwalgupta1318 Жыл бұрын
Thanks a lot! What exactly is the difference between VectorDBQA and the retriever, though? They seem to be doing the same thing.
@jayavardhanvejendla7311
@jayavardhanvejendla7311 Жыл бұрын
Here you are using the OpenAI API and training the model with our own data. Is there any way that we can do that offline (without internet)?
@DataIndependent
@DataIndependent Жыл бұрын
Yep, totally - get yourself a local model and run it on your machine. It won't be as powerful and will be slower!
@nanti_dulu
@nanti_dulu Жыл бұрын
Hi, thank you for the great content😆! This is something I can't ask ChatGPT for help with, so it's really helpful! By the way, does this code work with a longer document? I used a 150-page PDF and it exceeded the token limit. It worked fine with a shorter PDF. Thank you!
@nai0om
@nai0om Жыл бұрын
Very nice and clear explanation. But how do you make the chat talk only about topics that can be found in the document?
@DataIndependent
@DataIndependent Жыл бұрын
You can play around with your prompt to try and get it not to answer anything outside of what you want. I've had success with "If you don't know the answer say, 'I don't know.' Don't make anything up"
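A minimal sketch of wiring that instruction into the prompt with plain string formatting (LangChain's PromptTemplate does essentially the same substitution; the context and question here are made up):

```python
# The instruction sits in the template, so it is sent with every question.
TEMPLATE = (
    "Use the following context to answer the question.\n"
    "If you don't know the answer say 'I don't know.' Don't make anything up.\n\n"
    "Context: {context}\n\n"
    "Question: {question}\n"
    "Answer:"
)

prompt = TEMPLATE.format(
    context="Paul Graham co-founded Y Combinator.",
    question="What is the capital of France?",
)
print(prompt)
```

With a prompt like this, a question the context cannot answer should come back as "I don't know" rather than a hallucination - though in practice you should still test it, since the model only tends to follow the instruction.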
@ngates83
@ngates83 Жыл бұрын
@DataIndependent Excellent video!! I can't get it to work with the Azure OpenAI service. How do I specify the API base, API type, API version, etc.?
@DataIndependent
@DataIndependent Жыл бұрын
Good question - Have you followed the tutorial here? langchain.readthedocs.io/en/latest/modules/llms/integrations/azure_openai_example.html?highlight=azure What's the error you're getting?
@ngates83
@ngates83 Жыл бұрын
@@DataIndependent Yes, I did, but it doesn't tell how to pass the Azure OpenAI bindings to fetch embeddings and Chroma... that's where I'm stuck.
@ngates83
@ngates83 Жыл бұрын
@@DataIndependent error: InvalidRequestError: The API deployment for the resource does not exist.
@DataIndependent
@DataIndependent Жыл бұрын
@@ngates83 I'm unsure my friend. You can try the langchain discord support channel and they may be able to help there.
@michaelb1099
@michaelb1099 Жыл бұрын
But how do we turn this into an application so others can use it?
@pshah222
@pshah222 Жыл бұрын
I was able to edit the code with the DirectoryLoader function to read and query most types of documents in a folder. Is there a way I can integrate Google Search into the same code and question chain, so the model decides to search the local DB first and, if no answer is found, also looks on the internet?
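The routing logic described here can be sketched independently of LangChain (both search functions below are stand-in stubs, not real APIs - an agent with a search tool would play the web-search role):

```python
def search_local(question):
    """Stand-in for a vector-store lookup; returns (answer, confidence)."""
    kb = {"what is langchain": ("A framework for LLM apps.", 0.9)}
    return kb.get(question.lower().rstrip("?"), (None, 0.0))

def search_web(question):
    """Stand-in for a web-search tool an agent would call."""
    return f"(web result for: {question})"

def answer(question, threshold=0.5):
    """Try the local store first; fall back to the web below the cutoff."""
    local_answer, confidence = search_local(question)
    if confidence >= threshold:
        return local_answer
    return search_web(question)

print(answer("What is LangChain"))  # served from the local store
print(answer("Weather in Paris?"))  # falls back to the web stub
```

In LangChain terms this is what an agent does when given two tools: it picks the local retrieval tool when it can answer, and the search tool otherwise.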
@yantaosong
@yantaosong Жыл бұрын
How about tables in a PDF, when most of the content is text?
@ComicBookPage
@ComicBookPage Жыл бұрын
Can this be done with a local LLM instead of sending private data to OpenAI?
@youwang9156
@youwang9156 Жыл бұрын
Really appreciate your work! Just one question about chunking: how can I split the text into chunks by sentence, comma, or space instead of by chunk size?
@DataIndependent
@DataIndependent Жыл бұрын
Check out this page: langchain.readthedocs.io/en/latest/_modules/langchain/text_splitter.html#RecursiveCharacterTextSplitter. Where it says self._separators = separators or ["\n\n", "\n", " ", ""] - those are the default separators, but you can specify your own. Pass separators=[","] for a comma, for example.
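Roughly, the behaviour with a custom separator looks like this - a simplified plain-Python approximation for illustration, not LangChain's actual implementation:

```python
def split_on_separator(text, separator=",", chunk_size=40):
    """Split on the separator first, then greedily pack pieces into chunks
    that stay under chunk_size characters."""
    pieces = text.split(separator)
    chunks, current = [], ""
    for piece in pieces:
        candidate = (current + separator + piece) if current else piece
        if len(candidate) <= chunk_size:
            current = candidate
        else:
            if current:
                chunks.append(current)
            current = piece
    if current:
        chunks.append(current)
    return chunks

text = "alpha beta, gamma delta, epsilon zeta, eta theta"
print(split_on_separator(text, separator=",", chunk_size=25))
```

The real splitter additionally recurses through its list of separators when a single piece is still too large, but the pack-until-full idea is the same.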
@youwang9156
@youwang9156 Жыл бұрын
@@DataIndependent thank you so much ! have a good day
@LAVolAndy
@LAVolAndy Жыл бұрын
Thanks for this demo. I've built something similar using a set of unpublished manuscripts from a book publisher I work with. LangChain and OpenAI do a great job of answering questions contained within the texts, but then it ONLY knows what is in my custom embeds. If I ask it to compare a manuscript to one of Stephen King's best sellers, it knows nothing, because the only source is the embeds I've loaded into Pinecone, for example. How do you get the private data ADDED to the normal corpus so the answer reflects knowledge from both? I'd love to see a demo of that scenario.
@lightyagami6823
@lightyagami6823 Жыл бұрын
Did you find an answer to this?
@angelfeliciano8794
@angelfeliciano8794 Жыл бұрын
Amazing video. Any idea if this method works with information stored in my local MySQL database?
@Kalease54
@Kalease54 Жыл бұрын
I just binged all of your videos on LangChain; this is exactly the library I was looking for. One question I have: if you need to use an OpenAI embeddings model for vector search of custom data, how would you also utilize a model like, say, Davinci if the solution calls for providing results not just from the vectorized content? For instance, if the solution calls for having knowledge of personal data but also needs to use LangChain search tools for query-answer search? I don't believe the OpenAI embeddings model can also do what you presented in your previous videos, but I could be wrong. Any help would be greatly appreciated. Please keep up the videos!!
@DataIndependent
@DataIndependent Жыл бұрын
Nice! Thank you very much. For your question:
* Quick clarification - you don't *need* to use OpenAI for embeddings; lots of models can give you this.
* The embeddings are just a way to get relevant documents. Once you've got those docs you can do all sorts of chains (like the query answer search).
@Kalease54
@Kalease54 Жыл бұрын
@@DataIndependent Thank you for the info. Do you have anything on your list for querying a SQL db for answers?
@DataIndependent
@DataIndependent Жыл бұрын
@@Kalease54 I haven't done that yet but good idea. I'll add that to the list.
@cgtinc4868
@cgtinc4868 Жыл бұрын
@@DataIndependent I have one more question further to Kalease's (which is a great Q, btw). After the documents are vectorized and uploaded to Pinecone, for example, and say the original text files (PDF, Word, text, etc.) are stored elsewhere: once they are disconnected, will the LLM still be able to retrieve the information? (Sorry if people have already asked this.)
@rkthebrowneyedboy1
@rkthebrowneyedboy1 Жыл бұрын
Great video, and thanks so much for simplifying the complex parts. Btw, is there a way to create multiple indexes or collections in ChromaDB and use an index to limit the search to a set of documents? I haven't seen anywhere that it's definable in your code. It would be great if you could clarify. My best,
@DataIndependent
@DataIndependent Жыл бұрын
You could create multiple indexes, but I'm a fan of adding metadata to your embeddings that you can filter on. That way you can keep your data tidy. If the data is on completely different projects or topics then it may make sense for separate indexes. Check out the documentation on how to do this
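The metadata idea can be sketched in plain Python (the records and fields below are made up; Chroma and Pinecone accept a similar filter dict on their query calls):

```python
# Each embedding record carries a metadata dict alongside its text/vector.
records = [
    {"text": "Q3 revenue notes", "metadata": {"project": "finance", "year": 2023}},
    {"text": "Launch checklist", "metadata": {"project": "product", "year": 2023}},
    {"text": "Old audit memo",   "metadata": {"project": "finance", "year": 2021}},
]

def filter_by_metadata(items, **conditions):
    """Keep only records whose metadata matches every condition."""
    return [r for r in items
            if all(r["metadata"].get(k) == v for k, v in conditions.items())]

hits = filter_by_metadata(records, project="finance", year=2023)
print([h["text"] for h in hits])  # prints ['Q3 revenue notes']
```

The vector store then only searches within the filtered records, which gives the same effect as separate indexes without splitting your data.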
@laubonbon
@laubonbon Жыл бұрын
Very helpful for me; I could follow it with no problem. Thanks!!! I'm wondering if there is a document loader available for Excel files?
@DataIndependent
@DataIndependent Жыл бұрын
Check out the langchain documentation for their currently supported loaders!
@ghalwash
@ghalwash Жыл бұрын
Amazing tutorial 😊. If I may ask, how can I train it on my e-commerce data and get responses in the form of a list of product IDs?
@DataIndependent
@DataIndependent Жыл бұрын
Check out my video on getting structured data back from your LLM, that may help.
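A common pattern for this is to ask the model to answer in a fixed format and parse the response afterwards; a minimal sketch (the response string and the SKU naming scheme are made up for illustration):

```python
import re

# Pretend this came back from the LLM after a prompt ending with:
# "Answer only with a comma-separated list of product IDs."
llm_response = "SKU-1042, SKU-0007, SKU-3310"

# Pull out anything matching the assumed ID pattern.
product_ids = re.findall(r"SKU-\d+", llm_response)
print(product_ids)  # prints ['SKU-1042', 'SKU-0007', 'SKU-3310']
```

LangChain's output parsers formalize this: you describe the schema you want, it appends format instructions to the prompt, and parses the reply back into Python objects.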
@KiritiSai93
@KiritiSai93 Жыл бұрын
Amazing video! Thank you so much for putting this out. How does the text splitting affect the accuracy of returned results? I have a collection of questions and answers for an educational course. I want to customize the prompt given to ChatGPT so it knows these are questions and answers and finds the correct answer. Is this something that can be done with LangChain?
@DataIndependent
@DataIndependent Жыл бұрын
Yep, big time. If you have a small set of questions/answers then you can use the method in this video. If you have a ton of questions/answers, check out my video on asking a book a question.
@KiritiSai93
@KiritiSai93 Жыл бұрын
@@DataIndependent Thanks for the reply. Agreed that the recursive splitter is more useful for a single large document. My question was more like: is it possible to tell ChatGPT via a prompt that it's looking at a question-and-answer document instead of it assuming they are just pieces of text?
@caiyu538
@caiyu538 Жыл бұрын
great