Llama-2 with LocalGPT: Chat with YOUR Documents

167,498 views

Prompt Engineering

1 day ago

Comments: 232
@engineerprompt 1 year ago
Want to connect? 💼Consulting: calendly.com/engineerprompt/consulting-call 🦾 Discord: discord.com/invite/t4eYQRUcXB ☕ Buy me a Coffee: ko-fi.com/promptengineering |🔴 Join Patreon: Patreon.com/PromptEngineering
@synaestesia-bg3ew 1 year ago
Love you, you're the best YouTuber AI expert. I am learning so much because you don't just talk, you actually make things work.
@engineerprompt 1 year ago
Thanks for the kind words.
@synaestesia-bg3ew 1 year ago
@@engineerprompt No, it's not just kind words, I really admire and recognize when someone has true passion and puts in the effort. Furthermore, I really love your emphasis on localization of AI projects; I think it's the future. No one wants to be a blind slave of a central corporation's AI, censoring and monitoring every word. Even Elon said it: AI should be democratized. The only problem is that we might not achieve it because all the odds and money are against it, but I believe it is worth trying.
@zohaibsiddiqui1420 1 year ago
It would be great if these massive beast models could run on low-end machines so that we can increase contributions
@awu878 1 year ago
Really like your video!! Looking forward to the video about the localGPT API 😃
@mauryaanurag6622 9 months ago
On which system did you do this?
@RabyRoah 1 year ago
Awesome video! Would next love to see a video on how to install LLaMA2 + LocalGPT with GUI in the cloud (Azure). Thank you.
@REALTIES 1 year ago
Waiting for this as well. Hope we get this soon. 🙂
@Socrataclysm 1 year ago
This looks wonderful. Going to get this setup tonight. Anyone familiar with differences between privateGPT and LocalGPT? Seems like they give you mostly the same functionality.
@engineerprompt 1 year ago
The main difference is the ability to use GPUs. The first version of privateGPT was CPU-only. And this one will download the models for you!
@caiyu538 1 year ago
Thank you for providing such a great AI tool.
@vkarasik 1 year ago
Many thanks - I was able to install localGPT and ingest my docs. Two questions (vanilla install by cloning your repo as is): 1) on a 16-CPU/64GB RAM x86 instance it takes 1-1.5 minutes to get an answer, 2) for "what is the term limit of the us president?" I'm getting "The President of the United States has a term limit of four years as specified in Article II, Section 1 of the US Constitution." as the answer :-(
@b326yr 1 year ago
Well done, but nah, I'm just gonna wait for things to become simpler and have a better interface. Waiting for Microsoft Copilot.
@Gingeey23 1 year ago
Great video and impressive project. I'm having errors when ingesting .txt files, but PDFs seem to work fine! Has anyone run a Wireshark capture or similar to monitor the packets being transmitted externally, to ensure that this is 100% local and no data is being leaked? Would be great to get that assurance! Again, great work and thanks.
@Psychopatz 10 months ago
I mean, you could just turn off your internet connection to verify that it's purely local.
@zkiyyeller3525 1 year ago
I'm grateful for these videos. However, I'm always running into issues due to environment setup or a requirement deficiency. Any way someone can create this in an environment we can use? Again, very grateful, but I'm now on my 7th hour of debugging and still don't have the environment ready.
@aketo8082 1 year ago
Thank you for this video. One thing keeps me busy: how does the "intelligence" work in GPT, LLMs...? Because I can't see that this model "understands" relationships, or can identify locations and the difference between three people with the same first name. I know an LLM is a database of words. I use GPT4All and tested all available LLMs, but always the same problem. Corrections via chat are not possible either, and so on. So I guess I don't understand that "AI" right or am missing some basic information. Also, how do you train that? Same problems with ChatGPT, Bing, Bard, etc. Thank you for any hints, links and suggestions.
@Bamseficationify 1 year ago
I'm running an Intel i7 CPU. The best I could get was about 570,000 ms, which is almost 10 minutes for a reply. How do I get this number down without getting a GPU?
@asepmulyana9085 1 year ago
Have you created a localGPT API? For example, to connect to a WhatsApp bot. I still have no idea how to do that. Thanks.
@intuitivej9327 1 year ago
Hi, thankful for your sharing, I am experimenting with it. By the way, how can I train the model properly? Would it remember the doc and the conversations we had after restarting the computer? I want to fine-tune it so it can learn the context.
@mauryaanurag6622 9 months ago
On which system did you do this?
@intuitivej9327 9 months ago
@mauryaanurag6622 System..? Windows 11..? I am a beginner, so I am not sure if this is the right answer..
@martindouglas3901 7 months ago
Is this still a viable solution or has it been overcome by events? Don't worry, I am not judging; I too have been overcome by events (OBE).
@engineerprompt 7 months ago
I think it's still useful (my biased opinion :) ). I'm still building it as a framework to experiment with new RAG techniques.
@lshadowSFX 1 year ago
What if I already have all the files of the model downloaded somewhere? How do I make it use those files?
@IbrahimAkar 1 year ago
Does it download/load the model every time you run the app? If so, I think there should be a check for whether the model is already downloaded.
@engineerprompt 1 year ago
It downloads the model only once
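A quick illustration of why that works (a sketch, assuming the model file comes from the Hugging Face hub): hf_hub_download keeps a local cache and simply returns the cached path on later runs instead of downloading again.

    # Minimal sketch: huggingface_hub caches under ~/.cache/huggingface/hub,
    # so a second call returns the already-downloaded file.
    from huggingface_hub import hf_hub_download

    model_path = hf_hub_download(
        repo_id="TheBloke/Llama-2-7B-Chat-GGML",        # example repo
        filename="llama-2-7b-chat.ggmlv3.q4_0.bin",     # example quantized file
    )
    print(model_path)  # same local path on every subsequent run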
@MayurLaxmanrao 1 year ago
Can I run it on a local machine with 8GB of RAM?
@engineerprompt 1 year ago
Yes, you will need to use a quantized model. You will also need to replace the embedding model.
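As a rough sketch of what that change looks like (the variable names follow the repo's constants.py at the time of the video and may differ in your copy; the specific models are only examples):

    # constants.py -- point localGPT at a 4-bit GGML model and a lighter embedder
    MODEL_ID = "TheBloke/Llama-2-7B-Chat-GGML"
    MODEL_BASENAME = "llama-2-7b-chat.ggmlv3.q4_0.bin"   # ~4 GB file, should fit in 8 GB RAM
    EMBEDDING_MODEL_NAME = "all-MiniLM-L6-v2"            # much smaller than instructor-large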
@MayurLaxmanrao 1 year ago
@@engineerprompt Thank you so much for your kind reply. Your content is amazing. I don't have a GPU in my machine. Can I use a quantized model of Vicuna-7B?
@manu053299 1 year ago
Is there a way for localGPT to read a PDF file and extract the data into a specified JSON structure?
@engineerprompt 1 year ago
Yes, you will have to write code for that.
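One possible shape for that code (illustrative only, not part of localGPT; `llm` below is a placeholder for whatever callable returns the model's text):

    import json
    from pypdf import PdfReader   # assumes pypdf is installed

    def pdf_to_json(pdf_path, llm):
        text = "\n".join(page.extract_text() or "" for page in PdfReader(pdf_path).pages)
        prompt = (
            "Extract the invoice number, date and total from the document below.\n"
            'Reply with JSON only, e.g. {"invoice_number": "", "date": "", "total": ""}.\n\n'
            + text[:4000]   # keep the input well inside the context window
        )
        return json.loads(llm(prompt))   # raises if the model strays from pure JSON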
@ernestspicer7728 1 year ago
What is the best way to generate a REST API for localGPT?
@engineerprompt 1 year ago
Check the latest video on the channel
@ernestspicer7728 1 year ago
How much would you charge for a few hours of work and a walkthrough on a specific project using your code as a base? @@engineerprompt
@mauryaanurag6622 9 months ago
Does it work offline, without internet?
@engineerprompt 9 months ago
Yes
@fjpatil9115 1 year ago
Can anyone let me know: I am running localGPT on a local machine with a CPU, and it's taking a lot of time to return a response. Any tip to improve the response time would really help.
@engineerprompt 1 year ago
Unfortunately, you will need a relatively good GPU to run it at the moment. Hopefully this will change in the near future.
@ManojChoudhury99 1 year ago
What are the system requirements for this? Can we run it in Google Colab?
@engineerprompt 1 year ago
You can run this on Colab. The specs really depend on the LLM you select, but for GGML models a GPU with 11GB VRAM will be more than enough.
@ManojChoudhury99 1 year ago
@@engineerprompt For a CPU with 16GB RAM, is it possible to run Llama-2 7B?
@MyASIF12345 9 months ago
localGPT does not answer when a query is asked.
@littlesomethingforyou 6 months ago
Same issue. I type in a query and VS Code just remains stuck. Did you solve this?
@HedgeHawking 1 year ago
Any advice on which model / setup to use for simple, lightweight tasks on an M1 chip with 8 GB RAM? I tried llama-2-7b-chat.ggmlv3.q4_0.bin with mps, but it's been working on the presidential term limit question for the last 15 min without any answer.
@HedgeHawking 1 year ago
Just FYI if anybody is interested: a MacBook Air with M1 and 8 GB took 28.95 minutes to process the question "what is the presidential term limit?"
@engineerprompt 1 year ago
Later today, I will change the default embedding model to Hugging Face embeddings. That will reduce the amount of VRAM required.
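For reference, the swap looks roughly like this in the LangChain of that era (import paths have since moved to langchain_community / langchain_huggingface, so treat this as a sketch):

    from langchain.embeddings import HuggingFaceEmbeddings

    embeddings = HuggingFaceEmbeddings(
        model_name="sentence-transformers/all-MiniLM-L6-v2",  # small, CPU-friendly
        model_kwargs={"device": "cpu"},                       # or "cuda" / "mps"
    )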
@SamirDamle 1 year ago
Is there a way we can have a Docker image of this that I can run as a container with everything configured by default?
@StarfilmerOne 5 months ago
I thought the same; at least config presets for models or something. I don't understand.
@teleprint-me 1 year ago
Pro Tip: Don't set max_tokens to the maximum value. It won't work out the way you hope. It's better to use a fractional value that represents a percentage of the maximum sequence length.
@Ericzon 1 year ago
Please, could you elaborate more on this answer?
@bobo32756 1 year ago
@@Ericzon Yes, this would be interesting!
@antdok9573 1 year ago
Has NO idea why
@teleprint-me 1 year ago
@Ericzon The sequence length represents the context window for a model. The max token parameter dictates the maximum sequence length the model can generate as output. When you input a text sequence, it becomes a part of the context window. The model will generate a continuation sequence based on the input. What do you think will happen as a result if the model is allowed to generate an output sequence that is as long as its context window while incorporating your input? For those that don't know, it generates an error in the best-case scenarios. The worst-case scenario is a bug of your own making that leaves you with absolute frustration.
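In code, the tip amounts to something like this (a sketch; the exact parameter name depends on the backend you use):

    context_window = 4096                         # Llama-2 sequence length
    max_tokens = int(0.25 * context_window)       # cap generation at ~1024 new tokens
    prompt_budget = context_window - max_tokens   # room left for the input/context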
@teleprint-me 1 year ago
@antdok9573 I was in the ER and then working on PyGPTPrompt while recovering at home. So 🤷🏽‍♂️. Also, if you think the Reactance Response is a clever mental hack, I would like to disagree with you. I find statements like this as rude as they are pretentious.
@richardjfinn13 1 year ago
Worth pointing out - you got it when you asked to show sources, but the presidential term limit is NOT in Article 2. It's in the 22nd Amendment, ratified in 1951 after FDR was elected to third and fourth terms.
@Shogun-C 1 year ago
What would the optimum hardware set up be for all of this?
@Yakibackk 1 year ago
With Petals you can run it on almost any hardware, even mobile.
@HedgeHawking 1 year ago
@@Yakibackk Tell me more about Petals please.
@FaridShahidinejad 1 year ago
I'm running Llama-2 on LM Studio, which doesn't require all these convoluted steps, on my Ryzen 9, 48GB of RAM, and a 1080 card, and it runs like molasses.
@MikeEbrahimi 1 year ago
A video with API access would be great
@kamilnwa8020 1 year ago
Awesome video. QQ: how to add multiple "cuda" devices? The original code specifies `device="cuda:0"`. How do I modify this line to use 2 or more GPUs?
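One way to do this for full-precision Transformers models is to let Hugging Face shard the layers instead of pinning everything to cuda:0 (a sketch that assumes the `accelerate` package is installed; whether localGPT exposes this depends on which loader path you are using):

    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "meta-llama/Llama-2-7b-chat-hf"    # example model
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        device_map="auto",   # spreads layers across all visible GPUs
    )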
@DhruvJoshiDJ 1 year ago
A one-click installer for the project would be great.
@Swanidhi 11 months ago
Great content! I look forward to your videos. Please also create a video to guide people who are new to the world of deep learning, so that they know what to learn, from where, and what they need to start contributing to projects such as localGPT. Also, another video on how to determine which quantized model can be run efficiently on a local system by specifying the parameters for this assessment. Thanks again!
@giovanith 1 year ago
Hello, does this run on Windows 10? I've been trying a lot here but no success (W10, 40 GB RAM, RTX 4070 12 GB VRAM). Thanks.
@nikow1060 1 year ago
BTW, if someone has issues setting up CUDA and a conflict with bitsandbytes: after checking CUDA in Visual Studio and checking that the CUDA version matches the torch requirements, you can try the following for a Windows installation: pip install bitsandbytes-windows. Someone has provided a corrected bitsandbytes build for Windows... it worked for me after 24 hours of bitsandbytes errors complaining about the CUDA installation.
@ypsehlig 1 year ago
Same question as Shogun-C, looking for hardware spec recommendations.
@Socrataclysm 1 year ago
This looks wonderful. Going to get this setup tonight. Anyone familiar with differences between privateGPT and LocalGPT? Seems like they give you mostly the same functionality.
@hish.b 1 year ago
I don't think privateGPT has GPU support, tbh.
@MrBorkori 1 year ago
Hi, thank you for your content, it's very useful! Now I'm trying to run with TheBloke/Llama-2-70B-Chat-GGML, but it gives me an error - error loading model: llama.cpp: tensor 'layers.0.attention.wk.weight' has wrong shape; expected 8192 x 8192, got 8192 x 1024 llama_load_model_from_file: failed to load model. Any help will be appreciated.
@ssvfx. 1 year ago
after 10 hours straight... WE GOT ENTER A QUERY: LETSGOOOOOOOO
@zeeshanafzali8671 1 year ago
Bruh useless
@aldisgailis9901 8 months ago
@engineerprompt I would join in the development, more on the front-end side, but at the moment it seems this repo will mainly orient itself to being a doc-reading tool, so idk. However, if this GPT could just be ingesting files for knowledge, have access to the internet to find things out, and summarize, then sure, I'd be up for it.
@jennilthiyam1261 10 months ago
Hi. I have tried to chat with two CSV files. The thing is, the model is not performing well. It is not even giving the correct answer when I ask about a particular value in a row given the keywords. It is not good at all. I am using 70B. Does anyone have any idea how to make it more reliable? It does not even seem able to understand the data presented in CSV files.
@varunms850 9 months ago
Can you give your system settings: RAM, VRAM, GPU and processor?
@blackout1819 1 year ago
The answer takes 2-3 minutes, and the answer itself looks extremely crazy. I do not advise it.
@littlesomethingforyou 6 months ago
Hi, I am unable to get back a response after I "enter query". VS Code just gets stuck. Is it because I'm running it on my CPU?
@adivramasalvayer3537 6 months ago
Is it possible to use Elasticsearch as a vector database? If yes, may I get the tutorial link? Thank you.
@simonmoyajimenez2045 1 year ago
I found this video really compelling. I believe it would be incredibly fascinating to leverage a CSV connection to answer data-specific questions. It reminds me of an article I read titled 'Talk To Your CSV: How To Visualize Your Data With Langchain And Streamlit'.
@brianhauk8136 1 year ago
I look forward to seeing your different prompt templates based on the chosen model. 🙂
@hassentangier3891 1 year ago
I downloaded the code, but it has already changed. How do I integrate the Llama model I downloaded, please?
@electricsheep2305 1 year ago
This is the most important YouTube video I have ever watched. Thank you and all the contributors. Looking forward to the prompt template video!
@Elshaibi 1 year ago
Great video as usual. It would be great to have a one-click install file for those who are not experts.
@engineerprompt 1 year ago
Let me see what I can put together
@anushaaladakatti4145 8 months ago
Can we extract and show images from this, since it responds with text content? How can we do it?
@boscocorrea1895 1 year ago
Waiting for the api video.. :)
@SadeghShahmohammadi 1 year ago
Nice work. Very well done. When are you going to explain the API version?
@intuitivej9327 1 year ago
Hi, thankful for you, I am experimenting with Llama-2 13B GGML on my notebook. But a few days ago, I realized that the model was not saved locally but was, like... I don't know... a snapshot...? I want to save the model and load it from my local path. I tried some code but failed.. Could you please guide me? Thank you again for your sharing ❤
@ToMtheVth 1 year ago
Works quite well! Unfortunately, performance on my M1 MacBook Pro is a bit of an issue. I ingested 30 documents, and prompt eval time is 20 minutes... I need better hardware XD
@bourbe 1 year ago
Hello my dear, thank you for your videos, they are really great. I have two simple use cases that I want to implement with this Llama-2 model on CPU. I think the example in this video can solve them, but there is some information I am missing to finalize it. For these use cases, I don't need a Streamlit interface, just the possibility to inject a prompt and save the result. I already have the equivalent of these two use cases with the ChatGPT API + Python. Also, I don't want to be in an environment. USE CASE 1: From a .txt file, 1. I want to load the .txt into the model 2. I want to inject the prompt: "convert this into a markdown blog article:" 3. I want to save the result as a Markdown (.md) file. Example of query for use case 1: drive.google.com/uc?export=view&id=19QYO_cy6CU8UKwdau8ClW_XMsQgdpOzt Example of result for use case 1 using Hugging Chat: drive.google.com/uc?export=view&id=1sGTbLihHerIBTfckaZ-7NPJo3cOhYlfD USE CASE 2: From a .txt file, 1. I want to load the .txt into the model 2. I want to inject a first prompt: "Create an outline from this text for a blog article" 3. I want to inject a second prompt which uses the result of the first prompt: "Expand the outline in a detailed way to create an article of 1000 words" 4. I want to save the result of the second prompt as a Markdown (.md) file. Example of query for use case 2: drive.google.com/uc?export=view&id=1R5eIjYJq3M_d0nZydaK0boW38W7iFvGF Example of result for use case 2 using Hugging Chat: drive.google.com/uc?export=view&id=1vWs0ll7lxT4oSULvt67D6BigTBIFzEU7 Thanks in advance for your answer.
@ignacio3714 1 year ago
Hey friend, it seems that you know a lot!! Could you give me a hand with this? I want to know the best way of creating your own "chatgpt" by feeding it a specific set of files, let's say a small book, and then being able to ask questions about it, but running it locally or in the cloud (not using any third party, basically). What's the best way of doing it? Is it even possible? Like running something like that in Google Colab with files from Google Drive or something like that? Thanks in advance, man!
@caiyu538 1 year ago
I have asked this question on your other video: "This localGPT works great for my file. I use a T4 GPU with 16GB of CUDA memory, and it takes 2-4 minutes to answer my questions for a file with 4-5 pages. Is it expected to take so long to answer the question using a T4 GPU?" After watching this video, I think I can use a quantized model instead of the Llama-2 / Vicuna 7B model.
@RameshBaburbabu 1 year ago
Thanks man!! It was very useful. Here is one use case, for evolving user data embeddings: in an RDBMS we update a row, and we can add more data in other tables with associated foreign keys. Can we embed, say, patient data day after day and ask questions about one year of a particular patient..? Bottom line, I am looking to `incrementally embed`....
@mmdls602 1 year ago
Use LlamaIndex in that case. It has a bunch of utilities that you can use to "refresh" the embeddings when you add or delete data. Super useful.
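A rough sketch of that incremental refresh with llama-index (the API moves between releases, so check the current docs; the folder name here is just an example):

    from llama_index.core import SimpleDirectoryReader, VectorStoreIndex

    docs = SimpleDirectoryReader("SOURCE_DOCUMENTS", filename_as_id=True).load_data()
    index = VectorStoreIndex.from_documents(docs)

    # later, after files were added or edited:
    new_docs = SimpleDirectoryReader("SOURCE_DOCUMENTS", filename_as_id=True).load_data()
    index.refresh_ref_docs(new_docs)   # re-embeds only new or changed documents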
@BrijeshSingh-f2c 1 year ago
How can I run it with ROCm and AMD GPUs? I'm a noob here and want to explore this project.
@fenix20075 1 year ago
Using the original constitution.pdf is not convincing; let's use Harry Potter or any other novel or history book for the testing, that would be more convincing. And I have tested Llama-2, and I don't think it is better than Wizard Uncensored 30B; maybe that is because the model scale is not on the same level?
@KJB-Man 1 year ago
It's a great app! Thank you! Suggestion: can you redo the requirements.txt to install torch with CUDA support? It is irritating to have to uninstall torch, reinstall it with conda and the CUDA flag set, and then troubleshoot the issues that causes.
@abeechr 1 year ago
Seems like setup and installation went well, but when I enter a query, nothing happens. No error, no nothing. Many attempts… Any ideas?
@Diana-zo2ut 4 months ago
Thank you, I am having an issue: "D:\Datos\Documents\localgpt_llama2\.venv\lib\site-packages\sentence_transformers\SentenceTransformer.py", line 286, in __init__ modules = self._load_sbert_model( TypeError: _load_sbert_model() got an unexpected keyword argument 'cache_folder'" and I can't solve it.
@nadavel85 1 year ago
Thanks a lot for this! Did anyone encounter this error and resolve it? "certificate verify failed: unable to get local issuer certificate"
@md.ashrafulislamfahim3106 1 year ago
Can you please tell me how I can use this project to implement a chatbot with Django?
@brianrowe1152 1 year ago
Mine says bitsandbytes is deprecated, but then when I try pip install it says the requirement is already met, yet when I run again it says deprecated.. please install.
@touristtam 1 year ago
How is it that this type of LLM project usually has no tests whatsoever and leans heavily on Conda?
@kashishvarshney2225 1 year ago
I run it on CUDA but it's still giving answers in 3 minutes. How can I improve its speed? Please, someone reply.
@gold-junge91 1 year ago
It would be great to add it to my paperless-ngx.
@extempore66 9 months ago
Hello everyone, I'm using a local (cached) Llama-2 model with LanceDB as an index store, ingesting many PDF files and using a fairly standard prompt template ("given the context below {context_str}. Based on the context and not prior knowledge, answer the query. {query_str} ..."). It is very frustrating getting different answers to the same question. Any similar experiences? Thanks.
@engineerprompt 9 months ago
Have you looked at the source documents returned in each case? That will be a good starting point. Also, potentially reduce the temperature of the LLM.
@extempore66 9 months ago
@@engineerprompt Thank you for the prompt response. I have looked at the documents. I also ran a retriever evaluator (faithfulness, relevancy, correctness...). Sometimes it passes, sometimes it does not. Still trying to wrap my brain around why the answers are so different while the nodes (vectors) returned are consistently the same.
@bhavikpatel7612 1 year ago
Getting an error: 'NoneType' object is not subscriptable. Can you please help?
@TheCopernicus1 1 year ago
Amazing project mate, any recommendations on running the UI version? I have followed the documentation and changed to the GGML versions of Llama-2, however it keeps erroring out. Could you perhaps recommend any additional instructions? I am running on M1, many thanks!
@prestonmccauley43 1 year ago
I had the same issue on PAC; llama errored out....
@IbrahimAkar 1 year ago
Use the GGUF models. I had the same issue.
@mibaatwork 1 year ago
You talk about CPU and Nvidia; what about AMD? Can you add it as well?
@tk-tt5bw 1 year ago
Nice video. But can we get some videos for an M1 silicon MacBook?
@CharanjitSinghPhull 1 year ago
Great video, helped a lot, but there is an issue: when passing --show_sources and asking Llama-2 a question outside the data that was ingested, it still provides an answer and cites an incorrect source document that has nothing to do with the actual question. Also, why return all the data that was used? Can't we directly get the line number and page number along with the document name only?
@lucas_badico 7 months ago
For a MacBook M1 Pro, what model do you recommend?
@engineerprompt 7 months ago
Will depend on how much memory your machine has. A base version will run 7/13B quantized models.
@lucas_badico 7 months ago
@@engineerprompt Mine has 16GB.
@1989arrvind 1 year ago
Great👍
@fabulatetra8650 1 year ago
Is this using fine-tuning on the Llama-2 model? Thanks.
@VoltVandal 1 year ago
Thank you! Working really great, even on an M1. Just one question: I compiled/ran llama_cpp on my M1 and it is using the GPU; can this somehow work for your project as well? [I'm just a beginner, THX]
@engineerprompt 1 year ago
Yes, under the hood, it's running the models on llama_cpp.
@VoltVandal 1 year ago
Ha, got it. Maybe no problem for pros, but I just had to run: CMAKE_ARGS="-DLLAMA_METAL=on" FORCE_CMAKE=1 pip install --upgrade --force-reinstall llama-cpp-python --no-cache-dir. And now it is using the GPU!
@gamerwalkers 1 year ago
@@VoltVandal Hi Martin. Could you help me with the steps for how you made it work? I am using an M1 but the model is running very slowly. How fast does it give you a response? For me it's ranging between 3-5 minutes.
@VoltVandal 1 year ago
@@gamerwalkers Well, this is due to the endless swapping (16GB MBP with a 7B model). I got it running directly with the main command from llama-cpp. But if I start embedding etc., I end up with the same 5-minute swap. I'm not a Python pro or an AI pro, so I can't tell you why this happens. I returned to my old Nvidia machine, unfortunately. So at least it is working and not giving GPU errors, but still no fun. 😞 Maybe some AI pro can tell us more about that issue.
@gamerwalkers 1 year ago
@@VoltVandal When you say directly from the main command of llama-cpp, are you able to run that command against your own PDF like the video has done? Or are you just referring to default llama prompts that answer queries but aren't grounded in your own PDF data?
@alaamohammad6422 1 year ago
Please, I get an error that the Llama-2 repo does not exist!!!! 😢😢😢😢
@trobinsun9851 1 year ago
Does it need a powerful machine? A lot of RAM?
@jerkoviskov5379 1 year ago
Do I need the GPT-4 API to run this on my machine?
@bowenchen4908 1 year ago
Is it very slow if we run it locally? Thank you in advance.
@Weltraumaff3 1 year ago
Maybe a stupid question and I'm just missing what --device_type I should enter, but: I'm struggling a bit using my AMD 6900 XT. I don't want to use my CPU, and PyTorch doesn't seem to work with OpenCL, for example. Has anyone got an idea? Cheers in advance.
@engineerprompt 1 year ago
in this case, cpu :)
@ihydrocarbon 1 year ago
Seems to work on my Fedora 38 ThinkPad i5, but I am confused as to how to train it on other data. I removed the constitution.pdf file and it still responds to queries with answers that refer to that document...
@engineerprompt 1 year ago
There is a DB folder; delete that and rerun ingest.py.
@RahulKant-qj7lv 11 months ago
How can we improve the response time? If we ask a question, the answer should come back quickly. Where can I learn about this part?
@engineerprompt 11 months ago
You will need to use a better GPU and a quantized model.
@RahulKant-qj7lv 11 months ago
@@engineerprompt I am using gpt-3.5-turbo and running it on my 32GB Mac; will this configuration produce slow results? It's giving results in 60 seconds on simple documents. Can we implement other techniques to optimize it?
@nattyzaddy6555 1 year ago
Is the larger embeddings model much better than the smaller one?
@ihebakermi943 4 months ago
Thanks
@senadbalkan 1 year ago
There is a sentence at the beginning of the video: "No data leaves your device and 100% private." Can someone explain this sentence or share some content to understand it better? How can we technically be sure no data is exposed to Llama's creators when we are using their LLM?
@engineerprompt 1 year ago
When you run this for the first time, it will download all the models that are needed. After this you can disconnect from the internet and localGPT will still work :)
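If you want to verify that yourself, one option (assuming the models come from the Hugging Face hub) is to force offline mode so any attempted network call fails loudly instead of silently phoning home:

    # e.g. at the top of run_localGPT.py, before transformers / huggingface_hub load
    import os
    os.environ["HF_HUB_OFFLINE"] = "1"
    os.environ["TRANSFORMERS_OFFLINE"] = "1"
    # the script should now answer purely from the local cache,
    # or error out if something still tries to reach the network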
@mvdiogo 1 year ago
Hi, very nice project, can I help? I've gotten some of the code working better and would like to share.
@engineerprompt 1 year ago
Yes, contributions are welcome and I would greatly appreciate it 🙏
@olegpopov3180 1 year ago
Where is the model weights update (fine-tune)? Or am I missing something in the video... So, you created embeddings and stored them in the DB. How are they connected to the model without a fine-tuning process?
@engineerprompt 1 year ago
I would recommend watching the localGPT video; the link is in the description. That will clarify a lot of things for you. In this case, we are not doing any fine-tuning of the model. Rather, we do a semantic search for the most relevant parts of the document and then give those parts, along with the prompt, to the LLM to generate an answer.
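A stripped-down sketch of that flow (hedged: `db` stands for the Chroma store built by ingest.py and `llm` for the loaded Llama-2 pipeline, both created elsewhere in the repo):

    docs = db.similarity_search(query, k=4)             # embed the question, fetch nearest chunks
    context = "\n\n".join(d.page_content for d in docs)
    prompt = (
        "Use the following context to answer the question.\n\n"
        f"{context}\n\nQuestion: {query}\nAnswer:"
    )
    answer = llm(prompt)                                # no fine-tuning anywhere in this loop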
@brookyu2 1 year ago
On a MacBook Pro with M1 Pro, after inputting the prompt, nothing is returned; the program runs forever. Anything to do with my PyTorch installation?
@gamerwalkers 1 year ago
Worked OK for me, but the result was slow; it took 3-5 minutes.
@nattyzaddy6555 1 year ago
What if there are multiple safetensors files? Answer: Pick one of the model names and set it as MODEL_BASENAME. For example -> MODEL_BASENAME = "wizardLM-7B-GPTQ-4bit.compat.no-act-order.safetensors"
@rahuldayal8406 1 year ago
Great video. I am using an M2 Pro and it answers my query really slowly; how can I make it quicker?
@gamerwalkers 1 year ago
Same problem using M1.
@RealEnigmaEngine 1 year ago
Is there support for AMD GPUs?
@chengqian5737 1 year ago
Hello, thanks for the video. However, what's the point of having the chunk size at 1000 while the sentence embedding model can only take a maximum of 128 tokens at a time? I would suggest reducing the chunk size to 128 if you prefer the sentence transformer.
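For what it's worth, chunk_size in LangChain's splitter counts characters, not tokens, so the practical fix is to pick a value that keeps chunks under the embedder's token limit. An illustrative tweak to the ingest step:

    from langchain.text_splitter import RecursiveCharacterTextSplitter

    splitter = RecursiveCharacterTextSplitter(chunk_size=400, chunk_overlap=50)
    chunks = splitter.split_documents(documents)   # `documents` come from the loaders in ingest.py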
@РыгорБородулин-ц1е 1 year ago
It only took someone this long to create a video about it :)