Llama-2 with LocalGPT: Chat with YOUR Documents

167,498 views

Prompt Engineering

1 day ago

Comments: 232
@engineerprompt 1 year ago
Want to connect? 💼Consulting: calendly.com/engineerprompt/consulting-call 🦾 Discord: discord.com/invite/t4eYQRUcXB ☕ Buy me a Coffee: ko-fi.com/promptengineering |🔴 Join Patreon: Patreon.com/PromptEngineering
@synaestesia-bg3ew 1 year ago
Love you, you're the best YouTuber AI expert. I am learning so much because you don't just talk, you actually make things work.
@engineerprompt 1 year ago
Thanks for the kind words.
@synaestesia-bg3ew 1 year ago
@@engineerprompt No, it's not just kind words, I really admire and recognize when someone has true passion and puts in the effort. Furthermore, I really love your emphasis on localization of AI projects; I think it's the future. No one wants to be a blind slave of a central corporation's AI, censoring and monitoring every word. Even Elon said it: AI should be democratized. The only problem is that we might not achieve it because all the odds and money are against it, but I believe it is worth trying.
@zohaibsiddiqui1420 1 year ago
It would be great if these massive beast models could run on low-end machines so that we can increase contributions
@awu878 1 year ago
Really like your video!! Looking forward to the video about the localGPT API 😃
@mauryaanurag6622 9 months ago
On which system did you do this?
@RabyRoah 1 year ago
Awesome video! Would next love to see a video on how to install LLaMA2 + LocalGPT with GUI in the cloud (Azure). Thank you.
@REALTIES 1 year ago
Waiting for this as well. Hope we get this soon. 🙂
@Socrataclysm 1 year ago
This looks wonderful. Going to get this setup tonight. Anyone familiar with differences between privateGPT and LocalGPT? Seems like they give you mostly the same functionality.
@engineerprompt 1 year ago
The main difference is the ability to use GPUs. The first version of privateGPT was CPU-only. And this one will download the models for you!
@caiyu538 1 year ago
Thank you for providing such a great AI tool.
@vkarasik 1 year ago
Many thanks - I was able to install localGPT and ingest my docs. Two questions (vanilla install by cloning your repo as is): 1) on a 16-CPU/64GB RAM x86 instance it takes 1-1.5 minutes to get an answer, 2) for "what is the term limit of the us president?" I'm getting "The President of the United States has a term limit of four years as specified in Article II, Section 1 of the US Constitution." as the answer :-(
@b326yr 1 year ago
Well done, but nah, I'm just gonna wait for things to become simpler and have a better interface. Waiting for Microsoft Copilot.
@Gingeey23 1 year ago
Great video and impressive project. I'm having errors when ingesting .txt files, but PDFs seem to work fine! Has anyone run a Wireshark capture or similar to monitor the packets being transmitted externally, to ensure that this is 100% local and no data is being leaked? Would be great to get that assurance! Again, great work and thanks.
@Psychopatz 10 months ago
I mean, you could just turn off your internet connection to verify that it's purely local.
@zkiyyeller3525 1 year ago
I'm grateful for these videos. However, I'm always running into issues due to environment setup or a requirement deficiency. Any way someone can create this in an environment we can use? Again, very grateful, but I'm now on my 7th hour of debugging and still don't have the environment ready.
@aketo8082 1 year ago
Thank you for this video. One thing keeps me busy: how does the "intelligence" work in GPT, LLMs...? Because I can't see that this model "understands" relationships, or can identify locations and the difference between three people with the same first name. I know an LLM is a database of words. I use GPT4All and tested all available LLMs, but always the same problem. Corrections via chat are not possible either, and so on. So I guess I don't understand that "AI" right or am missing some basic information. Also, how do you train that? Same problems with ChatGPT, Bing, Bard, etc. Thank you for any hints, links and suggestions.
@Bamseficationify 1 year ago
I'm running an Intel i7 CPU. The best I could get was about 570,000 ms, which is almost 10 minutes for a reply. How do I get this number down without getting a GPU?
@asepmulyana9085 1 year ago
Have you created a localGPT API? For example, to connect to a WhatsApp bot. I still have no idea how to do that. Thanks.
@intuitivej9327 1 year ago
Hi, thankful for your sharing, I am experimenting with it. By the way, how can I train the model properly? Would it remember the doc and the conversations we had after restarting the computer? I want to fine-tune it so it can learn the context.
@mauryaanurag6622 9 months ago
On which system did you do this?
@intuitivej9327 9 months ago
@mauryaanurag6622 System..? Windows 11..? I am a beginner, so I am not sure if this is the right answer..
@martindouglas3901 7 months ago
Is this still a viable solution or has it been overcome by events? Don't worry, I am not judging; I too have been overcome by events (OBE).
@engineerprompt 7 months ago
I think it's still useful (my biased opinion :) ). I'm still building it as a framework to experiment with new RAG techniques.
@lshadowSFX 1 year ago
What if I already have all the files of the model downloaded somewhere? How do I make it use those files?
@IbrahimAkar 1 year ago
Does it download/load the model every time you run the app? If so, I think there should be a check for whether the model is already downloaded.
@engineerprompt 1 year ago
It downloads the model only once
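A quick illustration of why that works (a sketch, assuming the model file comes from the Hugging Face hub): hf_hub_download keeps a local cache and simply returns the cached path on later runs instead of downloading again.

    # Minimal sketch: huggingface_hub caches under ~/.cache/huggingface/hub,
    # so a second call returns the already-downloaded file.
    from huggingface_hub import hf_hub_download

    model_path = hf_hub_download(
        repo_id="TheBloke/Llama-2-7B-Chat-GGML",        # example repo
        filename="llama-2-7b-chat.ggmlv3.q4_0.bin",     # example quantized file
    )
    print(model_path)  # same local path on every subsequent run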
@MayurLaxmanrao 1 year ago
Can I run it on a local machine with 8GB of RAM?
@engineerprompt 1 year ago
Yes, you will need to use a quantized model. You will also need to replace the embedding model.
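As a rough sketch of what that change looks like (the variable names follow the repo's constants.py at the time of the video and may differ in your copy; the specific models are only examples):

    # constants.py -- point localGPT at a 4-bit GGML model and a lighter embedder
    MODEL_ID = "TheBloke/Llama-2-7B-Chat-GGML"
    MODEL_BASENAME = "llama-2-7b-chat.ggmlv3.q4_0.bin"   # ~4 GB file, should fit in 8 GB RAM
    EMBEDDING_MODEL_NAME = "all-MiniLM-L6-v2"            # much smaller than instructor-large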
@MayurLaxmanrao 1 year ago
@@engineerprompt Thank you so much for your kind reply. Your content is amazing. I don't have a GPU in my machine. Can I use a quantized model of Vicuna-7B?
@manu053299 1 year ago
Is there a way for localGPT to read a PDF file and extract the data into a specified JSON structure?
@engineerprompt 1 year ago
Yes, you will have to write code for that.
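One possible shape for that code (illustrative only, not part of localGPT; `llm` below is a placeholder for whatever callable returns the model's text):

    import json
    from pypdf import PdfReader   # assumes pypdf is installed

    def pdf_to_json(pdf_path, llm):
        text = "\n".join(page.extract_text() or "" for page in PdfReader(pdf_path).pages)
        prompt = (
            "Extract the invoice number, date and total from the document below.\n"
            'Reply with JSON only, e.g. {"invoice_number": "", "date": "", "total": ""}.\n\n'
            + text[:4000]   # keep the input well inside the context window
        )
        return json.loads(llm(prompt))   # raises if the model strays from pure JSON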
@ernestspicer7728 1 year ago
What is the best way to generate a REST API for localGPT?
@engineerprompt 1 year ago
Check the latest video on the channel
@ernestspicer7728 1 year ago
How much would you charge for a few hours of work and a walkthrough on a specific project using your code as a base? @@engineerprompt
@mauryaanurag6622 9 months ago
Does it work offline, without internet?
@engineerprompt 9 months ago
Yes
@fjpatil9115 1 year ago
Can anyone let me know: I am running localGPT on a local machine with a CPU, and it's taking a lot of time to return a response. Any tip to improve the response time would really help.
@engineerprompt 1 year ago
Unfortunately, you will need a relatively good GPU to run it at the moment. Hopefully this will change in the near future.
@ManojChoudhury99 1 year ago
What are the system requirements for this? Can we run it in Google Colab?
@engineerprompt 1 year ago
You can run this on Colab. The specs really depend on the LLM you select, but for GGML models a GPU with 11GB VRAM will be more than enough.
@ManojChoudhury99 1 year ago
@@engineerprompt For a CPU with 16GB RAM, is it possible to run Llama-2 7B?
@MyASIF12345 9 months ago
localGPT does not answer when a query is asked.
@littlesomethingforyou 6 months ago
Same issue. I type in a query and VS Code just remains stuck. Did you solve this?
@HedgeHawking 1 year ago
Any advice on which model / setup to use for simple, lightweight tasks on an M1 chip with 8 GB RAM? I tried llama-2-7b-chat.ggmlv3.q4_0.bin with mps, but it's been working on the presidential term limit question for the last 15 min without any answer.
@HedgeHawking 1 year ago
Just FYI if anybody is interested: a MacBook Air with M1 and 8 GB took 28.95 minutes to process the question "what is the presidential term limit?"
@engineerprompt 1 year ago
Later today, I will change the default embedding model to Hugging Face embeddings. That will reduce the amount of VRAM required.
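For reference, the swap looks roughly like this in the LangChain of that era (import paths have since moved to langchain_community / langchain_huggingface, so treat this as a sketch):

    from langchain.embeddings import HuggingFaceEmbeddings

    embeddings = HuggingFaceEmbeddings(
        model_name="sentence-transformers/all-MiniLM-L6-v2",  # small, CPU-friendly
        model_kwargs={"device": "cpu"},                       # or "cuda" / "mps"
    )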
@SamirDamle 1 year ago
Is there a way we can have a Docker image of this that I can run as a container with everything configured by default?
@StarfilmerOne 5 months ago
I thought the same; at least config presets for models or something. I don't understand.
@teleprint-me 1 year ago
Pro Tip: Don't set max_tokens to the maximum value. It won't work out the way you hope. It's better to use a fractional value that represents a percentage of the maximum sequence length.
@Ericzon 1 year ago
Please, could you elaborate more on this answer?
@bobo32756 1 year ago
@@Ericzon Yes, this would be interesting!
@antdok9573 1 year ago
Has NO idea why
@teleprint-me 1 year ago
@Ericzon The sequence length represents the context window for a model. The max token parameter dictates the maximum sequence length the model can generate as output. When you input a text sequence, it becomes a part of the context window. The model will generate a continuation sequence based on the input. What do you think will happen as a result if the model is allowed to generate an output sequence that is as long as its context window while incorporating your input? For those that don't know, it generates an error in the best-case scenarios. The worst-case scenario is a bug of your own making that leaves you with absolute frustration.
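In code, the tip amounts to something like this (a sketch; the exact parameter name depends on the backend you use):

    context_window = 4096                         # Llama-2 sequence length
    max_tokens = int(0.25 * context_window)       # cap generation at ~1024 new tokens
    prompt_budget = context_window - max_tokens   # room left for the input/context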
@teleprint-me 1 year ago
@antdok9573 I was in the ER and then working on PyGPTPrompt while recovering at home. So 🤷🏽‍♂️. Also, if you think the Reactance Response is a clever mental hack, I would like to disagree with you. I find statements like this as rude as they are pretentious.
@richardjfinn13 1 year ago
Worth pointing out - you got it when you asked to show sources, but the presidential term limit is NOT in Article 2. It's in the 22nd Amendment, ratified in 1951 after FDR was elected to third and fourth terms.
@Shogun-C 1 year ago
What would the optimum hardware set up be for all of this?
@Yakibackk 1 year ago
With Petals you can run it on almost any hardware, even mobile.
@HedgeHawking 1 year ago
@@Yakibackk Tell me more about Petals please.
@FaridShahidinejad 1 year ago
I'm running Llama-2 on LM Studio, which doesn't require all these convoluted steps, on my Ryzen 9, 48GB of RAM, and a 1080 card, and it runs like molasses.
@MikeEbrahimi 1 year ago
A video with API access would be great
@kamilnwa8020 1 year ago
Awesome video. QQ: how to add multiple "cuda" devices? The original code specifies `device="cuda:0"`. How do I modify this line to use 2 or more GPUs?
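One way to do this for full-precision Transformers models is to let Hugging Face shard the layers instead of pinning everything to cuda:0 (a sketch that assumes the `accelerate` package is installed; whether localGPT exposes this depends on which loader path you are using):

    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "meta-llama/Llama-2-7b-chat-hf"    # example model
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        device_map="auto",   # spreads layers across all visible GPUs
    )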
@DhruvJoshiDJ 1 year ago
A one-click installer for the project would be great.
@Swanidhi 11 months ago
Great content! I look forward to your videos. Please also create a video to guide people who are new to the world of deep learning, so that they know what to learn, from where, and what they need to start contributing to projects such as localGPT. Also, another video on how to determine which quantized model can be run efficiently on a local system by specifying the parameters for this assessment. Thanks again!
@giovanith 1 year ago
Hello, does this run on Windows 10? I've been trying a lot here but no success (W10, 40 GB RAM, RTX 4070 12 GB VRAM). Thanks.
@nikow1060 1 year ago
BTW, if someone has issues setting up CUDA and a conflict with bitsandbytes: after checking CUDA in Visual Studio and checking that the CUDA version matches the torch requirements, you can try the following for a Windows installation: pip install bitsandbytes-windows. Someone has provided a corrected bitsandbytes build for Windows... it worked for me after 24 hours of bitsandbytes errors complaining about the CUDA installation.
@ypsehlig 1 year ago
Same question as Shogun-C, looking for hardware spec recommendations.
@Socrataclysm 1 year ago
This looks wonderful. Going to get this setup tonight. Anyone familiar with differences between privateGPT and LocalGPT? Seems like they give you mostly the same functionality.
@hish.b 1 year ago
I don't think privateGPT has GPU support, tbh.
@MrBorkori 1 year ago
Hi, thank you for your content, it's very useful! Now I'm trying to run with TheBloke/Llama-2-70B-Chat-GGML, but it gives me an error - error loading model: llama.cpp: tensor 'layers.0.attention.wk.weight' has wrong shape; expected 8192 x 8192, got 8192 x 1024 llama_load_model_from_file: failed to load model. Any help will be appreciated.
@ssvfx. 1 year ago
after 10 hours straight... WE GOT ENTER A QUERY: LETSGOOOOOOOO
@zeeshanafzali8671 1 year ago
Bruh useless
@aldisgailis9901 8 months ago
@engineerprompt I would join in the development, more on the front-end side, but at the moment it seems this repo will mainly orient itself to being a doc-reading tool, so idk. However, if this GPT could just be ingesting files for knowledge, have access to the internet to find things out, and summarize, then sure, I'd be up for it.
@jennilthiyam1261 10 months ago
Hi. I have tried to chat with two CSV files. The thing is, the model is not performing well. It is not even giving the correct answer when I ask about a particular value in a row given the keywords. It is not good at all. I am using 70B. Does anyone have any idea how to make it more reliable? It does not even seem able to understand the data presented in CSV files.
@varunms850 9 months ago
Can you give your system settings: RAM, VRAM, GPU and processor?
@blackout1819 1 year ago
The answer takes 2-3 minutes, and the answer itself looks extremely crazy. I do not advise it.
@littlesomethingforyou 6 months ago
Hi, I am unable to get back a response after I "enter query". VS Code just gets stuck. Is it because I'm running it on my CPU?
@adivramasalvayer3537 6 months ago
Is it possible to use Elasticsearch as a vector database? If yes, may I get the tutorial link? Thank you.
@simonmoyajimenez2045 1 year ago
I found this video really compelling. I believe it would be incredibly fascinating to leverage a CSV connection to answer data-specific questions. It reminds me of an article I read titled 'Talk To Your CSV: How To Visualize Your Data With Langchain And Streamlit'.
@brianhauk8136 1 year ago
I look forward to seeing your different prompt templates based on the chosen model. 🙂
@hassentangier3891 1 year ago
I downloaded the code, but it has already changed. How do I integrate the Llama model I downloaded, please?
@electricsheep2305 1 year ago
This is the most important YouTube video I have ever watched. Thank you and all the contributors. Looking forward to the prompt template video!
@Elshaibi 1 year ago
Great video as usual. It would be great to have a one-click install file for those who are not experts.
@engineerprompt 1 year ago
Let me see what I can put together
@anushaaladakatti4145 8 months ago
Can we extract and show images from this, since it responds with text content? How can we do it?
@boscocorrea1895 1 year ago
Waiting for the api video.. :)
@SadeghShahmohammadi 1 year ago
Nice work. Very well done. When are you going to explain the API version?
@intuitivej9327 1 year ago
Hi, thankful for you, I am experimenting with Llama-2 13B GGML on my notebook. But a few days ago, I realized that the model was not saved locally but was, like... I don't know... a snapshot...? I want to save the model and load it from my local path. I tried some code but failed.. Could you please guide me? Thank you again for your sharing ❤
@ToMtheVth 1 year ago
Works quite well! Unfortunately, performance on my M1 MacBook Pro is a bit of an issue. I ingested 30 documents, and prompt eval time is 20 minutes... I need better hardware XD
@bourbe 1 year ago
Hello my dear, thank you for your videos, they are really great. I have two simple use cases that I want to implement with this Llama-2 model on CPU. I think the example in this video can solve them, but there is some information I am missing to finalize it. For these use cases, I don't need a Streamlit interface, just the possibility to inject a prompt and save the result. I already have the equivalent of these two use cases with the ChatGPT API + Python. Also, I don't want to be in an environment. USE CASE 1: From a .txt file, 1. I want to load the .txt into the model 2. I want to inject the prompt: "convert this into a markdown blog article:" 3. I want to save the result as a Markdown (.md) file. Example of query for use case 1: drive.google.com/uc?export=view&id=19QYO_cy6CU8UKwdau8ClW_XMsQgdpOzt Example of result for use case 1 using Hugging Chat: drive.google.com/uc?export=view&id=1sGTbLihHerIBTfckaZ-7NPJo3cOhYlfD USE CASE 2: From a .txt file, 1. I want to load the .txt into the model 2. I want to inject a first prompt: "Create an outline from this text for a blog article" 3. I want to inject a second prompt which uses the result of the first prompt: "Expand the outline in a detailed way to create an article of 1000 words" 4. I want to save the result of the second prompt as a Markdown (.md) file. Example of query for use case 2: drive.google.com/uc?export=view&id=1R5eIjYJq3M_d0nZydaK0boW38W7iFvGF Example of result for use case 2 using Hugging Chat: drive.google.com/uc?export=view&id=1vWs0ll7lxT4oSULvt67D6BigTBIFzEU7 Thanks in advance for your answer.
@ignacio3714 1 year ago
Hey friend, it seems that you know a lot!! Could you give me a hand with this? I want to know the best way of creating your own "chatgpt" by feeding it a specific set of files, let's say a small book, and then being able to ask questions about it, but running it locally or in the cloud (not using any third party, basically). What's the best way of doing it? Is it even possible? Like running something like that in Google Colab with files from Google Drive or something like that? Thanks in advance, man!
@caiyu538 1 year ago
I have asked this question on your other video: "This localGPT works great for my file. I use a T4 GPU with 16GB of CUDA memory, and it takes 2-4 minutes to answer my questions for a file with 4-5 pages. Is it expected to take so long to answer the question using a T4 GPU?" After watching this video, I think I can use a quantized model instead of the Llama-2 / Vicuna 7B model.
@RameshBaburbabu 1 year ago
Thanks man!! It was very useful. Here is one use case, for evolving user data embeddings: in an RDBMS we update a row, and we can add more data in other tables with associated foreign keys. Can we embed, say, patient data day after day and ask questions about one year of a particular patient..? Bottom line, I am looking to `incrementally embed`....
@mmdls602 1 year ago
Use LlamaIndex in that case. It has a bunch of utilities that you can use to "refresh" the embeddings when you add or delete data. Super useful.
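A rough sketch of that incremental refresh with llama-index (the API moves between releases, so check the current docs; the folder name here is just an example):

    from llama_index.core import SimpleDirectoryReader, VectorStoreIndex

    docs = SimpleDirectoryReader("SOURCE_DOCUMENTS", filename_as_id=True).load_data()
    index = VectorStoreIndex.from_documents(docs)

    # later, after files were added or edited:
    new_docs = SimpleDirectoryReader("SOURCE_DOCUMENTS", filename_as_id=True).load_data()
    index.refresh_ref_docs(new_docs)   # re-embeds only new or changed documents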
@BrijeshSingh-f2c 1 year ago
How can I run it with ROCm and AMD GPUs? I'm a noob here and want to explore this project.
@fenix20075 1 year ago
Using the original constitution.pdf is not convincing; let's use Harry Potter or any other novel or history book for the testing, that would be more convincing. And I have tested Llama-2, and I don't think it is better than Wizard Uncensored 30B; maybe that is because the model scale is not on the same level?
@KJB-Man 1 year ago
It's a great app! Thank you! Suggestion: can you redo the requirements.txt to install torch with CUDA support? It is irritating to have to uninstall torch, reinstall it with conda and the CUDA flag set, and then troubleshoot the issues that causes.
@abeechr 1 year ago
Seems like setup and installation went well, but when I enter a query, nothing happens. No error, no nothing. Many attempts… Any ideas?
@Diana-zo2ut 4 months ago
Thank you, I am having an issue: "D:\Datos\Documents\localgpt_llama2\.venv\lib\site-packages\sentence_transformers\SentenceTransformer.py", line 286, in __init__ modules = self._load_sbert_model( TypeError: _load_sbert_model() got an unexpected keyword argument 'cache_folder'" and I can't solve it.
@nadavel85 1 year ago
Thanks a lot for this! Did anyone encounter this error and resolve it? "certificate verify failed: unable to get local issuer certificate"
@md.ashrafulislamfahim3106 1 year ago
Can you please tell me how I can use this project to implement a chatbot with Django?
@brianrowe1152 1 year ago
Mine says bitsandbytes is deprecated, but then when I try pip install it says the requirement is already met, yet when I run again it says deprecated.. please install.
@touristtam 1 year ago
How is it that this type of LLM project usually has no tests whatsoever and leans heavily on Conda?
@kashishvarshney2225 1 year ago
I run it on CUDA but it's still giving answers in 3 minutes. How can I improve its speed? Please, someone reply.
@gold-junge91 1 year ago
It would be great to add it to my paperless-ngx.
@extempore66 9 months ago
Hello everyone, I'm using a local (cached) Llama-2 model with LanceDB as an index store, ingesting many PDF files and using a fairly standard prompt template ("given the context below {context_str}. Based on the context and not prior knowledge, answer the query. {query_str} ..."). It is very frustrating getting different answers to the same question. Any similar experiences? Thanks.
@engineerprompt 9 months ago
Have you looked at the source documents returned in each case? That will be a good starting point. Also, potentially reduce the temperature of the LLM.
@extempore66 9 months ago
@@engineerprompt Thank you for the prompt response. I have looked at the documents. I also ran a retriever evaluator (faithfulness, relevancy, correctness...). Sometimes it passes, sometimes it does not. Still trying to wrap my brain around why the answers are so different while the nodes (vectors) returned are consistently the same.
@bhavikpatel7612 1 year ago
Getting an error: 'NoneType' object is not subscriptable. Can you please help?
@TheCopernicus1 1 year ago
Amazing project mate, any recommendations on running the UI version? I have followed the documentation and changed to the GGML versions of Llama-2, however it keeps erroring out. Could you perhaps recommend any additional instructions? I am running on M1, many thanks!
@prestonmccauley43 1 year ago
I had the same issue on PAC; llama errored out....
@IbrahimAkar 1 year ago
Use the GGUF models. I had the same issue.
@mibaatwork 1 year ago
You talk about CPU and Nvidia; what about AMD? Can you add it as well?
@tk-tt5bw 1 year ago
Nice video. But can we get some videos for an M1 silicon MacBook?
@CharanjitSinghPhull 1 year ago
Great video, helped a lot, but there is an issue: when passing --show_sources and asking Llama-2 a question outside the data that was ingested, it still provides an answer and cites an incorrect source document that has nothing to do with the actual question. Also, why return all the data that was used? Can't we directly get the line number and page number along with the document name only?
@lucas_badico 7 months ago
For a MacBook M1 Pro, what model do you recommend?
@engineerprompt 7 months ago
Will depend on how much memory your machine has. A base version will run 7/13B quantized models.
@lucas_badico 7 months ago
@@engineerprompt Mine has 16GB.
@1989arrvind 1 year ago
Great👍
@fabulatetra8650 1 year ago
Is this using fine-tuning on the Llama-2 model? Thanks.
@VoltVandal 1 year ago
Thank you! Working really great, even on an M1. Just one question: I compiled/ran llama_cpp on my M1 and it is using the GPU; can this somehow work for your project as well? [I'm just a beginner, THX]
@engineerprompt 1 year ago
Yes, under the hood, it's running the models on llama_cpp.
@VoltVandal 1 year ago
Ha, got it. Maybe no problem for pros, but I just had to run: CMAKE_ARGS="-DLLAMA_METAL=on" FORCE_CMAKE=1 pip install --upgrade --force-reinstall llama-cpp-python --no-cache-dir. And now it is using the GPU!
@gamerwalkers 1 year ago
@@VoltVandal Hi Martin. Could you help me with the steps for how you made it work? I am using an M1 but the model is running very slowly. How fast does it give you a response? For me it's ranging between 3-5 minutes.
@VoltVandal 1 year ago
@@gamerwalkers Well, this is due to the endless swapping (16GB MBP with a 7B model). I got it running directly with the main command from llama-cpp. But if I start embedding etc., I end up with the same 5-minute swap. I'm not a Python pro or an AI pro, so I can't tell you why this happens. I returned to my old Nvidia machine, unfortunately. So at least it is working and not giving GPU errors, but still no fun. 😞 Maybe some AI pro can tell us more about that issue.
@gamerwalkers 1 year ago
@@VoltVandal When you say directly from the main command of llama-cpp, are you able to run that command against your own PDF like the video has done? Or are you just referring to default llama prompts that answer queries but aren't grounded in your own PDF data?
@alaamohammad6422 1 year ago
Please, I get an error that the Llama-2 repo does not exist!!!! 😢😢😢😢
@trobinsun9851 1 year ago
Does it need a powerful machine? A lot of RAM?
@jerkoviskov5379 1 year ago
Do I need the GPT-4 API to run this on my machine?
@bowenchen4908 1 year ago
Is it very slow if we run it locally? Thank you in advance.
@Weltraumaff3 1 year ago
Maybe a stupid question and I'm just missing what --device_type I should enter, but: I'm struggling a bit using my AMD 6900 XT. I don't want to use my CPU, and PyTorch doesn't seem to work with OpenCL, for example. Has anyone got an idea? Cheers in advance.
@engineerprompt 1 year ago
in this case, cpu :)
@ihydrocarbon 1 year ago
Seems to work on my Fedora 38 ThinkPad i5, but I am confused as to how to train it on other data. I removed the constitution.pdf file and it still responds to queries with answers that refer to that document...
@engineerprompt 1 year ago
There is a DB folder; delete that and rerun ingest.py.
@RahulKant-qj7lv 11 months ago
How can we improve the response time? If we ask a question, the answer should come back quickly. Where can I learn about this part?
@engineerprompt 11 months ago
You will need to use a better GPU and a quantized model.
@RahulKant-qj7lv 11 months ago
@@engineerprompt I am using gpt-3.5-turbo and running it on my 32GB Mac; will this configuration produce slow results? It's giving results in 60 seconds on simple documents. Can we implement other techniques to optimize it?
@nattyzaddy6555 1 year ago
Is the larger embeddings model much better than the smaller one?
@ihebakermi943 4 months ago
Thanks
@senadbalkan 1 year ago
There is a sentence at the beginning of the video: "No data leaves your device and 100% private." Can someone explain this sentence or share some content to understand it better? How can we technically be sure no data is exposed to Llama's creators when we are using their LLM?
@engineerprompt 1 year ago
When you run this for the first time, it will download all the models that are needed. After this you can disconnect from the internet and localGPT will still work :)
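If you want to verify that yourself, one option (assuming the models come from the Hugging Face hub) is to force offline mode so any attempted network call fails loudly instead of silently phoning home:

    # e.g. at the top of run_localGPT.py, before transformers / huggingface_hub load
    import os
    os.environ["HF_HUB_OFFLINE"] = "1"
    os.environ["TRANSFORMERS_OFFLINE"] = "1"
    # the script should now answer purely from the local cache,
    # or error out if something still tries to reach the network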
@mvdiogo 1 year ago
Hi, very nice project, can I help? I've gotten some of the code working better and would like to share.
@engineerprompt 1 year ago
Yes, contributions are welcome and I would greatly appreciate it 🙏
@olegpopov3180 1 year ago
Where is the model weights update (fine-tune)? Or am I missing something in the video... So, you created embeddings and stored them in the DB. How are they connected to the model without a fine-tuning process?
@engineerprompt 1 year ago
I would recommend watching the localGPT video; the link is in the description. That will clarify a lot of things for you. In this case, we are not doing any fine-tuning of the model. Rather, we do a semantic search for the most relevant parts of the document and then give those parts, along with the prompt, to the LLM to generate an answer.
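A stripped-down sketch of that flow (hedged: `db` stands for the Chroma store built by ingest.py and `llm` for the loaded Llama-2 pipeline, both created elsewhere in the repo):

    docs = db.similarity_search(query, k=4)             # embed the question, fetch nearest chunks
    context = "\n\n".join(d.page_content for d in docs)
    prompt = (
        "Use the following context to answer the question.\n\n"
        f"{context}\n\nQuestion: {query}\nAnswer:"
    )
    answer = llm(prompt)                                # no fine-tuning anywhere in this loop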
@brookyu2 1 year ago
On a MacBook Pro with M1 Pro, after inputting the prompt, nothing is returned; the program runs forever. Anything to do with my PyTorch installation?
@gamerwalkers 1 year ago
Worked OK for me, but the result was slow; it took 3-5 minutes.
@nattyzaddy6555 1 year ago
What if there are multiple safetensors files? Answer: Pick one of the model names and set it as MODEL_BASENAME. For example -> MODEL_BASENAME = "wizardLM-7B-GPTQ-4bit.compat.no-act-order.safetensors"
@rahuldayal8406 1 year ago
Great video. I am using an M2 Pro and it answers my query really slowly; how can I make it quicker?
@gamerwalkers 1 year ago
Same problem using M1.
@RealEnigmaEngine 1 year ago
Is there support for AMD GPUs?
@chengqian5737 1 year ago
Hello, thanks for the video. However, what's the point of having the chunk size at 1000 while the sentence embedding model can only take a maximum of 128 tokens at a time? I would suggest reducing the chunk size to 128 if you prefer the sentence transformer.
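For what it's worth, chunk_size in LangChain's splitter counts characters, not tokens, so the practical fix is to pick a value that keeps chunks under the embedder's token limit. An illustrative tweak to the ingest step:

    from langchain.text_splitter import RecursiveCharacterTextSplitter

    splitter = RecursiveCharacterTextSplitter(chunk_size=400, chunk_overlap=50)
    chunks = splitter.split_documents(documents)   # `documents` come from the loaders in ingest.py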
@РыгорБородулин-ц1е 1 year ago
It only took someone this long to create a video about it :)