I used LLaMA 2 70B to rebuild GPT Banker...and it's AMAZING (LLM RAG)

151,725 views

Nicholas Renotte
1 day ago

👨‍💻 Sign up for the Full Stack course and use KZbin50 to get 50% off:
www.coursesfro...
🐍 Get the free Python course
go.coursesfrom...
Hopefully you enjoyed this video.
💼 Find AWESOME ML Jobs: www.jobsfromni...
🤖 Get the Code: github.com/nic...
Learn how to use Llama 2 70B Chat for Retrieval Augmented Generation...for FINANCE! Albeit in a hella haphazard way. Oh, and we'll also build a Streamlit app while we're at it.
Oh, and don't forget to connect with me!
LinkedIn: bit.ly/324Epgo
Facebook: bit.ly/3mB1sZD
GitHub: bit.ly/3mDJllD
Patreon: bit.ly/2OCn3UW
Join the Discussion on Discord: bit.ly/3dQiZsV
Happy coding!
Nick

Comments: 207
@yudhiesh1997 · 1 year ago
You can't load Llama2-70b on a single A100 GPU. Using full precision (float32) would require 70 billion × 4 bytes = 280 GB of GPU memory. Loading it in float16 halves that to 140 GB. It finally worked because you loaded it in int8, which only requires 70 GB, while the A100 has 80 GB of GPU memory. If you wanted to load it in full or half precision you would need multiple GPUs, and you would also need to leverage tensor parallelism, whereby you slice the tensors across multiple GPUs.
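The back-of-the-envelope math in this comment generalizes to any model size. A tiny sketch (weights only; it ignores activation memory and KV-cache overhead, which push the real requirement higher):

```python
def weight_memory_gb(params_billion: float, bytes_per_param: float) -> float:
    """GPU memory (GB) needed just to hold the weights.

    params * bytes-per-param, with the factor of 1e9 in "billions"
    cancelling the 1e9 in "gigabytes".
    """
    return params_billion * bytes_per_param

# Llama 2 70B at different precisions:
print(weight_memory_gb(70, 4))  # float32 -> 280.0 GB (multiple GPUs required)
print(weight_memory_gb(70, 2))  # float16 -> 140.0 GB (still more than one A100)
print(weight_memory_gb(70, 1))  # int8    -> 70.0 GB  (fits a single 80 GB A100)
```

The same arithmetic explains why a 7B checkpoint in float16 (~14 GB) fits on consumer cards while the 70B one does not.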
@NicholasRenotte · 1 year ago
I'm not sure that was it, I successfully loaded in half precision over 2xA100-80GB (didn't show the loading in the vid). But when I went to generate this is what I came up against: github.com/huggingface/transformers/issues/24056. Solid calcs though!
@sluggy6074 · 1 year ago
That's nice. I'll just have to settle for my quantized 70b LLMs that run hot and fast on my 4090. I think I can live with this.
@agusavior_channel · 1 year ago
Use Petals.
@seanhuver4813 · 1 year ago
It runs nicely at 4-bit precision on an A6000.
@bubbleboy821 · 11 months ago
What you meant to say was that you can load LLama2-70b on a single A100 GPU, you just have to run it in int-8.
@Principality92 · 1 year ago
Always looking forward to your videos... I've an MSc in AI, but I still learn from you 👏🏼
@MikeAirforce111 · 1 year ago
I have a PhD and I am here as well 🤷‍♂
@siestoelemento4027 · 1 year ago
I guess I'm on the right path then @MikeAirforce111
@MikeAirforce111 · 1 year ago
This video was great. You have created a format that is very entertaining to watch! 🙌 Subbed!
@NicholasRenotte · 1 year ago
Thanks so much Mike!
@juanpablopereira1479 · 1 year ago
I think "amazing" falls short: the amount of knowledge, the fact that you're using a cutting-edge open source model, and all of that in a really funny and light tone. Keep up the good work! I have a question: do you think it's much harder to deploy that app to Google Cloud Run compared with RunPod?
@NicholasRenotte · 1 year ago
Thanks so much Juan! I can't imagine it would be, running on a VM instance with GPUs attached. Could also separate out the LLM bit and run that solely on a GPU, then just run the app on a basic Linux instance!
@moondevonyt · 1 year ago
First off, respect for the hustle; the in-depth breakdown of integrating Llama with other tools really shows how much work goes on behind the scenes. That said, not sure why everyone's so hyped about all these new models when sometimes simpler and older architectures can do the trick. But hey, if it's all about pushing boundaries and experimenting, you're killing it bro!
@NicholasRenotte · 1 year ago
Thanks a mill moondev!! Yah at this point I'm just pushing to see where it's going, I started fine tuning this for some custom use cases and it looks hyper promising though!
@xt3708 · 1 year ago
Thanks so much for the detailed videos @NicholasRenotte. Can you make a video on fine-tuning?
@vyrsh0 · 1 year ago
Can you name some of the older models, so I can look them up and learn about them?
@ZombiemanOhhellnaw · 1 year ago
@vyrsh0 @moondevonyt Yes, I would like to learn which older models do the trick as well!
@i2c_jason · 1 year ago
...Sunday morning after a bender hahhaaha bro I love you.
@NicholasRenotte · 1 year ago
Best time to deploy imho 😅
@ba70816 · 1 year ago
Really great content, you might have the most effective style I’ve ever seen. Well done. I can’t remember which video I saw where you spoke about your hardware setup. It’s cloud based isn’t it?
@NicholasRenotte · 1 year ago
Thanks a mil! This particular instance is cloud based, yup! It's all runpod, I used a remote SSH client to use the env with VsCode. Old HW vid might have been this: kzbin.info/www/bejne/fXmUg6iBnNqCa5Y
@ba70816 · 1 year ago
Would you consider a video showing the setup process you use?
@horane · 1 year ago
The comment at minute 4:45 is clutch confirmation! Never give up!
@PritishMishra · 1 year ago
Amazing editing and content, learnt a lot.
@NicholasRenotte · 1 year ago
🙏🏽
@FrankHouston-v5e · 8 months ago
You are marvelous! I bow down after witnessing your next level hacking skills 🧐.
@youssefghaouipearls · 2 months ago
Hello, this seems like a less expensive approach than using Google Cloud. How much did it cost?
@himanshuahujaofficial7813 · 6 months ago
Nick, thank you so much for the great content. I’m new to AI and want to build an LLM for my startup, but I’m not sure where to start. Can you recommend something?
@thepirate_kinz1509 · 1 year ago
Can we have a tutorial on conditional GANs please? And multi-feature conditional GANs as well 😊
@billlynch3400 · 1 month ago
Where can we find the code that you use in the video? Can you please share it?
@jennilthiyam1261 · 10 months ago
How can we set up Llama 2 on a local system with memory? Not just one-off questions, but an interactive conversation like online ChatGPT.
@emrahe468 · 1 year ago
I guess what you do is: divide the long document into chunks, and when you ask a question about the document, search those chunks via query_engine.query() to find the best match between the chunks and the question. A kind of narrowing-down of the document to the question of interest. In short, you're not dealing with the entire document when the question is asked, only the most appropriate slice. Hence, this method may not be good for internalizing the document's general information, nor for creating an overall summary. Am I right?
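That is a fair description of naive RAG retrieval. A toy sketch of the idea, with bag-of-words cosine similarity standing in for the real embedding model and made-up chunk texts for illustration:

```python
import math
import re
from collections import Counter

def embed(text: str) -> Counter:
    # Stand-in for a real sentence-embedding model: a bag-of-words vector.
    return Counter(re.findall(r"\w+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    norm = math.hypot(*a.values()) * math.hypot(*b.values())
    return dot / norm if norm else 0.0

def retrieve(chunks: list[str], question: str, top_k: int = 1) -> list[str]:
    # Only the most similar chunks reach the LLM; the rest of the document
    # is never seen, which is exactly why whole-document summaries suffer.
    q = embed(question)
    return sorted(chunks, key=lambda c: cosine(embed(c), q), reverse=True)[:top_k]

chunks = [
    "Net profit after tax rose to 7.1 billion dollars for the half",
    "The board declared a fully franked final dividend",
    "Risk weighted assets were broadly flat over the period",
]
print(retrieve(chunks, "What was the net profit after tax?"))
```

Swapping `embed` for a real model (and a vector index for the `sorted` call) gives you the pipeline the video builds.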
@umairrana6524 · 1 year ago
Hello, dear. Please upload a video on IBM Cognos Analytics integration with IBM Planning Analytics.
@dakshbhatnagar · 1 year ago
1:44 Zimbaabwe? 😂
@NicholasRenotte · 1 year ago
😂
@towards_agi · 1 year ago
1:42 😅 When did Nic become Zimbabwean
@NicholasRenotte · 1 year ago
Tuesday 🤣
@zimpot1690 · 1 year ago
Can you do machine learning for trading?
@NicholasRenotte · 1 year ago
Yah, it's definitely on my list!
@malice112 · 1 year ago
I thought that Llama 2 was 100% free to use with unlimited tokens, so why did you have to pay $1.69 per hour?
@zerorusher · 1 year ago
The software is free, but you need to pay for the GPU usage on a cloud server to run it.
@NicholasRenotte · 1 year ago
What he said ^ tbh this was me being a tad bit lazy, could probably run it for even cheaper.
@Warung-AI-Channel · 6 months ago
this is gonna be huge 😅
@malice112 · 1 year ago
Nicholas, I love your videos and your way of making learning about ML/AI fun! In your next video can you please show us how to fine-tune an LLM? Thanks for all the hard work you put into making these videos!
@splitpierre · 7 months ago
Yeah, nice work! I've been playing around with RAG as well, and I can relate to all the roadblocks and pain points. I'm trying to squeeze out as much as possible so I can have a decent RAG without any fancy GPU, with consumer-grade hardware running everything locally. It's been fun/painful.
@ShahJahan_NNN · 1 year ago
Please make a video on OCR for past question papers that can extract questions and keywords, analyze ten years of papers, and predict upcoming questions.
@angelazhang9082 · 5 months ago
Hi Nick... really late, but I would be super grateful for a response. I'm trying to figure out how you used RunPod for this. It looks like you created a folder to store the weights instead of using one of their custom LLM options. Did you pay for extra storage? I can't imagine you loaded all the weights each time you needed to use this on the cloud. I'm new to working with these models and cloud GPUs, so any help is greatly appreciated!
@moshekaufman7103 · 6 months ago
Hey Nicholas, It's a little disappointing that you haven't actually released the final model yet, even though you mentioned it in the video. While showing the source code is a good start, it's not the same as actually providing the finished product. Unfortunately, without the final model itself, it's difficult to take your word for it. To build trust and transparency, it would be much better to provide a download link for the model so people can try it out for themselves. This would be a much more impactful way to share your work and allow others to engage with it. I hope you'll reconsider and release the final model soon!
@tejaskumarreddyj3133 · 2 months ago
Can you please make a video explaining which LLM to use when developing a RAG? It would be of great help if you could make one, and please also tell us how to run this locally on Linux! 😁
@jangamYashwanthRaju · 8 months ago
Please make a video on an ML and deep learning setup for the MacBook Air M1; in all of YouTube there is no good video for this setup.
@farseen1573 · 5 months ago
What platform are you using for the $1.69/hr GPU? Can't find any good GPU cloud providers 🥺
@wasgeht2409 · 1 year ago
Nice video! I think it is impossible to use LLaMA 2 70B on a MacPro M1 with 8 GB RAM :( or is there any chance to use it locally without cloud services?
@NicholasRenotte · 1 year ago
Could give it a crack with the GGML models, haven't tried it yet though tbh!!
@mrrfrooty · 3 months ago
Hi, could you provide the RunPod source code for this? I can't find any outside documentation on how you made this possible.
@muradahmad9357 · 5 months ago
Can you please tell us which CUDA version and NVIDIA driver version you used? I am having problems downloading it.
@Nick_With_A_Stick · 1 year ago
My computer is currently training a LoRA on Stable 7B for natural language to Python (30k) and SQL (30k). I also included 30k Orca questions so it doesn't lose its abilities as a language model, plus 20k sentiment analysis examples for news headlines. I would love to try this model as soon as it's done training.
@NicholasRenotte · 1 year ago
Noiceee, what datasets are you using for Python?
@dacoda85 · 1 year ago
Love this style of video. Fantastic content as always mate. You've given me some ideas to try out. Thanks :)
@NicholasRenotte · 1 year ago
🙏🏽 thanks for checking it out!
@knutjagersberg381 · 1 year ago
Do you really have to get access from Meta to use the weights? My current interpretation is that you enter the license agreement as soon as you use the weights, wherever you got them (as you're also allowed to redistribute them). I'm not 100% sure about this, but I think you don't need to register. I think that's more for them to keep track of early adopters.
@americanswan · 8 months ago
Can someone explain to me the cost of running an AI application on my local machine?
@Shishir_Rahman_v · 7 months ago
I have learned machine learning up to an intermediate level; can I now start deep learning alongside machine learning? Please tell me, sir.
@chrisweeks8789 · 1 year ago
All facets of your work are incredible! Are the context limits of Llama 2 similar to those of OpenAI?
@NicholasRenotte · 1 year ago
Thanks a mil! Would depend on which models you're comparing!
@nfic5856 · 1 year ago
How can it be scalable, since this deployment costs about $2 per hour? Thanks.
@NicholasRenotte · 1 year ago
Didn't show it here but if I were scaling this out, the whole thing wouldn't be running on a GPU. The app would be on a lightweight machine and the LLM running on serverless GPU endpoints.
@Ryan-yj4sd · 1 year ago
@NicholasRenotte But you would still need to pay to rent an A100 GPU, which is around $1 to $4 per hour.
@NicholasRenotte · 1 year ago
Yeah, no real way around that, gotta host somewhere! Especially so if you want to be able to use your own fine-tuned model eventually (coming up soon)!
@nfic5856 · 1 year ago
Does gpt-3.5-turbo (4k or 16k context) remain cheaper at a small production scale?
@wayallen831 · 1 year ago
Great tutorial! Can you also help do a tutorial on setting up runpod to host the application on it? Found that part to be a bit confusing and would love a more thorough walk thru. Thanks for all you do!
@NicholasRenotte · 1 year ago
Ya, might do something soon and add it to the free course on Courses From Nick. I'm saving infra style/setup videos for the Tech Fundamentals course.
@deadcrypt · 11 months ago
8:57 nice auth key you got there
@user-pp4ts5ob1u · 1 year ago
Excellent video, you are amazing. Please update the video "AI Face Body and Hand Pose Detection with Python and Mediapipe"; I can't solve the errors, and it would be very useful for my university projects. Thank you very much.
@NicholasRenotte · 1 year ago
Will take a look!
@ahmadshabaz2724 · 1 year ago
How do I get a free GPU on a web server? I don't have a GPU.
@FunCodingwithRahul · 1 year ago
Incredible stuff, thank you Nick.
@NicholasRenotte · 1 year ago
Anytime!! Glad you liked it Rahul!
@nearata01 · 9 months ago
It's not thaaat special, but if it's part of "research", yeah, then it's OK.
@深夜酒吧 · 7 months ago
I would rather run 70B on my 300 GB RAM Xeon CPU.
@scottcurry3767 · 1 year ago
RunPod A100 instances are looking scarce, any tips on how to adapt for multiple GPU instances?
@NicholasRenotte · 1 year ago
Going to give it a crack this week, i've got a fine tuning project coming up. Will let you know. The other option is to use the GGML/4 bit quantized models, reduces the need for such a beefy instance. Also, check out RunPod Secure Cloud, a little pricier but seems to have more availability (I ended up using SC when I was recording results for this vid because the community instances were all unavailable). Not sponsored just in case I'm giving off salesy vibes.
@nimaheydarzadeh343 · 1 year ago
It's great, I've been trying to find something like this.
@fur1ousBlob · 1 year ago
I wanted to use Llama in a chatbot. Do you know if that's possible? I'd like your opinion. I am using the Rasa framework to build the chatbot, but I am not sure how to integrate it.
@NicholasRenotte · 1 year ago
Sure can! Seen this? forum.rasa.com/t/how-to-import-huggingface-models-to-rasa/50238
@JarFah · 9 months ago
Make a video using it with HTML and JavaScript.
@Bliss_99988 · 1 year ago
'How to start a farm with no experience' - Hahaha, man, I just want to say that I love your sense of humour. Also, your videos are really useful for me, I'm an English teacher and I'm trying to build useful tools for my students. Thanks for your content.
@NicholasRenotte · 1 year ago
😂 it's my secret dream job! Hahah thanks so much for checking it out man!!
@frazuppi4897 · 10 months ago
TL;DR basic RAG with Llama 70B, nothing more, nothing less - (thanks a lot for the video, really well done)
@synthclub · 7 months ago
Really cool Llama application. Really impressive.
@projecttitanium-slowishdriver · 1 year ago
Huge thanks for your videos. Nowadays I code, demonstrate, and perhaps lead AI, ML, DL, and RL development in a 1300+ worker engineering and consulting company. I am combining technical analysis tools (FEM, CFD, MBS…) with AI to generate new digital business cases.
@NicholasRenotte · 1 year ago
Ooooh, sounds amazing!
@projecttitanium-slowishdriver · 1 year ago
@NicholasRenotte It's a 13-worker digital business development group :) But thanks again, mate.
@vitalis · 1 year ago
Can you do a video about analysing trends from websites such as WGSN?
@NicholasRenotte · 1 year ago
You got it!
@ullibowyer · 7 months ago
Most people pronounce cache the same way as cash 💲
@ytsks · 7 months ago
When someone tells you they made something "as good as" or "better than" ChatGPT, remember that even Meta doesn't compare Llama 70B to the current GPT-4 Turbo, but to the previous release.
@powray · 1 year ago
Wow
@Ryan-yj4sd · 1 year ago
Nice video. You seem to have the taken the tough route. I didn't have as much trouble :)
@NicholasRenotte · 1 year ago
LOL, murphy's law!
@shipo234 · 1 year ago
Nick this is insanely good, thank you for the effort
@NicholasRenotte · 1 year ago
Thanks a mil!!!
@pantherg4236 · 1 year ago
What is the best way to learn deep learning fundamentals via implementation (say, a trivial problem like building a recommendation system for movies) using PyTorch, as of Aug 26, 2023? Thanks in advance.
@Roy-h2q · 9 months ago
Gosh, when I use GPT-4 it gives me a response saying it cannot further summarize a personal report, and it just stops there. I think I will just need to switch to a different model.
@mohamedkeddache4202 · 1 year ago
Please help me 😓 (in your license plate TensorFlow video) I get this error when I copy the train command into cmd: ValueError: mutable default for field sgd is not allowed: use default_factory
@20S030SHIVAYAVASHILAXMIPRIYA.S · 1 year ago
What are the differences between the Meta-released Llama 2 models, the HF models, and the quantized (GGML) model files found on Hugging Face? Why can't we use the meta/llama-2-70b model?
@NicholasRenotte · 1 year ago
You could! llama-2-70b is the base model; chat is the model fine-tuned for chat. The GGML model is a quantized model (optimized for running on less powerful machines). The hf suffix indicates that it's been updated to run with the transformers library.
@poornipoornisha5616 · 1 year ago
@NicholasRenotte The 70B chat model downloaded from Meta has consolidated.pth files in it. How can I use those files to fine-tune the model on custom datasets?
@leonardoariewibowo7867 · 1 year ago
Do you use Linux? I can't run this on my Windows machine; bitsandbytes doesn't support Windows for CUDA >= 11.0.
@Kingupon · 1 year ago
Do I need to know the math at a deep level to get ahead in machine learning, or just know how things work, like the specific library I'm using? Please answer my question.
@Dave-nz5jf · 1 year ago
Yeah but what about asking who was US president in 2006?
@NicholasRenotte · 1 year ago
😅 let me give it a crack. There's an old saying in Tennessee...
@zamirkhurshid261 · 1 year ago
Nice sharing, sir; your way of teaching is very helpful for beginners. Please make a video on how we can build a deep learning model on an earthquake dataset, as you did for the image classification project.
@NicholasRenotte · 1 year ago
You got it!
@zamirkhurshid261 · 1 year ago
Waiting sir
@en-iyi-benim · 1 year ago
A deep learning in PyTorch video, pleaseee!
@randomthoughts7838 · 1 year ago
Hey, is there some structured way (steps) to learn to work with LLMs? As an analogy, DSA is one structured way to solve coding problems. I am new to the LLM realm and any advice is much appreciated.
@sergeyfedatsenka7201 · 1 year ago
Does anyone know if renting a GPU is cheaper than using the OpenAI API? By how much? Thanks, Nicholas, for your great content!
@achuunnikrishnan-w1h · 6 months ago
I love you Nicholas... you are awesome. My only regret is that I didn't find you earlier. All my dream projects in one channel... thank you!
@krishnakompalli2606 · 11 months ago
Since you used the RAG method, I'd like to know how it can answer extrapolated questions.
@autonomousreviews2521 · 1 year ago
Great share! Thank you for your persistence and giving away your efforts :)
@NicholasRenotte · 1 year ago
Anytime! Gotta share where I can!
@malice112 · 1 year ago
I am confused: is Llama 2 an LLM, or did you use the Hugging Face LLM?
@NicholasRenotte · 1 year ago
LLaMA 2 70B is the LLM; we loaded it here using the Hugging Face library.
@lashlarue59 · 1 year ago
Nick, you said that you were able to build your lip reading model in 96 epochs. How long is an epoch in real time?
@eel789 · 1 year ago
How do I use this with a React frontend?
@NicholasRenotte · 1 year ago
Could wrap the inference side of the app up into an api with FastAPI then just call out to it using axios!
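The reply above names FastAPI; to keep the sketch dependency-free, the same split (inference behind a small JSON API that any frontend can call) can be shown with only the standard library. The `run_inference` stub is hypothetical, standing in for the actual LLM call on the GPU box:

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

def run_inference(prompt: str) -> str:
    # Hypothetical stub: in the real app this would call the LLM.
    return f"echo: {prompt}"

class InferenceHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Read {"prompt": ...} and answer {"response": ...} as JSON.
        length = int(self.headers["Content-Length"])
        prompt = json.loads(self.rfile.read(length))["prompt"]
        body = json.dumps({"response": run_inference(prompt)}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):  # keep the demo output quiet
        pass

# Serve on an OS-assigned port and fire one request at it, the way a
# React frontend (via axios) or a Streamlit app would.
server = HTTPServer(("127.0.0.1", 0), InferenceHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
req = urllib.request.Request(
    f"http://127.0.0.1:{server.server_address[1]}/generate",
    data=json.dumps({"prompt": "hello"}).encode(),
    headers={"Content-Type": "application/json"},
)
print(json.loads(urllib.request.urlopen(req).read())["response"])  # echo: hello
server.shutdown()
```

In FastAPI the handler collapses to a single decorated function, and axios on the React side just POSTs the same JSON to that endpoint.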
@ml-techn · 1 year ago
Hi, thanks for the video. Which GPU are you using? I want to buy and build a DL machine to play with LLMs.
@kevynkrancenblum5350 · 1 year ago
2:40 😂 Thanks Nic the video is awesome ! 🤘🏽🤘🏽🤘🏽
@NicholasRenotte · 1 year ago
LOL, stoked you liked it Kev!!
@BudgetMow · 10 months ago
Thank you for this tutorial, although I am facing a slight issue parsing tables from PDFs. I managed to allow the parser to take in multiple documents, and it answers quickly; the only issue is that if the question relates to data within a table, or sometimes data spanning multiple lines, it fails to retrieve that data.
@yashsrivastava4878 · 1 year ago
Hey, can it be done on Chainlit, with LMQL and Langflow added to it, so the output shows the PDF files as references, plus scores based on whether it retrieves factual data or makes up its own answer?
@eddysaoudi253 · 1 year ago
Hi! Nice video! Is it possible to use Llama 2 to build an app like AutoGPT or GPT Researcher in a local environment?
@Dave-nz5jf · 1 year ago
Essentially RunPod is a local environment. It's a Linux server in the cloud, but it's no different from a local Linux server.
@NicholasRenotte · 1 year ago
Yup, what he said ^!
@ciberola285 · 1 year ago
Hi Nicolas, are you planning to make a video on training the OWL-ViT model?
@Hgyr-k8j · 1 year ago
Can you do OpusClip, boss 😢
@hebjies · 11 months ago
It's possible that when you tried to load the PDF with SimpleDirectoryReader it was skipping pages because of the chunk size/embedding model you selected. The model you selected (all-MiniLM-L6-v2) is limited to 384, while the chunk size you specified was 1024. Maybe, just maybe, that's why it was skipping pages: it was unable to load the whole chunk into the embedding model.
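One caveat on the numbers in that comment: 384 is all-MiniLM-L6-v2's output embedding dimension; its input limit is 256 word pieces, beyond which it silently truncates. Either way, a chunk size larger than the embedder's input window is easy to detect. A rough sketch, approximating tokens with whitespace words (which undercounts real word pieces):

```python
def split_into_chunks(words: list[str], chunk_size: int) -> list[list[str]]:
    # Greedy fixed-size splitter, the simplest possible chunking scheme.
    return [words[i:i + chunk_size] for i in range(0, len(words), chunk_size)]

def oversized(chunks: list[list[str]], embedder_limit: int) -> list[int]:
    # Indices of chunks the embedding model would silently truncate.
    return [i for i, c in enumerate(chunks) if len(c) > embedder_limit]

words = ["w"] * 3000                       # stand-in for a tokenized document
chunks = split_into_chunks(words, 1024)    # the chunk size from the video
print(len(chunks))                         # 3
print(oversized(chunks, 256))              # [0, 1, 2]: every chunk overflows
```

Shrinking the chunk size below the embedder's limit (or picking an embedder with a longer input window) avoids the silent truncation described above.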
@emanuelsanchez5245 · 1 year ago
Hi! What was the performance of the method? How many tokens per second with that deployment?
@sunkarashreeshreya451 · 1 year ago
You are brilliant. I've been trying to find a tutorial for a slidebot... could you work on it?
@zakaria20062 · 11 months ago
Waiting for an open source equivalent of ChatGPT's function calling; that will be amazing.
@jyothishkumar.j3619 · 1 year ago
What are the limitations on monetizing the Llama Banker app? Could you please explain?
@ricowallaby · 1 year ago
Hi, just found your channel and enjoying it, but I can't wait till we have real open source LLMs. Anyway, keep up the good work; cheers from Sydney.
@vanshpundirv_p_r9796 · 1 year ago
Hey, can you tell me the minimum VRAM, RAM, and disk space required to load and run inference with the model?
@Tripp111 · 9 months ago
Thank you. ❤️🍕
@richardbeare11 · 11 months ago
Love your videos Nicholas. Watching this with my morning coffee, a few chuckles, and a bunch of "ooohhh riiiiiight!"s. Your vid bridged a bunch of gaps in my knowledge. Gonna be implementing my own RAG now 😎👍
@warthog123 · 4 months ago
Excellent video
@tenlancer · 1 year ago
What is the response time for each query, and which GPU did you use for this app?
@AraShiNoMiwaKo · 7 months ago
Any updates?
@nimeshkumar8508 · 1 year ago
Thank you so much for this. God bless you.
@NicholasRenotte · 1 year ago
🙏🏽
@vikassalaria24 · 1 year ago
I am getting an error: ValidationError: 1 validation error for HuggingFaceLLM query_wrapper_prompt str type expected (type=type_error.str). I am using the 7B chat Llama 2 model.
@divyanshumishra6739 · 1 year ago
Did you resolve that error? I am getting the same error and I am unable to solve it.
@IAAM9 · 11 months ago
My two brain cells struggled while watching this video
00:19