Incredible content, and he doesn't waffle either!!! Just to the point: good pace, great voice, great cadence, and perfect audio levels. This channel is gonna be big.
@matthew_berman a year ago
Thank you :)
@marilynlucas5128 a year ago
@@matthew_berman With OpenLLM, you don't get an OpenAI-like API token, right?
@marilynlucas5128 a year ago
@@matthew_berman How can a project like Aider utilize OpenLLM?
@shotelco a year ago
Yet another piece in the democratization of AI! Very valuable.
@matthew_berman a year ago
Agreed!
@marilynlucas5128 a year ago
Yes indeed!
@MrGaborKukucska a year ago
The future is now 🙌🏼
@applyingpressureeveryday a year ago
Democracy means those in power rule. We live in a democracy that’s clearly 1000% centralized. I got the message tho. 👍🏿
@josephsagotti8786 a year ago
@@applyingpressureeveryday Democratization of technology means the decentralization of technology.
@maxamad13 a year ago
First time here, man. To the point and straightforward. Thank youuuuu!!!!!!!!!!!!
@matthew_berman a year ago
You got it!
@pancakeflux a year ago
This is exactly what I’ve recently been looking for! Thanks for showing it off :)
@paulbishop7399 a year ago
Stop it, I can't keep up anymore :) Every day I am pivoting around your content, gimme a break already! What an exciting time to be alive!
@matthew_berman a year ago
Haha nice :) wait until you see the next video!
@VastIllumination a year ago
You are becoming my favorite AI channel! This is literally exactly what I've needed. I've been looking for an open LLM alternative to the OpenAI API for querying PDFs with LangChain. I haven't been able to test the largest LLMs using Langflow because it always times out from Huggingface.
@matthew_berman a year ago
Glad I could help 🎉
@sjimosui8279 a year ago
Matthew, are you pushing it to GitHub? I'm also working on the same thing and looking for ideas, but I'm a beginner looking for help.
@ajith_e a year ago
Hadn't heard of OpenLLM before, but now I can't hold back my excitement to test it out. A well-paced, well-executed tutorial that touches on the important aspects of deployment. Please keep covering this space closely, because we'll be following you!! Thank you for this great tutorial.
@RyckmanApps a year ago
Awesome quick video!
@matthew_berman a year ago
Thanks :)
@williammixson2541 a year ago
My last computer was a gaming rig. My newest build this week will be specifically for ML and I cannot wait!!! Easy sub.
@thelalomorales a year ago
dude, totally just did it! YOU RULE
@Tenly2009 a year ago
It would be a lot easier for us to follow along and be successful if you did these demos starting with a brand new machine with just Python and conda pre-installed. That way our experience would be more likely to match the one in your video *exactly* and we wouldn’t struggle at the points where you say “the first time I tried this, I got an error” or “I already have this installed”. Just a suggestion.
@wendellvonemet7443 9 months ago
When you cut out all the dead space, your sentences run together without the natural pause that would allow beginners to digest each new concept before being bombarded with the next five new concepts that are rattled off at the speed of light. Tutorials work best when the newbies have time to let new concepts sink in. I'll be stuck trying to wrap my head around what you just said, and I continually have to pause and rewind to catch what you said while I was still chewing on the first bite. You also run your words together, within the sentences, so I have to continually rewind to make sure that I heard you correctly. Many of us are complete newbs to all of this. The info you provide is great. I watch a ton of your videos. I just wish you'd go a hair slower and dumb it down for those of us who are brand new and have to look up the definition of each piece of new tech jargon used (had to ask AI what the hell a bento was, and it thought I was interested in Japanese cuisine).
@8eck a year ago
Will be waiting for their fine-tuning feature. Should be interesting.
@Garfield_Minecraft a month ago
"tell me a joke or I will tell you a joke" damn this AI is crazy
@vinylrebellion a year ago
Looking forward to the Mosaic 33B. Loving the videos.
@matthew_berman a year ago
Literally testing it right now!
@tiredlocke a year ago
This is awesome. I've played with some different open-source models in RunPod (which is great, btw). And I looked into installing the Text Generation WebUI locally... but I don't have a suitable GPU yet. Ultimately, I want a self-hosted (preferably containerized) API that can run various models and be hit from a web browser, a console app, or a game. This looks like exactly what I want. Now I just need to find a GPU to toss into my server...
@Trahloc a year ago
Oobabooga's text-generation-webui is compatible with GGML models, which are CPU-only but can use a GPU for speedup, although the latest versions don't use my GPU for some reason.
@tiredlocke a year ago
@@Trahloc Good to know. I previously tried some stuff that wouldn't run without an Nvidia GPU. I'll have to give this a try and see how it works.
@ehsanrt a year ago
I was looking for something like this for 2 weeks, thank you for your video. It made my learning much easier. Please make a LangChain video too.
@MeinDeutschkurs a year ago
I'm excited! Yeah! I'm interested in custom/not-listed models, also NLLB-200… And what about Mac? There is no xformers available.
@khandakerrahin1003 a year ago
Are these models running locally? If yes, what are the hardware requirements?
@matthew_berman a year ago
Yes. It depends on the model. Smaller models have very modest requirements.
@khandakerrahin1003 a year ago
@@matthew_berman thank you so much Sir.
@matthew_berman a year ago
@@khandakerrahin1003 you got it!
@henkhbit5748 a year ago
Wow, that is really simple. Thanks for showing this API tool for LLMs 👍
@matthew_berman a year ago
@@JohnSmith-jc7yi No way. You can run local models on much smaller machines.
@antonioveloy9107 a year ago
I prefer the Oobabooga web UI, which basically runs an API locally and has a nice button to "import" any Hugging Face model.. But this is interesting too
@nikdog419 a year ago
I'm gonna need a cardboard box server again. Time to start a 24/7 AI stream. 😂
@musikdoktor a year ago
Great YouTuber. Regards from Uruguay!
@sevenkashtan a year ago
Just adding a positive comment for the algorithm! Great video
@matthew_berman a year ago
Haha thank you!
@jmanhype1 a year ago
Please explain whether this is hosted locally as a server, or if we need RunPod or Chainlit.
@matthew_berman a year ago
You can run this locally AND/OR deploy it to the cloud when you're ready for production.
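For anyone wondering what the local setup looks like in practice, here is a minimal sketch, assuming the openllm Python client from the version shown in the video (client class, method names, and the default port may differ in newer releases):

    # Terminal 1: start a model server (downloads the model on first run)
    #   openllm start opt
    # Terminal 2: query the local server from Python
    import openllm

    client = openllm.client.HTTPClient("http://localhost:3000")  # default local address
    print(client.query("Explain what OpenLLM does in one sentence."))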
@jmanhype1 a year ago
@@matthew_berman Please go over the steps to host it for production.
@scitechtalktv9742 a year ago
I also would like to know how to deploy this to the cloud, and what the alternatives are for doing that. Does HuggingFace have a (free) cloud solution?
@grimtagnbag a year ago
Ty for all these videos, getting tons of ideas
@Boboche a year ago
Love this channel
@matthew_berman a year ago
Thank you :)
@brianv2871 a year ago
Thanks for the video. This is getting close to something I'm looking for, but it still requires a permanent system set up with some decent hardware. It would be interesting to see this combined into a single Google Colab that could be run as needed, for those of us looking to use it on an occasional basis.
@doords a year ago
Colab would be very useful. I wish we could keep a Colab running forever.
@joseberlines-l4f a year ago
Really dreaming about the moment this can be used to query my own set of documents, like in your previous videos about GPT4All.
@build.aiagents a year ago
Still not sure how building on the models works in the examples. I see you using the models, but how do we build on top of them? Sorry if I missed it.
@faisalalqasim a year ago
Thanks!
@GamingDaveUK a year ago
What is the advantage of this over textgen WebUI? And does it handle custom models as well as textgen WebUI does (4-bit GPTQ models, etc.)?
@erick2will a year ago
Awesome! Thanks for sharing! 😀
@NguyenHoangHuy a year ago
Are you using WSL? Would you recommend using Windows over Linux? I've had problems trying to install all the Nvidia GPU drivers, CUDA, and PyTorch modules using Ubuntu, to the point where I had to reinstall Ubuntu.
@soubinan a year ago
Very similar to LocalAI. It seems the difference is that LocalAI is compatible with the OpenAI API.
@GlobalOffense 6 months ago
What is the beginning transition? That is epic looking.
@zepto5945 3 months ago
4:04 It started rambling like a madman 😭
@8eck a year ago
It's like personal computers in the era of Steve Jobs, when they still weren't available to everyone. I guess this will soon become even more open with projects like this.
@godned74 a year ago
You are a freakin genius.😎
@Heynmffc a year ago
"I don't know what any of that means, but it doesn't seem to be causing any problems" amen lol
@michaelberg7201 a year ago
Super interesting and exciting project. I didn't quite get, though, whether the models are running locally? I thought this required a lot of GPU power.
@BlackDragonBE a year ago
You can run LLMs locally on the CPU, GPU, or shared between the CPU and GPU. CPU-only is quite slow though.
@faff a year ago
He's running a $2000 video card.
@RodrigoRecio a year ago
Great content. Thank you!
@nannan3347 a year ago
A feature on my wish list is being able to GET and POST the context so it can be edited on the fly.
@RixtronixLAB a year ago
Nice video, well done, thanks :)
@fabianaltendorfer11 a year ago
Awesome, thank you for this vid
@bowenchen4908 a year ago
How fast is this when running locally? Is speed going to be an issue?
@clray123 a year ago
Uhh so a wrapper over a wrapper (HuggingFace/LangChain)??? What does this new API add exactly (except for new bugs)?
@hermysstory8333 a year ago
Many thanks!!!
@patrickwalchshofer4004 a year ago
Hey Matthew, this is really great! So with this I can replace the OpenAI API and run all the apps that are built to use OpenAI?
@MarkDemarest a year ago
AMAZING!!! 💪🔥🙌
@marcoamato8461 a year ago
Maybe a silly question, but what are the minimum hardware requirements?
@user-wr4yl7tx3w a year ago
Is a conda installation more stable than pip? Just wondering which one to use. Mostly, I have used pip previously.
@aliakbari8900 3 months ago
I want to create a custom chatbot that utilizes multiple Gemini and GPT APIs. Does an API remember the history of messages in a chat? This is crucial for maintaining context within the conversation.
@xavierf2229 a year ago
I think you should show what these LLMs are really capable of; the examples you are showing are pretty simple.
@8eck a year ago
Interesting, it would be cool to have a response-streaming feature.
@s0ckpupp3t a year ago
You probably can, through the gRPC interface.
@8eck a year ago
@@s0ckpupp3t Yes, but they depend on another project, which doesn't support it 😕
@8eck a year ago
@@s0ckpupp3t at least not yet
@khorLDW 7 months ago
Just wanted to point out for anyone trying: if you do this on Windows and want to install directly without conda, you'll get an error from the vLLM library saying that it can only be used on Linux.
@revenger211 a year ago
I am facing issues with "openllm start opt": I get a "KeyError: 'OPENLLM_OPT_MODEL_ID'" error. Why is that? I searched online and still can't find a solution.
@gavinray241 10 months ago
Why did you go through the process of creating a conda env when you then install with pip?
@gnosisdg8497 a year ago
Well, if and when they make the training section available, plus LangChain support, then it will be a really cool project to have!!!
@pipoviola a year ago
You are tooooo awesome!!!
@hrishabhg a year ago
This is superb knowledge. As a follow-up, can you create a video that helps users decide on a GPU & CPU configuration for serving?
@justin9494 a year ago
Please help. I have CUDA and torch all working, but when running the model, it says CUDA not found or something. Any ideas?
@SillyProphecies a year ago
Awesome! Great stuff and thank you very much! Do you have an idea how to deploy a QLoRA fine-tuned Falcon model?
@akshatkant1423 10 months ago
Will there be input/output token limits when serving custom LLMs using OpenLLM, like we see with other monetized LLM APIs?
@heliosobsidian a year ago
Wonderful content!! Will this make it easier to work with AutoGen? 🤔
@yacahumax1431 4 months ago
Very nice
@javiergimenezmoya a year ago
Is it possible to link your own fine-tuned LLM stored on your local machine?
@BlayneOliver a year ago
Sorry for my noob question, but could someone explain why we’d need more than ChatGPT 4?
@ALFTHADRADDAD a year ago
Absolutely crazy
@uuuuu4858 8 months ago
Hey, when I try to import openllm in Python, it says the module doesn't exist. Any suggestions?
@LUDOVICOPAPALIA a year ago
I want to run the model on RunPod and create an API to run a service (Python) from my personal computer. Any idea how to do that?
@victordanneygarciaplaza2374 a year ago
Hi Matthew, thanks for this video! I have a question about how to use OpenLLM with documents as a knowledge base.
@forexhunter2040 a year ago
Does using the Falcon model give better accuracy than the OPT one?
@ronaldkodras4527 a year ago
It says I have no GPU available to run the Falcon model. I have NVIDIA drivers downloaded but still no luck. What can I do? How about a GPU from RunPod?
@chrisBruner a year ago
If you've got models downloaded, can they be used?
@clear_lake a year ago
Which server configuration do you recommend if I wanna run Falcon?
@PeacefulislamicLofi 8 months ago
When I install openllm, the installation process starts but never completes. I tried 3 times and got the same result. I don't know what mistake I'm making; can you help me?
@DeepKarmakar-i7v 6 months ago
Can I use the same in a JavaScript application?
@originalsuperheroguruji a year ago
Any idea what server configuration is needed to use these AI models on custom AWS or Linode servers?
@jcfnetwork6768 a year ago
Finally!!
@matthew_berman a year ago
Wooo
@ErnestGWilsonII a year ago
How can we run an LLM at home and have the same API that OpenAI uses?
@cavlab 5 months ago
What is the minimum GPU requirement to use this?
@averaguilar a year ago
Awesome! I just want to know which model is fairly good for the Spanish language. I have tried some and they are just awful.
@mijanurrahaman3778 a year ago
Can we provide a customized knowledge base to the system?
@ThobaniMngoma a year ago
Does this API also work when running LLMs using CPU resources?
@MuhammadHadiHadi-w1r a year ago
Does it support AutoGen or CrewAI?
@ZakkFromSource a year ago
Do you know of any current services where you could host something like this in the cloud for free, to test out creating something like a chatbot that you could call and add extra functionality to via Python code running locally on your machine?
@s0ckpupp3t a year ago
Does it have a streaming API endpoint?
@Kulbaru 7 months ago
I am getting this error and can't find any solution for the dependency problem: "Failed to build ninja ERROR: ERROR: Failed to build installable wheels for some pyproject.toml based projects (ninja)"
@user-wr4yl7tx3w a year ago
Is this a replacement for TextGen WebUI? Do they perform the same function?
@doords a year ago
Yeah, same function, but all of the textgen WebUIs cost money. If you build an app for many people, you will have to pay a lot every time your users send a query.
@Azcraz a year ago
Has anyone been able to get this working recently? I follow the docs to a 'T' and the opt model is unable to start up. I ran openllm models --show-available and it looks like it's not properly downloading the model locally after running 'pip install "openllm[opt]"', as it says 'No local models available'. Do we need to download the models with 'openllm import ...'? I've tried that as well, with 'openllm import opt facebook/opt-1.3b', to no avail. Surely I must be doing something silly!? Any help appreciated!
@Azcraz a year ago
Got it working! It turns out you need to manually import the models; running pip install "openllm[opt]" does not download the model. You must use the import command and specify the model. In my case I also needed to pass the --serialisation legacy flag, because the model did not support safetensors!
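In command form, the sequence that worked here looks roughly like this (model name taken from the comments above; flag spellings may differ between OpenLLM versions):

    pip install "openllm[opt]"
    openllm import opt facebook/opt-1.3b --serialisation legacy
    openllm models --show-available
    openllm start opt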
@thehkmalhotra9714 a year ago
I loved your content, mate ❤️ Thanks for your video. Just a quick question: can we point localhost:3000 to a domain? This localhost URL can be used as an API while it's running on my PC, but what if I want to point it to a domain name that is easily accessible to all? Will be waiting for your answer 😥 Keep up the great work, dude ❤
@ganeshkgp a year ago
Is there any free hosting where I can host and test it? And how do I use a domain instead of localhost?
@ganeshkgp a year ago
Please don't get me wrong, I am a software developer, but I have no idea how to use LLMs.
@Star-rd9eg a year ago
How would I use this in RunPod? :)
@avi7278 a year ago
Now all you need is a $10,000 computer! No, but seriously, the last piece of the puzzle here is a service like RunPod where you can install this and it charges you for the exact inference time of each request. Does anybody know of anything like that?
@elchippe a year ago
I think the 3B and 7B parameter versions of the models can run locally on a CPU, or even a 12 GB RTX 3060.
@clray123 a year ago
No, the last piece of the puzzle is open source models that aren't crap.
@elchippe a year ago
@@clray123 Easy fine-tuning of these models for specific tasks, and algorithmic optimizations to run them more efficiently across a spectrum of hardware from low end to high end, are what is going to make the difference against proprietary models.
@clray123 a year ago
@@elchippe LoRA fine-tuning is like a 200-line Python script (see the sketch below this thread). You clone the script from Git and run it. The difficulty of fine-tuning is not that you lack some silly API, but rather the choice of parameters and, above all, of the input data. And you will not be able to fine-tune any serious models on "low end" hardware, even with (Q)LoRA and whatnot.
@avi7278 a year ago
@@clray123 Yeah, well, I meant this particular puzzle of being able to host your own personal API for an open-source model. Model quality is beside the point.
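To illustrate how small the API side of LoRA really is, here is a minimal setup sketch using the Hugging Face peft library; the base model and hyperparameters are illustrative assumptions, not a recipe, and the hard part (data and parameter choice) is exactly what the comment above describes:

    # pip install transformers peft
    from transformers import AutoModelForCausalLM
    from peft import LoraConfig, get_peft_model

    base = AutoModelForCausalLM.from_pretrained("facebook/opt-1.3b")
    config = LoraConfig(
        r=8,                                  # low-rank adapter dimension
        lora_alpha=16,
        target_modules=["q_proj", "v_proj"],  # attention projections in OPT
        lora_dropout=0.05,
        task_type="CAUSAL_LM",
    )
    model = get_peft_model(base, config)      # wraps the frozen base model
    model.print_trainable_parameters()        # only a tiny fraction is trainable
    # The real work: curate a dataset, pick hyperparameters, then run a training loop.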
@eyemazed a year ago
Does the API support embedding functionality?
@cheifei a year ago
Embeddings are just custom text that is passed to the LLM to use as a reference. To get the embeddings, you would need to run a model that can specifically convert text to vectors. Then send your docs to that embeddings model via the API, take the vector response, and store it in a vector store. Then, when you make a query, convert your query to a vector via your local model and perform a similarity search on your vector store. That will return some docs, and you pass the text of those docs to the LLM. (There's a rough sketch of this flow after this thread.)
@eyemazed a year ago
@@cheifei Are you implying it's absolutely irrelevant how you create the embeddings? Don't different models use different embedding algorithms? That's why they have different vector dimensionalities, among other things.
@cheifei a year ago
@@eyemazed No, I am not implying that. I agree with you that you have to use the same embedding model for consistency. I think the missing piece is that you pass the text of the query and the text (not vectors) of the embedded docs to the LLM.
@eyemazed a year ago
@@cheifei I see, I thought you needed to use the same embedding API for vectorizing the context that you pass along with the prompt as the LLM uses to vectorize your prompt. So if I understand correctly, you're free to choose any embedding API/vector store you want, because it's separate from the LLM and is only used to retrieve the context that you send along with your prompt to the LLM.
@cheifei a year ago
@@eyemazed That is correct.
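A rough sketch of the retrieval flow described in this thread, assuming sentence-transformers as the local embedding model and a plain numpy array standing in for the vector store (any equivalents would do):

    # pip install sentence-transformers numpy
    import numpy as np
    from sentence_transformers import SentenceTransformer

    embedder = SentenceTransformer("all-MiniLM-L6-v2")  # local embedding model

    # 1. Embed your docs once and keep the vectors.
    docs = [
        "OpenLLM serves open-source models over a local HTTP API.",
        "Falcon and OPT are open-source language models.",
    ]
    doc_vecs = embedder.encode(docs, normalize_embeddings=True)

    # 2. At query time, embed the query with the SAME model, then similarity-search.
    query = "How do I serve a model locally?"
    query_vec = embedder.encode([query], normalize_embeddings=True)[0]
    scores = doc_vecs @ query_vec                # cosine similarity (normalized vectors)
    best_doc = docs[int(np.argmax(scores))]

    # 3. Pass the TEXT of the retrieved doc (not its vector) to the LLM with the prompt.
    prompt = f"Context: {best_doc}\n\nQuestion: {query}\nAnswer:"
    print(prompt)  # send this string to whichever LLM endpoint you are running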
@chrisl4211 a year ago
I think LangChain can create an API, no?
@davidc3416 2 months ago
A pip install into a conda environment is not usually the best way to go.
@eddymison3527 a year ago
I think it's great.
@varunrao-q5m 3 months ago
What are the hardware requirements? Can I run it with 4 GB of video RAM?