3 Effective Steps to Reduce GPT-4 API Costs - FrugalGPT

  19,296 views

1littlecoder

1 day ago

Comments: 34
@SamuelJunghenn 1 year ago
Their logic missed one critical point when calculating the cost: if you are serving 15,000 people a month with customer service, your wages are much, much higher than $21,000, even if you hire people in developing countries. Carry on, your content is awesome 😎
@1littlecoder 1 year ago
Thanks for sharing this!
@hrutuselar8839 9 months ago
I wanna be this guy, reading between the lines
@miinyoo 8 months ago
This is a great model of how to separate out expensive queries from common ones, like a local FAQ. Really cool idea. I can see a real use case for this when SORA hits the actual market, because when it does, it's going to be very expensive. Get the heavy-lifting stuff done in as short bursts as possible, and then locally use a lighter model based on SORA to extrapolate on what SORA responded with and fill in the gaps. Example: you only need one response for any given scene, say a master shot, maybe a couple, wide angle and complete. Then, outside of SORA, take those responses, plus knowledge of the "protocols", and put that data through a lightweight open-source model with a cache and some kind of convolution to generate the closeups, cutaways, alternate takes, etc. Now, SORA isn't there yet by any means to be production ready, but it will get there eventually, up to and including full dialog, personality archetypes, everything in the box.
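On the local-FAQ-style cache idea (a commenter further down also suggests vector DBs for this), here is a minimal sketch of a semantic completion cache. It assumes the sentence-transformers package and keeps embeddings in a plain in-memory list; a real deployment would use a vector database (FAISS, Chroma, etc.) and would tune the similarity threshold carefully so near-miss or stale answers are not returned.

```python
# Minimal sketch of a semantic completion cache, assuming the sentence-transformers
# package. Embeddings live in a plain in-memory list here; a real deployment would
# use a vector database (FAISS, Chroma, etc.) and tune the similarity threshold.
import numpy as np
from sentence_transformers import SentenceTransformer

embedder = SentenceTransformer("all-MiniLM-L6-v2")
_cache: list[tuple[np.ndarray, str]] = []   # (query embedding, cached answer)

def remember(query: str, answer: str) -> None:
    _cache.append((embedder.encode(query, normalize_embeddings=True), answer))

def cached_answer(query: str, threshold: float = 0.9):
    """Return a cached answer if a semantically similar query was seen before, else None."""
    if not _cache:
        return None
    q = embedder.encode(query, normalize_embeddings=True)
    sims = [float(np.dot(q, emb)) for emb, _ in _cache]   # cosine similarity (vectors are normalized)
    best = int(np.argmax(sims))
    return _cache[best][1] if sims[best] >= threshold else None

# Usage: consult the cache before paying for an API call.
remember("How do I reset my password?", "Use the 'Forgot password' link on the login page.")
print(cached_answer("how can I reset my password"))   # likely a cache hit
```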
@samgomarezsmith3887 1 year ago
Very good method for using GPT-4 in combination with cheaper models.
@1littlecoder 1 year ago
Thanks
@marcosmagana8930 6 months ago
Fantastic content! Could you please make a tutorial with an actual code implementation of FrugalGPT? Thanks! I truly appreciate it.
@seikatsu_ki 1 year ago
Thank you for your content. However, the typewriter typing sound was distracting.
@1littlecoder 1 year ago
Thank you for the feedback. I just tried something different. I'll improve it.
@tarun4705 1 year ago
But I think that, as per the OpenAI policy, we cannot use its responses to fine-tune another model, so I am not sure we can follow option d) model fine-tuning for commercial usage. Btw, in option 5, how is the score calculated to check whether the answer is correct or not? Since we can't rely on open-source LLMs to calculate the score, we might need GPT-3 or GPT-4 to calculate it, right? In that case, instead of making an API call to GPT-4 to calculate the score, why not send the query itself?
@sto2779 3 months ago
6:46 - The work I do (C/C++ software engineering) requires 100% GPT-4 responses to provide 98% bug-free code, and it's really difficult for models to comprehend the code questions asked, even more so when asking them to review the code. Redirecting some parts of my question means the quality of the answers won't be good when they're answered by other GPT models. I think the only way to reduce costs is for the user to test what each GPT platform is really good at and send easier questions to specific GPT models based on ranking. The only issue is how such a complex question can be split into smaller questions asked of different models, with the results of all the responses later combined. I tested many GPT models, and they're not great, even for simple code questions. GPT-3.5 is also not that great; GPT-4, however, is really great. I'm not sure how this paper is valid.
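On the point about sending easier questions to specific models based on ranking, here is a minimal sketch of what such routing could look like. The difficulty heuristic, the model names, and the length cutoff are illustrative assumptions, not anything from the paper or the video; it also assumes the openai Python client (v1+) with an API key set in the environment.

```python
# Minimal sketch of routing "easy" questions to a cheaper model and hard ones
# to GPT-4. The heuristic, the length cutoff, and the model names are illustrative only.
# Assumes the openai Python client (v1+) and OPENAI_API_KEY in the environment.
from openai import OpenAI

client = OpenAI()

CHEAP_MODEL = "gpt-3.5-turbo"
STRONG_MODEL = "gpt-4"

def looks_hard(prompt: str) -> bool:
    """Toy difficulty check: long prompts or review/debug requests go to the strong model."""
    hard_markers = ("review", "refactor", "debug", "undefined behavior")
    return len(prompt) > 2000 or any(m in prompt.lower() for m in hard_markers)

def route(prompt: str) -> str:
    model = STRONG_MODEL if looks_hard(prompt) else CHEAP_MODEL
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

print(route("What does `constexpr` mean in C++?"))  # short, no markers -> cheap model
```

A real setup would replace looks_hard with a learned router, or with the score-then-escalate cascade the paper describes.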
@boscoraj6148 2 months ago
Wow thanks man!
@surajitchakraborty1903 1 year ago
Hey, great video. With regards to prompt adaptation, could the library Promptify be used in some way?
@fire17102 1 year ago
Thanks for the video! Really interesting! Just please don't use the typing sound effects, they're really, really annoying.
@1littlecoder 1 year ago
Thank you for the feedback. Another sub mentioned the same. Sorry about that. I'll avoid it.
@1littlecoder 1 year ago
Btw, is that sound itself annoying, or is it annoying because it's overused?
@fire17102 1 year ago
@1littlecoder Thanks for the quick reply, man! Love your channel. Well, it's especially hard since it is not even synced to the text: it keeps playing until the text disappears, and it's as loud as your voice. I would prefer zero bells and whistles. If you do it again, make it very short and very quiet. But again, no nonsense is better, imo. Good luck and all the best!
@1littlecoder 1 year ago
@fire17102 Thanks so much. I appreciate the detailed feedback.
@anuragmishra-yu2yx 1 year ago
I have a hard time understanding how they implement the "scoring function" that decides the quality of a model's generated response, and how they decide the threshold below which a response is rejected. If anyone could help me understand that, I'd appreciate it.
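For anyone stuck on the same question, here is a minimal sketch of the cascade-plus-threshold idea, in the spirit of FrugalGPT but not the authors' code. The tier names, prices, and naive_score function are placeholders; the paper itself trains a small supervised model to score (query, answer) pairs.

```python
# Minimal sketch of an LLM cascade with a scoring function and a threshold,
# in the spirit of FrugalGPT but NOT the authors' code. The scorer below is a
# stand-in; the paper trains a small supervised model to score (query, answer) pairs.
from dataclasses import dataclass
from typing import Callable

@dataclass
class Tier:
    name: str
    cost_per_call: float                 # illustrative dollars per call
    generate: Callable[[str], str]       # wraps the actual API call

def cascade(query: str,
            tiers: list[Tier],
            score: Callable[[str, str], float],
            threshold: float = 0.8) -> tuple[str, float]:
    """Try models from cheapest to most expensive; accept the first answer
    whose reliability score clears the threshold."""
    spent, answer = 0.0, ""
    for tier in tiers:
        answer = tier.generate(query)
        spent += tier.cost_per_call
        if score(query, answer) >= threshold:
            break                        # good enough, skip the pricier models
    return answer, spent

# Usage with dummy components (swap in real API calls and a trained scorer):
tiers = [
    Tier("small-model", 0.002, lambda q: "cheap answer to: " + q),
    Tier("gpt-4-like", 0.060, lambda q: "expensive answer to: " + q),
]
naive_score = lambda q, a: 0.9 if "expensive" in a else 0.3  # placeholder scorer
print(cascade("What is a mutex?", tiers, naive_score))       # escalates once, then stops
```

As I read the paper, the threshold and the ordering of models are not hand-picked either; they are tuned on held-out data to maximize accuracy under a cost budget.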
@klammer75 1 year ago
Great work, and I love this idea! Unique integrations and novel recombinations of existing models and tools are definitely the way forward! Good stuff!🥳🦾🤩
@1littlecoder 1 year ago
Thanks for sharing :)
@sammathew535 1 year ago
"The cache idea is common sense in software engineering" - often a challenge to implement, though. I guess in this case the implementation would be along the lines of vector DBs.
@tharzzan 1 year ago
I'm not sure I follow the logic of this paper. Would you be kind enough to demonstrate how this whole concept can be applied in a real-world use case?
@Naaoos1 1 year ago
How do you calculate the score of your answer, and with reference to what?
@conceptsintamil 1 year ago
But how does the model cascade handle passing the prior context to the last, most expensive model without compromising the token reduction? Wouldn't GPT-4 need the previous context to send an accurate response? In that case, wouldn't the token count increase?
@ThomasTomiczek 1 year ago
If that is the case, you can maybe skip sending it. But often you do not need a lot of context to answer the question.
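On the token question raised above: in a cascade, an escalated query (along with whatever context it carries) is sent again to the bigger model, so that individual query does cost more tokens; the saving comes from how rarely escalation happens. A back-of-the-envelope sketch, with made-up prices and an assumed escalation rate:

```python
# Back-of-the-envelope cascade cost per query. All numbers are made-up assumptions,
# not figures from the paper or the video. Escalated queries pay for BOTH calls
# (including any re-sent context), yet the average stays low if escalation is rare.
cheap_cost = 0.002    # assumed $ per query on the small model
strong_cost = 0.060   # assumed $ per query on the GPT-4-class model
p_escalate = 0.20     # assumed fraction of queries the scorer rejects and escalates

always_strong = strong_cost
cascade_average = cheap_cost + p_escalate * strong_cost   # 0.002 + 0.012 = 0.014

print(f"always GPT-4-class: ${always_strong:.3f} per query")
print(f"cascade average:    ${cascade_average:.3f} per query")
```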
@vekRaft 3 months ago
Too many keys typed for so few words.
@DistortedV12 1 year ago
Hmm, what the heck is prompt selection?
@1littlecoder 1 year ago
Maybe I didn't do a good job; please read the paper, it has examples.
@emmanuelkolawole6720 1 year ago
What we need is a local cpp runner for the LLM "HuggingFaceH4/starchat-alpha". It is much better than any other open-source chat model at coding, and it does very well at chatting too. This is better than Vicuna.