Great video!! I am interested in the performance of models with function calling. Do you know how a function calling model would perform if used for customer service, and what the requirements (server, GPU, RAM) would be?
@TrelisResearch 9 months ago
You can get a sense from this video of how the different models perform. OpenChat and DeepSeek are pretty good. The easiest way to get a server going is probably to do what I did in this video and use RunPod to run an A6000. To go cheaper, you could run on an A4000 if you quantize the OpenChat model with GPTQ (check out the quantization video). There are quite a few steps and I'm hoping to make a video on this fairly soon.
@gustavstressemann7817 8 months ago
Very good video, I learnt a lot. I wish you a Merry Christmas and please keep up the good work.
@TrelisResearch 8 months ago
Cheers, merry christmas to you too
@mamotivated 8 months ago
Fantastic overview, great detail and clarity. Kudos for creating the dataset.
@befikerbiresaw9788 7 months ago
It's cool using the function list column. I wanted to use that but was worried I would cram up the context window. But a vector database with function lists should be awesome. Thanks for sharing, man.
@RemekKinas 8 months ago
Brilliant video! Great content. Thank you very much!
@brennonwilliams9181 9 months ago
I really enjoy these longer form videos. Thanks for the effort and detail. Interesting observation on the model types influencing function calling performance. Are the DeepSeek and OpenChat licenses available for commercial use?
@TrelisResearch 9 months ago
You're welcome Brennon. The DeepSeek 67B license is here: github.com/deepseek-ai/deepseek-LLM/blob/main/LICENSE-MODEL and the OpenChat license is here: huggingface.co/openchat/openchat_3.5 (listed as Apache 2.0). I think any license needs to be caveated in that:
- laws in many jurisdictions are not entirely clear on what constitutes fair use for input data;
- models may, intentionally or unintentionally, include data that is the output of other language models with limited licenses.
@okj1999 9 months ago
I really appreciate your videos; you make these things so accessible and easy to understand. Have you thought of making a Discord server? Nearly everyone around LLMs and their mother has a server - Mistral, Qwen, Nous Research (except for companies like Microsoft, Facebook, Google). I'm asking because there is a huge number of people hanging around these LLM-focused servers looking to learn exactly what you're showing. People would be able to collaborate on dataset curation research in a better way than a YouTube format allows. You could also get video suggestions or ideas, and since you offer a place to sell datasets and models, it would help if people had a place to collaborate with others on their own datasets. Too many places aren't exactly friendly to beginners.
@TrelisResearch 9 months ago
Cheers for the comment and suggestions. I think a Discord server is worth considering. The things I'm weighing:
- I have paid repos for ADVANCED-fine-tuning and ADVANCED-inference. People comment in there and I want to make sure I focus my limited time supporting them.
- Personally, I get lost with all of the Discord servers around, and I wouldn't want to add another badly managed one, especially if it pulled from my time making new vids and products.
At the same time, I think your points are valid that it could help people out. So I'll keep reflecting on it.
@carrietam563 a month ago
Hi! Thanks so much for this video! In the dataset, I was wondering if in the function column, you provide a list of all the available functions the model will have in each row. Why or why not?
@TrelisResearch a month ago
Yes, in the function column, each row of the dataset has a list of all available functions.
@VijayDChauhaan 8 months ago
Can we call predefined JavaScript functions? I was thinking of using Llama 2 as a chatbot that calls functions I have already defined in my Angular app, with UI changes occurring if the response from the model is correct. Is this even possible with function calling?
@TrelisResearch 8 months ago
Yes, you can. The input to the LLM is a list of JSON-structured metadata objects describing each function. The output is a JSON object with the arguments you need to call your function. So this is all language agnostic, i.e. the language model isn't specifically tied to one programming language or another. The ADVANCED Inference repo referenced in the description is written in Python, so if you use that, you would make some tweaks to use JavaScript.
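As a rough sketch of that round trip (the function name, schema, and response here are made up for illustration, not taken from the repo):

```python
import json

# Hypothetical function metadata, in the JSON-schema style described above.
# This list is what the model receives as part of its prompt.
functions = [
    {
        "name": "get_weather",
        "description": "Get the current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    }
]

# A local implementation that the app (Angular, Python, anything) actually runs.
def get_weather(city):
    return f"Sunny in {city}"

# Simulated model output: a JSON object naming the function and its arguments.
model_output = '{"name": "get_weather", "arguments": {"city": "Dublin"}}'

# Parse the JSON and dispatch to the matching local function.
call = json.loads(model_output)
registry = {"get_weather": get_weather}
result = registry[call["name"]](**call["arguments"])
print(result)  # Sunny in Dublin
```

The same dispatch step could just as easily be written in JavaScript, since only the JSON crosses the model boundary.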
@VijayDChauhaan 8 months ago
@@TrelisResearch Do I need to fine tune the Llama first on my custom functions?
@TrelisResearch 8 months ago
@@VijayDChauhaan probably not if you use a function calling model (see mart.trelis.com) or the free huggingface.co/Trelis/Llama-2-7b-chat-hf-function-calling-v3 model (although weaker) on HuggingFace. If the results aren't good enough, then you could consider fine tuning on your custom functions (and yes the fine-tuning script for that is included in the advanced fine-tuning repo).
@frazuppi4897 20 days ago
Amazing video! Wouldn't it make sense to use constrained decoding? Since you have the function definitions, you know the parameters that have to come back for function calling.
@TrelisResearch 19 days ago
Yes, it makes sense to constrain if every response needs to be a function call. However, if you want the model to decide, then constraining won't work (although you could constrain to either a function call or a string).
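In the unconstrained case, the caller has to decide after the fact whether the output was a function call or plain text. A minimal sketch of that check (the helper name is hypothetical):

```python
import json

def parse_response(text):
    """Decide whether model output is a function call or plain prose.

    Hypothetical helper: tries to parse the output as a JSON object with a
    "name" key; anything else is treated as a normal text reply.
    """
    try:
        obj = json.loads(text)
        if isinstance(obj, dict) and "name" in obj:
            return ("function_call", obj)
    except json.JSONDecodeError:
        pass
    return ("text", text)

print(parse_response('{"name": "search", "arguments": {"q": "llamas"}}')[0])  # function_call
print(parse_response("Sure, here is a summary...")[0])  # text
```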
@radokhlebov 8 months ago
thank you so much 🙏
@TrelisResearch 8 months ago
cheers, you're welcome
@augustyasharma9447 8 months ago
Really helpful video. I have one question: if I already have an LLM that's fine-tuned using instruction tuning, can I fine-tune it again to add function calling?
@TrelisResearch 8 months ago
Yup, just use the same instruction format
@augustyasharma9447 8 months ago
@@TrelisResearch Does that mean I can do continuous learning?
@TrelisResearch 8 months ago
In principle you could, but most libraries don't support inference while learning. Although there must be a project out there doing that which has looked at effectiveness.
@raestrada95 9 months ago
What a great video, everything was very well explained. I wonder how much the result of function calling will improve if examples in Spanish are added to the training dataset?
@TrelisResearch 9 months ago
It's a bit hard to say. The OpenChat and DeepSeek models already do well in Spanish; it's just the advanced case of chaining functions that has challenges. Possibly adding Spanish examples would help, but chaining ability is perhaps more down to the model's underlying capability, so a stronger Spanish-capable model may be required.
@konstantinlozev2272 9 months ago
Fab!
@B-ix9lo 6 months ago
Thanks for uploading this video. I have a question about the dataset: what is the percentage of questions with a function call versus questions without one? And why did you create the data with that split?
@TrelisResearch 6 months ago
I like that question! Over half, maybe 60%, have no function call. I suppose my thinking is that responses without a function call can't really do much damage (unless there really should have been a call). I'm also trying to get the model to be quite targeted in when it makes the function call. There should probably be far more cases without a function call than with one. I didn't overthink it, though; probably anywhere from 20% to 80% would have worked. 10% might not, because it's a small dataset.
@soc06202 4 months ago
Thank you for the good video! :) I understand that the LIMA methodology is full-parameter fine-tuning. In that case, if fine-tuning is performed on a function-calling dataset, isn't it likely that a function is incorrectly called when the model is given a general prompt that doesn't require function calling? Is this the case in practice? If so, what is the solution?
@TrelisResearch 3 months ago
Actually, certain fine-tuned models, like the openchat_3.5 model, are very good at correctly distinguishing when to call a function or not. Note that the dataset includes rows that do not require function calling.
@MW-ez1mw 6 months ago
@TrelisResearch thank you for the great content. I wonder if padding on the right is mandatory for function calling, since my understanding is that for decoder models we usually pad on the left. Might padding on the right cause any unexpected behavior? Thanks
@TrelisResearch 6 months ago
Hmm. Yeah, I could try it again, but as far as I recall both gave the same results. Thanks
@shubhashish7090 6 months ago
What do you think the performance of Mixtral-8x7B can be for function calling? Will it be better than DeepSeek, given the comparable model parameter count and overall better performance of Mixtral-8x7B in normal tasks (logical ones)?
@TrelisResearch 6 months ago
Both are good for function calling, but DeepSeek Coder and DeepSeek 67B are a little better at it.
@varunmehra5 4 months ago
Can I use this method to train the GPT-3.5 Turbo model so it better understands and uses the parameter values for my function calling, where it's calling an API?
@TrelisResearch 4 months ago
Yes, at a high level, you can use this same approach.
@user-xd1ic9qk8d 6 months ago
The system role doesn't show in the prompt when using Llama 2 function calling
@TrelisResearch 6 months ago
correct! you can optionally add it back in, but I leave it out if not being used.
@PunitPandey 8 months ago
@TrelisResearch how is this training method different from autotrain-advanced? Do we get the same quality of results?
@TrelisResearch 8 months ago
Training quality depends on the dataset quality and also attention and loss mask setup. I’m not too sure how that trainer is set up. It may be worth looking at my earlier function calling video to understand the loss mask and attention mask. I know that the huggingface TRL trainer does have an option to mask the prompt loss. I have used a custom setup.
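The idea of masking the prompt loss can be sketched with plain lists (the token ids below are made up; real trainers work on tensors). Labels for prompt tokens are set to -100, which HuggingFace-style loss functions ignore, so the model only trains on the assistant response:

```python
# Toy sketch of prompt-loss masking (no torch; hypothetical token ids).
IGNORE_INDEX = -100  # label value ignored by the cross-entropy loss

prompt_ids = [101, 2054, 2003, 1996]  # hypothetical ids for the prompt
response_ids = [7592, 2088, 102]      # hypothetical ids for the response

input_ids = prompt_ids + response_ids
attention_mask = [1] * len(input_ids)  # all real tokens, no padding here
# Mask the prompt portion so loss is computed only on the response.
labels = [IGNORE_INDEX] * len(prompt_ids) + response_ids

print(labels)  # [-100, -100, -100, -100, 7592, 2088, 102]
```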
@PunitPandey 8 months ago
@@TrelisResearch Thanks.
@yiouyou 9 months ago
How long will it take to train a 34B model? Do you offer a GPTQ script?
@TrelisResearch 9 months ago
It depends on what you're training it for. Anywhere from 15 mins to 6 hours on an A100. There's no GPTQ script now, but if you have purchased the model and comment on the HuggingFace repo, then I can make one for you.
@user-yu8sp2np2x 8 months ago
What is the meaning of use_flash_attention_2=True at 17:00?
@TrelisResearch 8 months ago
On newer GPUs it speeds up training (and time to first token in inference) by optimising attention computations. If your GPU doesn't support it, you'll see an error and can comment it out. P.S. the usage has recently been updated (as of early Jan 2024), so flash attention should now be specified with: attn_implementation="flash_attention_2"
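For example, with a recent transformers version the loading call might look like this (the model name is only an illustration; this assumes an Ampere-or-newer GPU with the flash-attn package installed):

```python
from transformers import AutoModelForCausalLM

# Newer transformers versions (early 2024 onward): pass attn_implementation
# instead of the older use_flash_attention_2=True flag.
model = AutoModelForCausalLM.from_pretrained(
    "openchat/openchat_3.5",        # example model id
    attn_implementation="flash_attention_2",
    torch_dtype="auto",
    device_map="auto",
)
```

If the GPU doesn't support flash attention, dropping the `attn_implementation` argument falls back to the default attention implementation.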
@user-yu8sp2np2x 8 months ago
@@TrelisResearch Thanks a lot!
@StevenPack-nh9ns 3 months ago
Which branch has the code from this video? Thanks
@TrelisResearch 3 months ago
Howdy, it's in the function-calling branch of the ADVANCED fine-tuning repo.
@user-wf6xt6hq9l 8 months ago
SUCH A SCAM FOR NEWCOMERS. WHY WOULD ANYONE PAY FOR THE MODEL THAT YOU FINE-TUNED?????????????
@TrelisResearch 8 months ago
The Llama 2 function calling fine-tuned model is free on HuggingFace. Other models are more performant, and I've given the steps in this video for fine-tuning yourself, as well as the option to just buy and save time - which many do and are pleased with.
@radokhlebov 8 months ago
Video production and research require effort; some people don't understand that information and time cost money
@tweenty8th 6 months ago
user-wf6xt6hq9l you are definitely not a newcomer, judging by your answer
@alchemication 4 months ago
Actually, I did pay, as I want to reward Trelis for such great work and knowledge shared with the community.