Great video!! I am interested in the performance of models with function calling. Do you know how a function calling model would perform if used for customer service, and what the requirements (server, GPU, RAM) would be?
@TrelisResearch 9 months ago
You can get a sense from this video of how the different models perform. OpenChat and DeepSeek are pretty good. The easiest way to get a server going is probably to do what I did in this video and use RunPod to run an A6000. To go cheaper, you could run on an A4000 if you quantize the OpenChat model with GPTQ (check out the quantization video). There are quite a few steps and I'm hoping to make a video on this fairly soon.
@gustavstressemann7817 8 months ago
Very good video, I learnt a lot. I wish you a Merry Christmas and please keep up the good work.
@TrelisResearch 8 months ago
Cheers, merry christmas to you too
@mamotivated 8 months ago
Fantastic overview, great detail and clarity. Kudos for creating the dataset.
@befikerbiresaw9788 7 months ago
It's cool using the function list column. I wanted to use that but was worried I would cram up the context window. But a vector database with function lists should be awesome. Thanks for sharing, man.
@RemekKinas 8 months ago
Brilliant video! Great content. Thank you very much!
@brennonwilliams9181 9 months ago
I really enjoy these longer form videos. Thanks for the effort and detail. Interesting observation on the model types influencing function calling performance. Are the DeepSeek and OpenChat licenses available for commercial use?
@TrelisResearch 9 months ago
You're welcome Brennon. The DeepSeek 67B license is here: github.com/deepseek-ai/deepseek-LLM/blob/main/LICENSE-MODEL and the OpenChat license is here: huggingface.co/openchat/openchat_3.5 (listed as Apache 2.0). I think any license needs to be caveated in that:
- laws in many jurisdictions are not entirely clear on what constitutes fair use for input data;
- models may, intentionally or unintentionally, include data that is the output of other language models with limited licenses.
@okj1999 9 months ago
I really appreciate your videos; you make these things so accessible and easy to understand. Have you thought of making a Discord server? Nearly everyone around LLMs and their mother has a server - Mistral, Qwen, Nous Research (except for companies like Microsoft, Facebook, Google). I'm asking because there is a huge number of people hanging around these LLM-focused servers looking to learn exactly what you're showing. People would be able to collaborate on dataset curation research in a better way than a YouTube format allows. You could also get video suggestions or ideas, and since you offer a place to sell datasets and models, it would help if people had a place to collaborate with others on their own datasets. Too many places aren't exactly friendly to beginners.
@TrelisResearch 9 months ago
Cheers for the comment and suggestions. I think a Discord server is worth considering. The things I'm weighing:
- I have paid repos for ADVANCED-fine-tuning and ADVANCED-inference. People comment in there and I want to make sure I focus my limited time supporting them.
- Personally, I get lost with all of the Discord servers around, and I wouldn't want to add another badly managed one, especially if it pulled from my time making new vids and products.
At the same time, I think your points are valid that it could help people out. So I'll keep reflecting on it.
@carrietam563 a month ago
Hi! Thanks so much for this video! In the dataset, I was wondering if in the function column, you provide a list of all the available functions the model will have in each row. Why or why not?
@TrelisResearch a month ago
Yes, in the function column, each row of the dataset has a list of all available functions.
@VijayDChauhaan 8 months ago
Can we call predefined JavaScript functions? I was thinking of using Llama 2 as a chatbot that calls functions I have already defined in my Angular app, with UI changes occurring if the response from the model is correct. Is this even possible with function calling?
@TrelisResearch 8 months ago
Yes, you can. The input to the LLM is a list of JSON-structured metadata objects describing each function. The output is a JSON object with the arguments you need to call your function. So this is all language agnostic, i.e. the language model isn't specifically tied to one programming language or another. The ADVANCED Inference repo referenced in the description is written in Python, so if you use that, you would make some tweaks to use JavaScript.
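As a rough sketch of that round trip (the function name, schema, and response here are made up for illustration, not taken from the repo):

```python
import json

# Hypothetical function metadata, in the JSON-schema style described above.
# This list is what the model receives as part of its prompt.
functions = [
    {
        "name": "get_weather",
        "description": "Get the current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    }
]

# A local implementation that the app (Angular, Python, anything) actually runs.
def get_weather(city):
    return f"Sunny in {city}"

# Simulated model output: a JSON object naming the function and its arguments.
model_output = '{"name": "get_weather", "arguments": {"city": "Dublin"}}'

# Parse the JSON and dispatch to the matching local function.
call = json.loads(model_output)
registry = {"get_weather": get_weather}
result = registry[call["name"]](**call["arguments"])
print(result)  # Sunny in Dublin
```

The same dispatch step could just as easily be written in JavaScript, since only the JSON crosses the model boundary.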
@VijayDChauhaan 8 months ago
@@TrelisResearch Do I need to fine tune the Llama first on my custom functions?
@TrelisResearch 8 months ago
@@VijayDChauhaan probably not if you use a function calling model (see mart.trelis.com) or the free huggingface.co/Trelis/Llama-2-7b-chat-hf-function-calling-v3 model (although weaker) on HuggingFace. If the results aren't good enough, then you could consider fine tuning on your custom functions (and yes the fine-tuning script for that is included in the advanced fine-tuning repo).
@frazuppi4897 20 days ago
Amazing video! Wouldn't it make sense to use constrained decoding? Since you have the function definitions, you know the parameters that have to come back for function calling.
@TrelisResearch 19 days ago
Yes, it makes sense to constrain if every response needs to be a function call. However, if you want the model to decide, then constraining won't work (although you could constrain to either a function call or a string).
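In the unconstrained case, the caller has to decide after the fact whether the output was a function call or plain text. A minimal sketch of that check (the helper name is hypothetical):

```python
import json

def parse_response(text):
    """Decide whether model output is a function call or plain prose.

    Hypothetical helper: tries to parse the output as a JSON object with a
    "name" key; anything else is treated as a normal text reply.
    """
    try:
        obj = json.loads(text)
        if isinstance(obj, dict) and "name" in obj:
            return ("function_call", obj)
    except json.JSONDecodeError:
        pass
    return ("text", text)

print(parse_response('{"name": "search", "arguments": {"q": "llamas"}}')[0])  # function_call
print(parse_response("Sure, here is a summary...")[0])  # text
```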
@radokhlebov 8 months ago
thank you so much 🙏
@TrelisResearch 8 months ago
cheers, you're welcome
@augustyasharma9447 8 months ago
Really helpful video. I have one question: if I already have an LLM that's fine-tuned using instruction tuning, can I fine-tune it again to add function calling?
@TrelisResearch 8 months ago
Yup, just use the same instruction format
@augustyasharma9447 8 months ago
@@TrelisResearch Does that mean I can do continuous learning?
@TrelisResearch 8 months ago
In principle you could, but most libraries don't support inference while learning. Although there must be a project out there doing that which has looked at effectiveness.
@raestrada95 9 months ago
What a great video, everything was very well explained. I wonder how much the result of function calling will improve if examples in Spanish are added to the training dataset?
@TrelisResearch 9 months ago
It's a bit hard to say. The OpenChat and DeepSeek models already do well in Spanish; it's just the advanced case of chaining functions that has challenges. Possibly adding Spanish examples would help, but chaining ability is perhaps more down to the model's underlying capability, so a stronger Spanish-capable model may be required.
@konstantinlozev2272 9 months ago
Fab!
@B-ix9lo 6 months ago
Thanks for uploading this video. I have a question about the dataset: what is the percentage of questions with a function call versus questions without one? And why did you create the data with that split?
@TrelisResearch 6 months ago
I like that question! Over half, maybe 60%, have no function call. I suppose my thinking is that responses without a function call can't really do much damage (unless there really should have been a call). I'm also trying to get the model to be quite targeted in when it makes the function call. There should probably be far more cases without a function call than with one. I didn't overthink it, though; probably anywhere from 20% to 80% would have worked. 10% might not, because it's a small dataset.
@soc06202 4 months ago
Thank you for the good video! :) I understand that the LIMA methodology is full-parameter fine-tuning. In that case, if fine-tuning is performed on a function-calling dataset, isn't it likely that a function is incorrectly called when the model is given a general prompt that doesn't require function calling? Is this the case in practice? If so, what is the solution?
@TrelisResearch 3 months ago
Actually, certain fine-tuned models, like the openchat_3.5 model, are very good at correctly distinguishing when to call a function or not. Note that the dataset includes rows that do not require function calling.
@MW-ez1mw 6 months ago
@TrelisResearch thank you for the great content. I wonder if padding on the right is mandatory for function calling, since my understanding is that for decoder models we usually pad on the left. Might padding on the right cause any unexpected behavior? Thanks
@TrelisResearch 6 months ago
Hmm. Yeah, I could try it again, but as far as I recall both gave the same results. Thanks
@shubhashish7090 6 months ago
What do you think the performance of Mixtral-8x7B can be for function calling? Will it be better than DeepSeek, given the comparable model parameter count and overall better performance of Mixtral-8x7B in normal tasks (logical ones)?
@TrelisResearch 6 months ago
Both are good for function calling, but DeepSeek Coder and DeepSeek 67B are a little better at it.
@varunmehra5 4 months ago
Can I use this method to train the GPT-3.5 Turbo model so it better understands and uses the parameter values for my function calling, where it's calling an API?
@TrelisResearch 4 months ago
Yes, at a high level, you can use this same approach.
@user-xd1ic9qk8d 6 months ago
The system role doesn't show in the prompt when using Llama 2 function calling
@TrelisResearch 6 months ago
correct! you can optionally add it back in, but I leave it out if not being used.
@PunitPandey 8 months ago
@TrelisResearch how is this training method different from autotrain-advanced? Do we get the same quality of results?
@TrelisResearch 8 months ago
Training quality depends on the dataset quality and also attention and loss mask setup. I’m not too sure how that trainer is set up. It may be worth looking at my earlier function calling video to understand the loss mask and attention mask. I know that the huggingface TRL trainer does have an option to mask the prompt loss. I have used a custom setup.
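The idea of masking the prompt loss can be sketched with plain lists (the token ids below are made up; real trainers work on tensors). Labels for prompt tokens are set to -100, which HuggingFace-style loss functions ignore, so the model only trains on the assistant response:

```python
# Toy sketch of prompt-loss masking (no torch; hypothetical token ids).
IGNORE_INDEX = -100  # label value ignored by the cross-entropy loss

prompt_ids = [101, 2054, 2003, 1996]  # hypothetical ids for the prompt
response_ids = [7592, 2088, 102]      # hypothetical ids for the response

input_ids = prompt_ids + response_ids
attention_mask = [1] * len(input_ids)  # all real tokens, no padding here
# Mask the prompt portion so loss is computed only on the response.
labels = [IGNORE_INDEX] * len(prompt_ids) + response_ids

print(labels)  # [-100, -100, -100, -100, 7592, 2088, 102]
```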
@PunitPandey 8 months ago
@@TrelisResearch Thanks.
@yiouyou 9 months ago
How long will it take to train a 34B model? Do you offer a GPTQ script?
@TrelisResearch 9 months ago
It depends on what you're training it for. Anywhere from 15 mins to 6 hours on an A100. There's no GPTQ script now, but if you have purchased the model and comment on the HuggingFace repo, then I can make one for you.
@user-yu8sp2np2x 8 months ago
What is the meaning of use_flash_attention_2=True at 17:00?
@TrelisResearch 8 months ago
On newer GPUs it speeds up training (and time to first token in inference) by optimising attention computations. If your GPU doesn't support it, you'll see an error and can comment it out. P.S. the usage has recently been updated (as of early Jan 2024), so flash attention should now be specified with: attn_implementation="flash_attention_2"
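For example, with a recent transformers version the loading call might look like this (the model name is only an illustration; this assumes an Ampere-or-newer GPU with the flash-attn package installed):

```python
from transformers import AutoModelForCausalLM

# Newer transformers versions (early 2024 onward): pass attn_implementation
# instead of the older use_flash_attention_2=True flag.
model = AutoModelForCausalLM.from_pretrained(
    "openchat/openchat_3.5",        # example model id
    attn_implementation="flash_attention_2",
    torch_dtype="auto",
    device_map="auto",
)
```

If the GPU doesn't support flash attention, dropping the `attn_implementation` argument falls back to the default attention implementation.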
@user-yu8sp2np2x 8 months ago
@@TrelisResearch Thanks a lot!
@StevenPack-nh9ns 3 months ago
Which branch has the code from this video? Thanks
@TrelisResearch 3 months ago
Howdy, it's in the function-calling branch of the ADVANCED fine-tuning repo.
@user-wf6xt6hq9l 8 months ago
SUCH A SCAM FOR NEWCOMERS. WHY WOULD ANYONE PAY FOR THE MODEL THAT YOU FINE-TUNED?????????????
@TrelisResearch 8 months ago
The Llama 2 function calling fine-tuned model is free on HuggingFace. Other models are more performant, and I've given the steps in this video for fine-tuning yourself, as well as the option to just buy and save time - which many do and are pleased with.
@radokhlebov 8 months ago
Video production and research require effort; some people don't understand that information and time cost money
@tweenty8th 6 months ago
user-wf6xt6hq9l you are definitely not a newcomer, judging by your answer
@alchemication 4 months ago
Actually, I did pay, as I want to reward Trelis for such great work and knowledge shared with the community.