I always love meeting someone else who loves the science behind large language models as much as I do. 😊
@philip.t · 10 months ago
Your channel is amazing. I feel very lucky to have found it! Hope you keep making videos. Thank you!!!
@learndatawithmark · 10 months ago
Thanks, I'm glad you like it :)
@olivermorris4209 · 10 months ago
Great comparison. I had more success using the JSON output setting and asking Mistral to respond in JSON with a prescribed structure, but it only works about 80% of the time.
@learndatawithmark · 10 months ago
Have you found anything that does a good job with JSON output on the open-source models? I found functions worked well with OAI, and I've been meaning to try out Gorilla functions but haven't yet: gorilla.cs.berkeley.edu/index.html
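For the "works about 80% of the time" problem above, one common workaround is to validate each reply against the prescribed structure and re-ask when it fails. Below is a minimal Python sketch of that validate-and-retry loop, assuming a hypothetical `call_model` function that sends a prompt to the model and returns its raw text reply. `REQUIRED_FIELDS` is an illustrative schema, and none of this is Mistral's JSON-mode API or OpenAI's function-calling API.

```python
import json
from typing import Callable, Optional

# Illustrative "prescribed structure": field name -> expected Python type (hypothetical schema).
REQUIRED_FIELDS = {"name": str, "age": int, "tags": list}


def parse_structured_reply(text: str) -> Optional[dict]:
    """Return the parsed object if `text` is a JSON object matching REQUIRED_FIELDS, else None."""
    try:
        data = json.loads(text)
    except json.JSONDecodeError:
        return None  # Not valid JSON at all (e.g. a chatty preamble around the object).
    if not isinstance(data, dict):
        return None
    for field, expected_type in REQUIRED_FIELDS.items():
        value = data.get(field)
        # bool is a subclass of int in Python, so reject True/False for int fields explicitly.
        if not isinstance(value, expected_type) or (expected_type is int and isinstance(value, bool)):
            return None
    return data


def ask_with_retries(
    call_model: Callable[[str], str], prompt: str, max_attempts: int = 3
) -> Optional[dict]:
    """Re-send the prompt until the reply validates or the attempts run out."""
    for _ in range(max_attempts):
        result = parse_structured_reply(call_model(prompt))
        if result is not None:
            return result
    return None


if __name__ == "__main__":
    # Stand-in for a real model call: the first reply is malformed, the second is valid.
    replies = iter([
        "Sure! Here is the JSON you asked for: {name: Ada}",
        '{"name": "Ada", "age": 36, "tags": ["math"]}',
    ])
    print(ask_with_retries(lambda prompt: next(replies), "Describe Ada Lovelace as JSON"))
```

Running it prints `{'name': 'Ada', 'age': 36, 'tags': ['math']}` after one failed attempt. If failures were independent, three attempts at an 80% per-attempt success rate would all fail only about 0.2³ = 0.8% of the time. In practice failures may be correlated (the same prompt can fail the same way), so the real gain is likely smaller.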
@alizhadigerov9599 · 10 months ago
Have you tried the fine-tuned version of Mixtral, Dolphin?
@learndatawithmark · 10 months ago
I haven't. Have you? Is it good? I'm never entirely sure what would be good things to try out on models that are fine-tuned on top of the foundation models.
@alizhadigerov9599 · 10 months ago
@@learndatawithmark I have, yes. Me neither, lol. I don't have a custom test set for these models, so all I can do is check their performance on common benchmarks. Quick chit-chatting showed that it has good reasoning capabilities, since it was able to answer the following questions properly:
1. If there's a sandwich on the ground, a plate on top of the sandwich, and two boxes on top of the plate, how do I retrieve the sandwich?
2. I have two bananas in my hand. I bought 5 apples yesterday and 3 peaches two days ago. If I ate 2 peaches and 1 apple, how many bananas do I have in total?
...and a few more questions. Some models I've tried (the very first versions of MPT and Alpaca) were not able to answer them properly. The author claims that it is `really good at coding`, though I don't use LLMs for coding.
@alizhadigerov9599 · 10 months ago
@@learndatawithmark I think it is worth creating a custom dataset for testing these models, because I don't like measuring the "goodness" of the models based on their performance on the popular benchmarks.
@learndatawithmark · 10 months ago
@@alizhadigerov9599 I suppose that might be an opportunity to try out one of the many LLM evaluation tools that keep popping up!
@alizhadigerov9599 · 10 months ago
@@learndatawithmark Can you recommend one of those? Or would you consider making a video about custom evaluation? I think it would be very interesting. Thanks for what you are doing!
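A custom test set like the one discussed in this thread could start as small as the two questions quoted earlier. Below is a minimal Python sketch, assuming a hypothetical `ask_model` function that returns a model's text reply. `EvalCase`, `passes`, `evaluate`, and the keyword groups are illustrative names and a deliberately crude grader: substring matching can pass wrong answers (for example, "2" also matches "12"). This is not the API of any particular LLM evaluation tool.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class EvalCase:
    prompt: str
    # Each inner list is a group of acceptable alternatives; a reply passes only if
    # it contains at least one alternative from every group (case-insensitive).
    required: List[List[str]]


# Two cases adapted from the questions above; the keyword groups are a crude, hypothetical grader.
CASES = [
    EvalCase(
        "If there's a sandwich on the ground, a plate on top of the sandwich, and two boxes "
        "on top of the plate, how do I retrieve the sandwich?",
        [["box"], ["plate"]],
    ),
    EvalCase(
        "I have two bananas in my hand. I bought 5 apples yesterday and 3 peaches two days ago. "
        "If I ate 2 peaches and 1 apple, how many bananas do I have in total?",
        [["two", "2"]],
    ),
]


def passes(reply: str, case: EvalCase) -> bool:
    """True if the lower-cased reply matches at least one alternative in every group."""
    text = reply.lower()
    return all(any(alt in text for alt in group) for group in case.required)


def evaluate(ask_model: Callable[[str], str], cases: List[EvalCase]) -> float:
    """Return the fraction of cases whose reply passes the keyword checks."""
    if not cases:
        return 0.0
    passed = sum(passes(ask_model(case.prompt), case) for case in cases)
    return passed / len(cases)


if __name__ == "__main__":
    def fake_model(prompt: str) -> str:
        # Hypothetical stand-in for a real model call.
        if "sandwich" in prompt:
            return "Lift off the boxes, then the plate, and pick up the sandwich."
        return "You still have two bananas."

    print(evaluate(fake_model, CASES))
```

With the stand-in model both cases pass, so it prints `1.0`. Swapping the keyword check for human review or an LLM-as-judge step is where the dedicated evaluation tools mentioned above would likely do better.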