InternLM-2.5 (7b) : This NEW Model BEATS Qwen-2 & Llama-3 in Benchmarks! (Fully Tested)

  3,748 views

AICodeKing

1 day ago

In this video, I'll be telling you about the newly released InternLM-2.5 7B model. This new model comes with a 1M token context limit, which is really amazing. It claims to beat Qwen-2, Llama-3, Claude, DeepSeek, and other open-source LLMs, and I'll be testing it out in this video. Watch the video to find out more about this new model. It also claims to beat Qwen-2, DeepSeek Coder, and Codestral in all kinds of coding tasks.
------
Key Takeaways:
🌟 InternLM 2.5 Launch: Just launched, InternLM 2.5 is the latest AI model, outperforming Llama 3 and Gemma 2 9B in practical scenarios.
🚀 7 Billion Parameters: With 7 billion parameters, InternLM 2.5 offers outstanding reasoning capabilities and a long context window, perfect for complex AI tasks.
🏆 Benchmark Dominance: InternLM 2.5 excels in MMLU, CMMLU, BBH, and MATH benchmarks, showcasing superior performance against larger models.
🔧 Tool Usage: InternLM 2.5 excels at tool usage, making it ideal for applications that involve web search and other integrated tools.
📊 Real-World Performance: Despite benchmark success, real-world performance is where InternLM 2.5 shines, particularly in coding tasks with its 1M-token context window.
💻 Available on Major Platforms: Now accessible on Ollama, HuggingFace, and more, making it easy to test and integrate InternLM 2.5 into your projects (a quick-start sketch follows this list).
🤖 Hands-On Testing: Watch as we put InternLM 2.5 through various language and coding tasks, highlighting its strengths and weaknesses.
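For anyone who wants to try it themselves, here is a minimal quick-start sketch using HuggingFace transformers. The repo id internlm/internlm2_5-7b-chat, the trust_remote_code requirement, and the chat() helper are assumptions based on how InternLM models are usually published on the hub; check the actual model card before relying on them.

```python
# Minimal sketch: run the InternLM-2.5 7B chat model locally with transformers.
# The repo id and the remote-code chat() helper below are assumptions; see the
# model card on HuggingFace for the exact usage.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "internlm/internlm2_5-7b-chat"  # assumed HuggingFace repo id

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # fp16 keeps the 7B weights around 15 GB
    trust_remote_code=True,
).cuda().eval()

# The repo's remote code exposes a simple chat() helper that manages history.
response, history = model.chat(
    tokenizer,
    "Write a Python function that checks whether a string is a palindrome.",
    history=[],
)
print(response)
```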
------
Timestamps:
00:00 - Introduction
00:07 - About InternLM-2.5 (7B with 1M Token Context)
01:16 - Benchmarks
03:03 - Testing
07:53 - Conclusion

Comments: 30
@user-no4nv7io3r 12 days ago
They train their models on the benchmarks, claim to beat everyone else, and it turns out to be trash in most cases. What a crazy world we are living in.
@superakaike 12 days ago
They also train their model on ChatGPT answers...
@wolraikoc 12 days ago
A copilot video with this model and Neovim would be awesome!
@Link-channel 11 days ago
I wonder how to integrate autocompletion in Vim... no wait, I wonder how to use Vim.
@nahuelpiguillem2949 12 days ago
Thank you for doing an honest review; it's rare to find someone saying "I tested it and it's not worth it". Sometimes the latest thing isn't the best.
@BadreddineMoon 12 days ago
I'm addicted to your videos, keep up the good work ❤
@user-no4nv7io3r 12 days ago
@@BadreddineMoon Me too, especially his voice, tone, and critiques. That's magical.
@sammcj2000 12 days ago
I'd be interested in you trying it on coding with a number of different parameters (top_p/top_k, temperature, repetition penalty, etc.).
@Revontur 12 days ago
As always, a great video... thanks for your effort. Is there any site where you publish your tests? It would be really great to compare new models with previously tested models.
@nahuelpiguillem2949 12 days ago
Sameeee
@waveboardoli2 12 days ago
Can you show how to use claude-engineer with open-source models?
@RedOkamiDev 12 days ago
Thanks Mr. AiKing, you are my daily source of AI news :)
@paulyflynn 12 days ago
What size codebase will a 1M token context support? Is there a LOC-to-token formula?
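There isn't a fixed formula, since the tokens-per-line ratio depends on the language and the tokenizer, but you can measure it for your own codebase. Below is a rough sketch; the tokenizer repo id and the my_repo path are placeholders, and code typically lands somewhere around 7-12 tokens per line, so 1M tokens is very roughly on the order of 100k LOC.

```python
# Rough estimate: how many lines of your codebase fit in a 1M-token context?
# Ratios vary by language and tokenizer, so treat the result as a ballpark.
from pathlib import Path
from transformers import AutoTokenizer

# Assumed tokenizer repo id; any tokenizer close to the target model works.
tokenizer = AutoTokenizer.from_pretrained(
    "internlm/internlm2_5-7b-chat", trust_remote_code=True
)

total_tokens = 0
total_lines = 0
for path in Path("my_repo").rglob("*.py"):  # placeholder codebase path
    text = path.read_text(errors="ignore")
    total_tokens += len(tokenizer.encode(text))
    total_lines += text.count("\n") + 1

ratio = total_tokens / max(total_lines, 1)
print(f"{total_lines} LOC -> {total_tokens} tokens (~{ratio:.1f} tokens/LOC)")
print(f"Approx. LOC that fit in 1M tokens: {int(1_000_000 / max(ratio, 1e-9)):,}")
```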
@pudochu 12 days ago
6:47 How can I find the tests used here? It would also be great if they had answers.
@SpikyRoss 12 days ago
Hey, it would be great if you could add the links to the model in the description. 👍
@jaysonp9426 11 days ago
You didn't test needle-in-a-haystack or what it does with 1M tokens?
@tianjin8208 11 days ago
The Intern series always trains their models on the eval datasets; it's their style. They need to surpass the others quickly, so this is the fast way.
@LucasMiranda2711 12 days ago
Which one has been the best you've tested so far? Is there any place or anyone keeping track of the scores?
@AICodeKing 12 days ago
Currently, Qwen-2 is topping my list for general tasks, and DeepSeek-Coder-V2 for coding.
@Richi-8 12 days ago
Which one do you consider to be the best model for general tasks nowadays?
@AICodeKing 12 days ago
Qwen2
@EladBarness 12 days ago
Hype for nothing, wouldn't count on it for anything… thank you for the video!
@elchippe 11 days ago
Draw a butterfly in SVG? Tasks like that would be hard for a large LLM like Claude and even more so for a 7B LLM. The transformer architecture's biggest drawback is its inability to rethink backwards, which is why these models mostly fail at these puzzles.
@AICodeKing 11 days ago
I generally do that test to check whether the LLM can create something similar. Claude & GPT can do this. Also, I don't use different tests for smaller models; the tests are the same whether it's 7B or 300B.
@MeinDeutschkurs 12 days ago
The model seems to be horrendous! Thx for saving my time.
@john_blues 11 days ago
If it can't build a basic Python script, why would I want it chatting with my codebase? Anyway, thanks for the video and for actually testing this.
@aryindra2931 12 days ago
Please make 2 a day ❤❤, I like the videos
@Lemure_Noah 9 days ago
This model is good in benchmarks, but it doesn't seem to be better than other modern models like Llama-3, Phi-3, or even Mistral 7B, at least in my internal review, dealing with summarization and other language tasks. If someone could give a real-world example where it performs better than other models in the same class, please share it ;)
@LazarMateev 6 days ago
Merge Maestro with Claude Engineer and Aider into one. Make it an open-source model orchestration that recalls the initial prompt with access to RAG, and you would be the king of kings 😊 Locally hosted web apps look like a very cool niche.
@hollidaycursive 12 days ago
Pre-watch Comment