Finally a Competitor! Deepseek r1 vs OpenAI o1| Battle of the Best LLMs

Views: 4,524

Days ago

Comments: 64

@DLBrawlStars 11 days ago

I was waiting for your comparison, and I'm very glad to see your questions. Keep looking for new opportunities and niches and, as a result, new questions for testing.

@YJxAI 11 days ago

I'm glad you like it, thanks :)

@michaelspoden1694 10 days ago

@YJxAI A lot of people are saying you can't use search and R1 at the same time, but I found that's not true: you can! It's amazing what you can do. I had it compare the most current benchmarks of state-of-the-art models against each other, which was a more complex prompt. It searched 56 sites and thought for quite a while, then produced a beautiful benchmark comparison, better than anything I can find online!

@YJxAI 9 days ago

I think I should test it. I saw that even MiniMax has web search.

@atpray 5 days ago

Your channel is a gem. I enjoyed all your videos.

@YJxAI 5 days ago

Wow, thank you!

@npc4416 7 days ago

The reason o1's thoughts look shorter is that they are not its actual chain of thought, just a summary of it. OpenAI didn't want to expose the actual chain of thought for reasoning models like o1; they wanted to gatekeep how the process works so that other people can't just copy the reasoning chains and fine-tune their own models on them.

@YJxAI 6 days ago

exactly

@harshmehta329 4 days ago

5:40 Bruh, is that the hardest reasoning question? For LLMs it might be, but an average human can solve it just by listening to it lol. These types of questions are asked in every aptitude-type test during placements or government recruitment.

@YJxAI 3 days ago

exactly

@febryanvaldo 5 days ago

The thing is OpenRouter got limited, so it's way slower, or did you use BYOK?

@YJxAI 5 days ago

OpenRouter. DeepSeek does take more time, even in its own interface.

@febryanvaldo 5 days ago

@YJxAI yeah, right now it's very slow even using our own API. In OpenRouter I'm using the BYOK feature.

@YJxAI 5 days ago

Not yet. I will try BYOK with DeepSeek keys, as the API price for the model seems to have increased on OpenRouter.

@hemantrawat1576 6 days ago

Where can I get such questions? It's a bit of a stupid question, actually, but I wanted to ask you.

@YJxAI 6 days ago

I know it sounds fake, but I designed most of the questions myself. A few of them, like the seating arrangement and the level 2 reasoning question, I got from competitive exams. The coding questions were all mine, other than the snake one, which is an industry classic. How do I come up with them? I think about where these models could be bad, for example spatial understanding, and then about what I would get right but they would get wrong. Then I try different things. Most of the questions I think these models will get wrong they usually get right, but every now and then I find ones they don't get, and I keep those.

@hamloji 10 days ago

Very detailed comparison. And the level and variety of questions is very good 👍

@YJxAI 10 days ago

thanks

@michaelspoden1694 10 days ago

@YJxAI Your thumbnail is amazing. People want this type of comparison all day long, and the thumbnail is crazy, it's going to get you clicks! You just have to make people see it😅

@YJxAI 10 days ago

@michaelspoden1694 thanks a lot

@iuliusRO82 10 days ago

Great work! You just gained your 681st subscriber :). Peace and love from Romania!

@YJxAI 10 days ago

thanks bro

@TheDiamondHawkOfficial 11 days ago

Hey, I asked DeepSeek R1 this question: "In your response to THIS question im asking right now, take the character count of your response, multiply it by 70, and whatever number you get, find the Alphabet corresponding to that number. So, if you got 3 (Which you probably WILL NOT get) you would get C. Because C is the 3rd letter of the alphabet. If you got 26, you would use Z, If you got 27, A, so if the number is higher then 26, which it will be, just loop the alphabet. Anyway, have exactly 8 instances of that letter in your response to THIS question. I want no explanation at all in your response, I want a response that fits the criteria of my prompt fully." Then a follow-up: "Now tell me, how many possible answers are there for this question". **It can't solve the easy follow-up. I found it pretty interesting**
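
The criteria in the quoted prompt can be checked mechanically. Below is a minimal sketch, assuming the character count includes spaces and punctuation and that the letter is counted case-insensitively (the prompt specifies neither). The function names are hypothetical.

```python
import string


def letter_for_number(n: int) -> str:
    """Map 1 -> A, ..., 26 -> Z, 27 -> A, looping through the alphabet."""
    return string.ascii_uppercase[(n - 1) % 26]


def satisfies_prompt(response: str) -> bool:
    """Check a candidate response against the prompt's rule.

    Assumptions (not stated in the prompt): every character counts toward
    the length, and upper/lower case are treated as the same letter.
    """
    target = letter_for_number(len(response) * 70)
    return response.upper().count(target) == 8


print(letter_for_number(8 * 70))     # N
print(satisfies_prompt("NNNNNNNN"))  # True
```

For example, an 8-character response gives 8 × 70 = 560, which loops around to N (the 14th letter), so the string NNNNNNNN passes the check under these assumptions.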

@TheDiamondHawkOfficial 11 days ago

It actually completely fails to answer the follow-up. For no reason. Even though the answer can't be more obvious.

@YJxAI 10 days ago

Maybe I'm not getting it, but you said "this question"; which question is it? The one on which it has to think, count the characters of its response, and output the letter that represents the number?

@TheDiamondHawkOfficial 10 days ago

@ I think you're asking which question I asked R1? So let me clarify: I asked it the first question, which is in quotation marks, and it does actually get it right 80% of the time. But it fails any follow-up regarding that question.

@YJxAI 10 days ago

@ Yeah, I get you now

@GJ983UGS86 9 days ago

Try asking smart questions

@ImpulseIntrospect 6 days ago

What is the benchmark software used in the video?

@YJxAI 6 days ago

I built it to record my results and present them better.

@ImpulseIntrospect 3 days ago

@YJxAI Okay, cool. Where is the source code?

@jeffwads 10 days ago

Glad you aren’t playing games like others. Running the real R1 locally would require a TB of memory unless you went with the worthless 3-4 bit quantized versions.
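
For a rough sense of scale, weight memory is about parameter count times bits per weight. Below is a back-of-the-envelope sketch, assuming the 671B total parameter count reported on DeepSeek's model card (a figure not stated in this thread) and ignoring KV cache, activations, and runtime overhead. The function name is hypothetical.

```python
# Assumed total parameter count for DeepSeek-R1 (from the model card, not this thread).
R1_PARAMS = 671_000_000_000


def weight_memory_gb(params: int, bits_per_weight: float) -> float:
    """Approximate memory for the weights alone, in gigabytes (1 GB = 1e9 bytes)."""
    return params * bits_per_weight / 8 / 1e9


for bits in (16, 8, 4, 3):
    print(f"{bits}-bit: ~{weight_memory_gb(R1_PARAMS, bits):,.0f} GB")
```

Real deployments need noticeably more than the weights alone, so these numbers are lower bounds under the stated assumptions.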

@YJxAI 10 days ago

yeah

@LadderVictims 11 days ago

Please, never make clickbaity thumbnails, you are doing great!!

@YJxAI 10 days ago

Yeah, I am pissed off, especially with thumbnails like "Deepseek r1 beat o3". WTF bro. o3 will eat all these for a snack.

@LadderVictims 10 days ago

@YJxAI this! And most of them don’t even know what they’re testing or how to interpret the model’s answer, even if it’s wrong. They just copy a table of questions from somewhere, test it, and then bye. Like bro, fucking explain what the question’s intention was, what to interpret if the model gets it right or wrong. For example, in your instruction-following question, you made that insight about r1 not following the earlier chat info compared to o1, those are the details I wanna see when comparing 2 LLMs. Anyways, you’re doing great! Ik you’ll eventually have to make the title/thumbnail clickbait for views but never stop making quality videos.

@YJxAI 9 days ago

🥹

@MrParad0x 10 days ago

The sample size of your questions is quite low. There should be at least 10 questions from each category to get some idea about model accuracy.
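
One way to see why a handful of questions says little about accuracy is to put a confidence interval around the observed score. Below is a minimal sketch using the Wilson score interval at roughly 95% confidence. Treating each question as an independent pass/fail trial is an assumption, and the function name is hypothetical.

```python
import math


def wilson_interval(correct: int, total: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a pass rate; z = 1.96 is roughly 95% confidence."""
    if total <= 0:
        raise ValueError("total must be positive")
    p = correct / total
    denom = 1 + z * z / total
    center = (p + z * z / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z * z / (4 * total * total)) / denom
    # Clamp to [0, 1] to absorb floating-point rounding at the edges.
    return max(0.0, center - half), min(1.0, center + half)


for correct, total in ((3, 3), (10, 10)):
    low, high = wilson_interval(correct, total)
    print(f"{correct}/{total} correct: plausible accuracy {low:.0%} to {high:.0%}")
```

Under these assumptions, a perfect 3 out of 3 is still consistent with a true accuracy as low as roughly 44%, while 10 out of 10 raises that floor to roughly 72%.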

@YJxAI 10 days ago

It has helped me discern what is good and bad. But yes, a larger sample size will lead to better results.

@michaelspoden1694 10 days ago

@YJxAI From what I've heard, Pro is beyond R1, and the benchmarks definitely fluctuate about half and half between the regular o1 and R1. It definitely depends on the use case. Definitely like your videos🎉

@YJxAI 10 days ago

@ yeah, I heard Pro couldn't do the seating arrangement question.

@TheDiamondHawkOfficial 11 days ago

Can you make the questions public?

@YJxAI 11 days ago

Questions:

Reasoning:

Question 1: You are facing north, and a rectangular tank is filled with water. A small toy boat floats in the tank. On the boat is a small figurine which is facing north. You lift the right side of the tank. From the point of view of the figurine, the water surface appears to rise on which side of the tank?

Question 2: b_ab_a_a_ba_ Fill with correct alphabets such that it forms a repeating pattern. Pick from the options which is correct: (a) bbaab (b) aabaa (c) bbabb (d) babab

Question 3: Eight friends - A, B, C, D, E, F, G, and H - are to be seated around a square table. The seats are numbered 1 to 8 in a clockwise direction, starting with seat 1 at the top left corner of the table. Those in seats 1, 3, 5, and 7 (the corner seats) face the center of the table. Those in seats 2, 4, 6, and 8 (the mid-side seats) face outwards. The seating arrangement must also satisfy the following conditions: G sits at seat 1. Both of D's immediate neighbors face away from the center. Both of B's immediate neighbors face the center. F sits to the immediate right of D. H sits third to the left of E. B sits second to the right of F. A and F sit opposite each other. D does not sit in the bottom seats. Two seating arrangements are considered distinct only if at least one person has a different person sitting immediately to their left or right. Based on this, how many distinct seating arrangements are possible? For each possible number of seating arrangements, list the position of each friend in the arrangement, starting from seat 1 and proceeding clockwise to seat 8. (a) 1 (b) 2 (c) 3 (d) 4 (e) 5

Maths:

Question 1: In a certain non-leap year, January 1 was a Monday. What day of the week was April 1 of the same year?

Question 2: I have an initial balance of $100,000, and I earn $15,000 per month for every $100,000 in my balance. As my balance grows, my earnings increase in steps. Specifically, each time my balance increases by $100,000, my monthly earnings increase by $15,000. For example: With a balance of $100,000, I earn $15,000 per month. Once my balance reaches $200,000, I start earning $30,000 per month. When my balance reaches $300,000, I earn $45,000 per month, and so on. Assuming my balance grows month by month based on these earnings, how much will I have after 3 years (36 months)?

Question 3: Let n be a positive integer, and let v_p(n) denote the largest integer v such that p^v | n. For a prime p and an integer a not congruent to 0 (mod p), let ord_p(a) denote the smallest positive integer o such that a^o ≡ 1 (mod p). For x > 0, let ord_{p, x}(a) = ∏_{q ≤ x, q prime} q^{v_q(ord_p(a))} ∏_{q > x, q prime} q^{v_q(p-1)}. Let S_x denote the set of primes p for which ord_{p, x}(2) > ord_{p, x}(3). Let d_x denote the density of S_x in the primes, defined as d_x = |S_x| / |{p ≤ x : p is prime}|. Let d_∞ be defined as the limit of d_x as x approaches infinity: d_∞ = lim_{x→∞} d_x. Compute ⌊10^6 d_∞⌋.

Coding:

Question 1: Write a snake game in Python using PyQt5.

Question 2: Without using the datetime library, write Python code that outputs the day when the date is given as input.

Question 3: I want to create a state-of-the-art Pac-Man game in Python, but with specific constraints:

No External Files: The game should not rely on any external files for assets such as audio, video, or images. All necessary elements, such as visuals and sounds, should be generated within the code using available Python libraries.

Libraries: You may use Python libraries like Pygame for rendering graphics, handling game logic, and creating animations, but any external resources (e.g., image or audio files) should be avoided. If audio or visual effects are implemented, they must be generated programmatically or sourced from within the library's functionality.

Core Features: The game should include all key mechanics of Pac-Man, such as movement, ghost AI, power-ups, scoring, and winning/losing conditions. The maze should be rendered programmatically, with clear representations of walls, pellets, and characters (Pac-Man and ghosts). Ghosts should follow different behaviors (chase, scatter, etc.) as in the original game.

Visuals and Sounds: Simple visual elements can be generated programmatically (e.g., using shapes for characters and pellets). If sound effects are included, they should be generated programmatically rather than sourced from external files.
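
Two of the questions above are easy to check by brute force. Below is a minimal sketch, not the reference answers used in the video. For the date questions it uses Sakamoto's day-of-week method for the Gregorian calendar, reading "the day" in Coding Question 2 as the day of the week (an assumption). For Maths Question 2 it assumes earnings are credited once per month at $15,000 per complete $100,000 in the balance, which is only one possible reading of the prompt. All function names are hypothetical.

```python
DAY_NAMES = ["Sunday", "Monday", "Tuesday", "Wednesday",
             "Thursday", "Friday", "Saturday"]


def day_of_week(year: int, month: int, day: int) -> str:
    """Day of the week for a Gregorian date, without the datetime library."""
    offsets = [0, 3, 2, 5, 0, 3, 5, 1, 4, 6, 2, 4]  # per-month offsets
    if month < 3:
        year -= 1  # January and February count toward the previous year
    index = (year + year // 4 - year // 100 + year // 400
             + offsets[month - 1] + day) % 7
    return DAY_NAMES[index]


def balance_after(months: int, start: int = 100_000) -> int:
    """Stepwise growth: $15,000 per month for each complete $100,000 held."""
    balance = start
    for _ in range(months):
        balance += (balance // 100_000) * 15_000
    return balance


# 2018 is a non-leap year that began on a Monday, matching Maths Question 1.
print(day_of_week(2018, 1, 1))  # Monday
print(day_of_week(2018, 4, 1))  # Sunday
print(balance_after(36))
```

The date result also follows by hand: January, February, and March of a non-leap year total 31 + 28 + 31 = 90 days, and 90 mod 7 = 6, so April 1 falls six weekdays after Monday, which is Sunday.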

@ronanhughes8506 10 days ago

Gemini Thinking is much better than o1 and it's free.

@reddddzzz 10 days ago

No it isn't

@existentialbaby 9 days ago

free as long as nobody is using it 😂

@haroldpierre1726 9 days ago

I gave Gemini a chance. It's so much better than before but I still find o1 to be the best. I'm just not going to spend $200 per month for full access.

@Xosmos.1 4 days ago

The only thing DeepSeek has to fix is their servers

@YJxAI 4 days ago

I think now they understand where the billions are going.

@syed3344 5 days ago

AI is just bs, the only questions it can solve r the ones in its data, give it a difficult maths question created by a professor and it can't do sht

@YJxAI 5 days ago

You are talking about Level 4 models that can innovate. But even today's AI does enable a lot of things to be done. So no BS.

@harshmehta329 3 days ago

@YJxAI I personally don't think that fundamental LLMs would ever be able to "innovate".

@RM-fy5fp 11 days ago

I'm sorry, but I can't really understand the way you talk (no hate). To sum up, which one came out on top, R1 or o1?

@LadderVictims 11 days ago

I'm sorry, are you blind or deaf?

@TsinmonsTy 11 days ago

He literally said at the end that R1 is better

@TsinmonsTy 11 days ago

"how r1 is better than o1/Conclusion"

@emport2359 10 days ago

You just didn't want to watch the video

@michaelspoden1694 10 days ago

@TsinmonsTy yes, but he obviously said that he couldn't understand him, so that's why he asked.

@GalaxyHomeA9 10 days ago

Try rephrasing the seating arrangement question.

@YJxAI 10 days ago

Yeah, I can make it easier. But at its current level, if it can be understood by a human, it should be understood by an AI. Don't you think?

@TheDiamondHawkOfficial 10 days ago

I think the phrasing is part of the question; the AI needs to be able to infer.