I think giving no points when both are correct may bias the final result. Say you've done 20 tests: in 15 the results are the same, in 1 GPT-4o is better, and in 4 Claude Sonnet is better. The score is then 4-1 for Claude Sonnet, but really it's more like 19-16.
@luigideff5 ай бұрын
Yea exactly. It totally makes a perception difference.
@beautifulandtoolate4 ай бұрын
A draw is typically 0.5 points each
@FloatingWeeds24 ай бұрын
Do 0.01 points each for a draw so you can have two different categories of data in one number 😘
@PatrickStorm_4 ай бұрын
Good call, it looks a lot different when you include ties. I recalculated the final scores adding in a point for ties, and the final tally is GPT-4o: 17 and Claude 3.5 Sonnet: 19. That shows a clearer picture of how close these models are 👍
@PatrickStorm_4 ай бұрын
Nice. That is my kind of useless optimization 😎 This would end up with GPT-4o at 6.11 and Claude 3.5 Sonnet at 8.11
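To make the arithmetic in this thread concrete, here is a small sketch of the tie-scoring schemes discussed above. The win and tie counts (6, 8, and 11) are not stated outright; they are inferred from the 17 vs. 19 and 6.11 vs. 8.11 totals quoted in the replies, and the function name `score` is just illustrative.

```python
# Tally inferred from the totals quoted in the thread (an assumption):
# GPT-4o 6 wins, Claude 3.5 Sonnet 8 wins, 11 ties.
WINS = {"GPT-4o": 6, "Claude 3.5 Sonnet": 8}
TIES = 11


def score(wins: int, ties: int, tie_points: float) -> float:
    """Total score when every tie is worth `tie_points` to each model."""
    return round(float(wins + ties * tie_points), 2)


# Compare the schemes suggested in the comments:
# 0 (ties ignored), 1 point, 0.5 points, and 0.01 points per tie.
for tie_points in (0, 1, 0.5, 0.01):
    totals = {model: score(w, TIES, tie_points) for model, w in WINS.items()}
    print(f"tie = {tie_points}: {totals}")
```

The gap between the two models is 2 points under every scheme; only the relative size of that gap changes, which is the perception difference the thread is about.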
@MartinJefferies-j1d5 ай бұрын
Summary: 1. both are great. 2. don't use either for fact finding. 3. Since they are both free, use both simultaneously.
@PatrickStorm_4 ай бұрын
A bit reductive :) But yeah, that's the gist! Both are really good and have different strengths.
@theplaylistpsycho4 ай бұрын
Calling both free isn't entirely accurate: both are free but with limited access. When you hit the free-user limit, ChatGPT drops you from 4o to the 3.5 model, and Claude simply locks you out, since it currently has no unlimited-use model for free users.
@MeQt4 ай бұрын
They arent free though?
@oberpenneraffe4 ай бұрын
@@MeQt Some requests are free, but there is a strict limit after a few requests. Then you have to wait a few hours before you can use it again for free.
@JamesR6244 ай бұрын
"don't use either for fact finding" If neither of these can even do basic fact-checking, then what's the point? So basically they're nifty gimmicky chatbots but with a practical usability that's outclassed by Assistant and Siri from a decade ago since those actually search the web and give you sources.
@drlordbasil5 ай бұрын
Claude Sonnet is wayyy better for complex tasks and assistance in debugging.
@KroeSufos1024 ай бұрын
Perfect for my work!
@GeeGnebAb4 ай бұрын
yeaaa crazy how good Sonnet is, it's like talking to a professional who can really solve and explain the problem
@ktms11884 ай бұрын
Claude 3.5 and GPT-4o both have their strengths, and it’s fascinating to see how they differ. Claude feels more human, like it’s really trying to understand what I’m asking, but I’ve noticed that with the memory function in GPT, the model seems to know a lot more about what I’m trying to ask, so it now gives much better answers, like Claude 3.5. My issue is that sometimes it hits those frustrating blocks and says it’s unable to answer my question, which drives me nuts even when it’s nothing controversial and it clearly would know the answer. I noticed in one of their talking points that this is one of the big things they are working on: it is overly restrictive, they know it, and they plan to improve it. GPT-4o, on the other hand, is super analytical but occasionally needs me to rephrase my questions to get the best answers. I’ve been using both for a while now, and here’s what I’ve found: Claude’s artifact mode is mind-blowing, and it’s nice if you’re on an iPhone or iPad, since there is no Android app. GPT’s memory function is a game-changer, making it more accurate over time as it learns from our interactions. Wouldn’t it be amazing if they combined the best of both worlds? I’d love to see a deep-dive comparison between custom GPTs like “Scholar” and the standard GPT-4o, especially for fact-based questions. Does the customization really boost accuracy?
@briankgarland4 ай бұрын
I pay for both, primarily for coding, and haven't used 4o since Sonnet came out.
@baldeeptiwana4 ай бұрын
I also want to buy the paid version of one of them for my python-gis project. Do you think Sonnet is better for coding?
@mikemin54 ай бұрын
@@baldeeptiwana Now, I’m not a coder and don’t even know what python-gis means, but I think that if you’re trying to make a rendered program, Claude’s split screen is nice, though it shouldn’t be a dealbreaker. I use ChatGPT all the time for coding and, with a few prompts, can get a perfect piece of code that works exactly how I want, no matter how complex my needs are. I’ve made tons of sites and little programs in HTML, and both AIs are definitely gonna be good. A 3% difference in some random test shouldn’t show you where to put all your money.
@zHqqrdz4 ай бұрын
@@baldeeptiwana It objectively is
@barafwal2533 ай бұрын
Hi, I need to subscribe to the paid version of one of these AI chatbots (Claude 3.5 Sonnet, ChatGPT 4o, etc.), mainly for coding. I frequently need to upload files and images, and sometimes refer to web links. I have very long code and other files to analyze. Is it exactly the same whether I subscribe to ChatGPT 4o directly from OpenAI or subscribe to Perplexity and select the GPT-4o model in the settings, and similarly for other AI models? With Perplexity I would get multiple AI models in just one plan; is that really true and practical?
@Ivan7Kovnovic4 ай бұрын
The GDP 2018 question was actually answered correctly. According to every source I found on the internet, Germany was 4th and the UK was 5th.
@manuelcardoso58304 ай бұрын
Agreed, I saw the same thing. Both AIs were right.
@gege1515004 ай бұрын
I actually found India on the IMF website?
@PatrickStorm_4 ай бұрын
Yup, I messed up here 😬 I saw the GDP based on PPP rankings, and mistook it for nominal GDP. The AIs got it right! Source: en.wikipedia.org/wiki/List_of_countries_by_GDP_(PPP)
@DGaryGrady4 ай бұрын
@@PatrickStorm_ There are different ways of computing GDP depending on the purpose of the comparison, so there is no single right answer. The most usual comparison is based on currency conversion, but PPP (purchasing power parity) is more appropriate in some contexts, especially if you're ultimately looking at GDP per capita and standard of living. (Incidentally, depending on when you ask the question, the comparison changes if you pretend California is an independent country. OpenAI and Anthropic are both based in San Francisco, but that's just a coincidence. For now.)
@Abhiishek-G4 ай бұрын
00:03 Claude 3.5 Sonnet outperforms GPT-4o in benchmarks
02:27 Claude 3.5 Sonnet outperforms GPT-4o in speed and live code demonstrations
04:46 Claude 3.5 Sonnet outperforms GPT-4o in creative writing tests
07:13 Comparison of performance between Claude 3.5 and GPT-4o models
09:52 Comparison of Claude 3.5 Sonnet and GPT-4o
12:27 Difference in code review of Claude 3.5 Sonnet and GPT-4o
14:57 Comparing Claude 3.5 Sonnet and GPT-4o
17:28 Comparison of GPT-4o and Claude 3.5 Sonnet performance on trivia questions
19:49 GPT-4o performed better in factual accuracy
22:04 Claude 3.5 Sonnet outperformed GPT-4o in understanding and summarizing human emotions
24:18 Claude 3.5 Sonnet offers better performance and cost saving
@blueicicle19734 ай бұрын
it sounded a little biased towards Claude
@PatrickStorm_4 ай бұрын
Yeah, I have to figure out how to do this as a blind test next time. Thanks for the feedback
@adammccoy14 ай бұрын
@@PatrickStorm_ veryyyy biased, esp. the marking system, disliked
@shreyashsingh86824 ай бұрын
@@adammccoy1 Have you ever tried Claude? It's better than Chat gpt 4o for complex tasks
@paulustangkeallo78404 ай бұрын
@@PatrickStorm_ One alternative is to review the output without knowing which LLM produces that output.
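The blind-review idea suggested here could be sketched roughly as follows: shuffle the answers, show them to the reviewer under neutral labels, and only reveal the mapping after scoring. This is a minimal illustration, not how the video was made; the function name `blind` and the placeholder answers are hypothetical. It also does nothing about a reviewer recognising a model by its writing style.

```python
import random
import string


def blind(answers, seed=None):
    """Hide which model wrote which answer.

    `answers` maps model name -> answer text.
    Returns (shown, key): `shown` maps a neutral label ("A", "B", ...) to the
    answer text, and `key` maps the same label back to the model name.
    """
    rng = random.Random(seed)
    models = list(answers)
    rng.shuffle(models)  # random order, so the label carries no information
    labels = string.ascii_uppercase[: len(models)]
    shown = {label: answers[model] for label, model in zip(labels, models)}
    key = dict(zip(labels, models))
    return shown, key


# Placeholder answers, for illustration only.
answers = {
    "GPT-4o": "placeholder answer one",
    "Claude 3.5 Sonnet": "placeholder answer two",
}
shown, key = blind(answers)
for label, text in shown.items():
    print(label, "->", text)  # the reviewer scores these without seeing `key`
```

Ideally a second person (or a script) holds `key` until every answer has been scored.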
@_fuji_studio_3 ай бұрын
bruh Claude Sonnet 3.5 is way better, i use it for coding and it's amazing
@JosefTorkelsen4 ай бұрын
Great job on the video dude! I also agree with your results for yourself at the end that discusses how you plan to use them. I like Claude but without those extra things, Chat GPT is my daily driver.
@RoseAlternative4 ай бұрын
Thanks for putting in the time and effort to make this video! I was wondering if I should renew my GPT-4o, or try Claude for the first time. Now I'm set on trying Claude. The video quality is amazing, keep up the good work! :)
@litpapi18494 ай бұрын
same here! been a long time chatgpt user we'll see how this goes
@PatrickStorm_3 ай бұрын
Thanks for the kind words! I currently have both ChatGPT and Claude subscriptions, but if I had to choose just one, I would probably go with Claude. It's sort of a toss up at the moment.
@RoseAlternative3 ай бұрын
@@PatrickStorm_ I’m absolutely loving Claude right now. The ONLY downside I’m experiencing compared to GPT4o is that it feels like I’m being limited too much with the amount of messages I can send, and also the maximum of 5 images per conversation can be a pain. Overall though, I’m using it for code assistance and it has phenomenal coding uses.
@PatrickStorm_3 ай бұрын
@@RoseAlternative Yeah, the message limits really have been a pain. That's usually when I switch to ChatGPT :)
@joserubio30362 ай бұрын
I have used Gemini 1.5 Pro, ChatGPT 4o and the Claude 3.5 Sonnet base model, and OMG, Anthropic did great work. The results were MUCH MUCH better in every single aspect I used the models for:
- Coding
- Data analysis
- Summing up huge amounts of data
- Getting insights and core info from research papers
- Law content
- Studying purposes: making Anki-type questions, creating Feynman technique summaries, etc.
By far Claude offered me the best outputs of them all, and it was just the free version... definitely gonna give this one a try for a while and analyse even deeper. If you ask me, definitely pick the Claude Pro version, which is even cheaper than the rest of the LLMs.
@suleymanbolek72964 ай бұрын
This was the best comparison video on YouTube. Great job man, subscribed.
@u4icdissonance1804 ай бұрын
Excellent video, I think you did a good job of being objective. Just a note, if you're looking for conversation and support, tell GPT you're looking more for an emotionally supportive answer than a solution based answer. Then what you get out of it is very similar to Claude. Claude still deserves the point because your average user likely won't think to try that, but the option is there for people that want more conversational GPT.
@prithviraj10804 ай бұрын
Well researched. You document many of the use cases I need and use. An excellent video.
@Repz985 ай бұрын
This video was really well made, and I enjoyed it through the entire video! I thought I was watching someone with 200k plus subs, based on the quality of this content. Keep it up, I’m subscribing now!
@PatrickStorm_4 ай бұрын
I'm really happy you liked it :) and you made my day saying it was comparable to a channel with 200k subs!
@_HMCB_4 ай бұрын
First time visitor. Awesome stuff. Clearly presented. And your speaking is a good pace. So many YouTubers need to learn how to slow down and enunciate. Everything feels so hurried. You’ve earned a new sub. Thank you.
@andrewslabbert43164 ай бұрын
I've watched a lot of AI videos out there, this one was truly helpful. You've gained my subscribe & my full attention Patrick! Thank you!
@PatrickStorm_3 ай бұрын
I really appreciate it. Glad you're getting something from my videos!
@nahadmaniyot67904 ай бұрын
Sure, here is a summary of the video in points:
* This video compares two large language models, Claude 3.5 Sonnet and GPT-4o, by giving them a series of head-to-head tests.
* The winner is determined by the video creator, Patrick Storm, based on his subjective criteria.
* Claude 3.5 Sonnet wins in creative writing, dialogue generation, sentiment analysis (partially), conversational skills, and summarization (when length is considered).
* GPT-4o wins in factual question answering and image generation (because Claude 3.5 Sonnet doesn't have an image generation model).
* They tie in summarizing a research paper and coding (when the prompt is simple).
* Overall, Claude 3.5 Sonnet performs better based on the video creator's evaluation.
@fernandoz63295 ай бұрын
In this type of showdown to find which is best, I think it would be useful for the evaluator not to know which model created each answer, to avoid being biased by personal preferences. There were a few answers where I disagreed, so maybe I'm biased too. For coding, DeepSeek 2 is outstanding.
@PatrickStorm_4 ай бұрын
That is a really good point. I am very sure I would be able to distinguish between them even if it was a blind test, so I wonder how I could do that 🤔 I’ll keep this in mind for future comparison videos though. Thanks for the feedback.
@PatrickStorm_4 ай бұрын
And yes, DeepSeek 2 looks really good, I haven’t tried it out yet, but it’s on my list!
@sergeyromanov27515 ай бұрын
Your list of questions was not balanced. You focused too much on the language problems (where Claude 3.5 Sonnet is clearly ahead) and completely ignored the logic, reasoning and math problems (where GPT-4o would crush its opponent).
@PatrickStorm_4 ай бұрын
Totally valid critique. I'll add those sections in to future model comparison videos.
@-Meric-4 ай бұрын
GPT is pretty bad at logic and math based on benchmarks. Claude would probably win there as well
@drlordbasil5 ай бұрын
Beautifully done on the video bro.
@FrancescoDellaValle4 ай бұрын
I appreciate your work, but I found this video biased and inconsistent in judging the two models' responses. In two or three tests, you verbally preferred ChatGPT, yet you didn't award any points and declared a draw. This doesn't seem unbiased to me.
@peanutbutterjellybeans13364 ай бұрын
ChatGPT-4o is my quick search engine. Claude 3.5 Sonnet is my main workhorse. The artifact feature in Claude 3.5 Sonnet is so good.
@chrishackney95124 ай бұрын
Great video, you’ve just got yourself a new sub. 😊 I would suggest that Claude expressing its uncertainty with some answers and recommending users follow up themselves is a much more responsible solution than ChatGPT’s matter-of-factly stated incorrect answer. Looking forward to seeing more!
@PatrickStorm_3 ай бұрын
I agree with you, and I think that's why Claude might be slightly behind on the Chatbot Arena leaderboard, because people might prefer a wrong answer (if they didn't know it was wrong) vs an answer that says "I don't know".
@vm_jayfus93324 ай бұрын
Your channel deserves sooooo Much more attention😮
@AndresIbanezVasquez3 ай бұрын
Thanks for the comparison! I do agree that in the end it comes down to personal preference, and people should really try both (and perhaps also Gemini) to see which one suits them better. I used to write poems in my youth, I was quite fond of them, and I actually preferred the poem GPT gave you. It felt more elegant, with more "fancy" but also soothing words, while Claude in my opinion gave a not very memorable rendition with somewhat generic and common words. But again, it's personal preference.
@AndresIbanezVasquez3 ай бұрын
After trying this prompt myself, I also added "pretend you are a world-class poet" and Claude's version was almost on par with GPT in my opinion, so I guess providing detailed prompts is also very useful!
@TheRealExecuter224 ай бұрын
This was a great video, good pacing and structure without any empty ai hype bullshit. Very pleasing production and you got a beautiful room for filming!
@Wonders_of_Reality5 ай бұрын
Oh! Big thank you for the light theme! You’ve just won a subscriber.
@MusicStudioNYC2 ай бұрын
Well done!! Can you do it again with the new GPT-o1 model?
@MassiveDerek4 ай бұрын
3:23 i thought someone was inside my house whistling
@PatrickStorm_3 ай бұрын
Lol sorry about that. I was going for a western theme for that part.
@SSS-100M4 ай бұрын
Your video is great! I can understand the difference between Claude 3.5 Sonnet and GPT-4o. Also, I canceled GPT-4o because Claude 3.5 Sonnet is better than GPT4o. When I want to create an image, I use Gemini1.5 Pro.
@zejdzglebiej5 ай бұрын
The question is, what do you mean by writing a better text? I'm afraid that you evaluate texts too positively, where there is understatement, and a lack of logical structure with opening and closing. You perceive it as an aura of mystery. That's why Clodie cheated on you, because what he couldn't do, you interpreted as good writing.
@MrAmad3us4 ай бұрын
Claude premium plan gives less messages / dollar. It’s significantly more consistent in long and complex convos, but you reach the 5h message limit quickly
@gideons61264 ай бұрын
Cheaper per token if you use the API which is how I go for it but I agree with you about premium value
@fool-on-the-hill4 ай бұрын
Yeah, I ran into this quickly, not knowing there was a limit. The only limit I ran into with ChatGPT is the amount of images I could create, but it took me a while to get there even. That stinks. Oh, and Claude can’t do images… Currently, I’m subscribed to both to see if I can tell which ones better, but I’m not sure I can.
@barafwal2533 ай бұрын
@@fool-on-the-hill Hi, I need to subscribe for one of the paid version of these AI chatbots (Claude3.5 sonnet, chatgpt 4o etc.) for the coding purposes mainly. I need to frequently uploading files, images, and sometimes referring to the web links. I have huge length of codes to analyze and other files. Will it be exactly same if I directly subscribe to chatgpt 4o from openai or subscribe to perplexity and use the chatgpt 4o AI model in the setting, similar cases for other AI models too? In case of perplexity, I will be getting multiple AI models in just one plan, is it really true and practicable?
@igor_in_theusa5 күн бұрын
You will need to compare the full o1 (not preview) and Claude 3.5 Opus. I am most looking forward to the release of these models.
@djayjp4 ай бұрын
I'm pretty sure limes float if they have been squeezed and sink if they haven't.
@jerkface384 ай бұрын
The esther question is also dumb. Let's be honest here, the info on the internet is ambiguous. Lots of info saying she was married in 1981 or 1982. Only the wiki says m. 1985 but with no real context. Had he asked both models why they said what they said, he would have gotten a refined response. Also, he only gave 1 point for image generation? I'm sorry but that's at least a 3 pointer even if some of the images suck.
@NithinJune4 ай бұрын
Claude genuinely seems so exciting
@РодионСадыков-е2г4 ай бұрын
GPT’s recognising Obama’s prank is astonishing
@costicanu74 ай бұрын
on writing code, gpt 4.o is way better than sonnet 3.5 I tried them both multiple times, sonnet 3.5 sometimes does not understand when is a harder task. sonnet 3.5 surprised me when I asked for a solution, his answer was suitable for my task. Very good to have them both!
@_fuji_studio_3 ай бұрын
bruh it's the opposite, Claude Sonnet 3.5 is way better at coding, i switched to it for my coding assistance the moment it surprised me by giving a very good answer when GPT couldn't and just kept repeating the same wrong answer
@coursehub24072 ай бұрын
@@_fuji_studio_ same, i also switched to Sonnet 3.5 for coding
@Ashystar0674 ай бұрын
awesome video, thanks!
@NithinJune4 ай бұрын
for the coding tests you should do a plagiarism check to check if it is straight ripping someone’s code
@incription4 ай бұрын
sometimes it does, but programmers are also guilty of that a lot... if it works, no need to rewrite it
@maximiliandegarnerinvonmon64574 ай бұрын
They both actually got it correct about the 5th highest GDP in 2018.
@maximiliandegarnerinvonmon64574 ай бұрын
Gives a scary example of how we presume that we are superior 😂😂
@maximiliandegarnerinvonmon64574 ай бұрын
We are usually 4th, but 2018 was the time we dropped for that year, and again just recently last year. It's a sticking point here due to elections and that's how we know 😂😂😂
@djayjp4 ай бұрын
19:32 You forgot to give the point to GPT-4o in the tally.
@AJBarea4 ай бұрын
he gives gpt-4o two points and claude 0 points for that whole category less than a minute later around 20:13
@RanLM14 ай бұрын
Great video. Thank you. Subscribed
@mrchrizztech10504 ай бұрын
These coding questions are really useless imo... I would ask these questions:
1. Given our custom database (.db) file, we want to find out the average income for each item. We want a webpage that shows us this total and where we can search for items to find the totals. (Tests coding, HTML, CSS, JS, and also SQL and DB knowledge.)
2. Create a very small application that runs as a Windows service and creates a window we need to close to start a counter. After x seconds we want that window to reappear and to close. (Tests knowledge of the OS and code.)
3. Given a foreign API, we have the endpoint x (insert custom endpoint here). We want to transform the response so that we can show it in, for example, our Windows service each time it pops up.
@promansarwar43745 ай бұрын
Great content man. This channel is going to blow up soon
@dominik44963 ай бұрын
How do you go about creating such comparison videos? How do you film and edit the videos? :) Really nicely done!
@PatrickStorm_3 ай бұрын
I'm glad you like the video! I film with a Sony camera, and edit it all in Capcut which is free. Nothing too fancy going on, just spending a bunch of time editing, and watching youtube videos to figure out how to do certain things!
@onesimplecuban4 ай бұрын
I pay for my Claude. I’ve made so far amazing web applications as well as programming. I love it
@Yolotofigureoutthehardproblem3 ай бұрын
This is a really good video
@artificiyal4 ай бұрын
the only thing seperating them is now the training data
@PatrickStorm_3 ай бұрын
Yeah, you are right. And I think that's clear by Llama 3.1 being about as good as these models at a smaller size. It's all about the data.
@timothyhernandez51414 ай бұрын
How about Claude 3.5 vs. GPT-4? Which is better? 😅 Thanks
@amirhossein11084 ай бұрын
What is the difference between GPT-4 and GPT-4o??
@SomeOne-p6f4 ай бұрын
What should they have done with the 1 second interval question to keep to 1 second exactly?
@PatrickStorm_4 ай бұрын
It can go pretty deep, but they should create a variable with the start time, then use that to calculate time until the next tick. This stackoverflow I just found has the problem and solution well laid out: stackoverflow.com/questions/29971898/how-to-create-an-accurate-timer-in-javascript
@SomeOne-p6f4 ай бұрын
@@PatrickStorm_ That's a great link, thanks.
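The fix described in this thread (remember the start time, then compute how long to wait until the next tick instead of sleeping a fixed interval) can be sketched as below. The original question was about JavaScript; this is the same idea transposed to Python as an illustration, and the names `next_delay` and `run` are hypothetical.

```python
import time


def next_delay(start: float, now: float, interval: float = 1.0) -> float:
    """Seconds to wait from `now` until the next tick.

    Ticks are scheduled at start + interval, start + 2 * interval, ...,
    so any lateness in one tick is not carried over to the next one.
    """
    elapsed = now - start
    ticks_done = int(elapsed // interval)
    next_tick = start + (ticks_done + 1) * interval
    return next_tick - now


def run(ticks: int, interval: float = 1.0) -> None:
    """Print `ticks` timestamps that stay aligned to the original start time."""
    start = time.monotonic()  # monotonic clock: unaffected by wall-clock changes
    for _ in range(ticks):
        time.sleep(next_delay(start, time.monotonic(), interval))
        print(f"tick at {time.monotonic() - start:.3f}s")
```

Calling `run(5)` should print timestamps close to 1.000s, 2.000s, and so on; a naive loop that always sleeps a fixed second would instead accumulate the time spent on each iteration's own work.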
@rupiter20084 ай бұрын
For today Claude 3.5 and GPT 4o are my favorites among all of them. Most of the time I prefer Claude because it is clearly more "intelligent". But recently I stumbled upon a problem. I had a handful of receipts I needed to process. I made a pdf file with all of receipts and asked Claude to make a spreadsheet with all the information from them. But Claude missed one receipt for some reason. When I mentioned it, Claude tried to solve the problem by adding another receipt with arbitrary information in it! On the other hand Chat GPT did the job correctly right away.
@PatrickStorm_3 ай бұрын
Yeah, they seem to have different strengths. Neither are particularly good at taking a large number of files/images and extracting complete, accurate data. There are other, specialized AIs that are very good at that exact task though, especially for receipts.
@MudroZvon5 ай бұрын
Instant subscribe!
@Sindigo-ic6xq5 ай бұрын
really nice! better than most other reviews
@SurfCatten5 ай бұрын
Pretty impressive for a fairly small YouTube channel! If you can get the visibility, I expect your channel will do very well. Unfortunately, for that to happen you need to go pretty far down the clickbait YouTube algorithm; there's just no other way to build views and subscriptions.
@AbolfazlArghandehpoor4 ай бұрын
Thanks you 🎉
@trainspotting024 ай бұрын
Sorry these are such basic tasks. A proper test is a custom environment, action and reward used in reinforcement learning.
@knockout_highlights.4 ай бұрын
Honestly, I think it depends on what type of task you do on a daily basis. If you're into coding, you might gear towards Claude. However, in terms of problem-solving skills and solving complex math problems (in my study of mathematics), GPT-4o is clearly better by a large margin.
@PatrickStorm_3 ай бұрын
Agreed. They clearly have different strengths. I mostly use these LLMs to brainstorm and code, and I think that's why I prefer Claude.
@djezio2584 ай бұрын
Conclusion: GPT-4o is for research and Summarization, while Claude 3.5 Sonnet is great for poetry, conversation, storywriting and dialoguing. I prefer to spend $20/month for GPT-4o
@johnreimer43614 ай бұрын
Ask Claude 3.5 Sonnet: "List the first 12 people in order that landed on the moon." The result is correct, and it shows that the 11th person was, in fact, Eugene Cernan. It pays to ask questions that force a step-by-step process of answering.
@HERKELMERKEL4 ай бұрын
I was using ChatGPT 4o for creating artistic literature: poems, song lyrics, etc. Then Claude 3.5 came out, and I was blown away. It is literally an LLM (Large Language Model), because it understands a lot more world languages than GPT, and it creates more beautiful lyrics than GPT. So in that field the winner is obvious, but there are some fields where it was not better.
@HERKELMERKEL4 ай бұрын
I am using the free version, which is limited to a couple of prompts, so I write very long paragraphs to save some steps.
@Le_Lys_Eclectique3 ай бұрын
Great video, thanks! It would be really awesome if you got them both speaking with each other about, I don't know, maybe how to solve some of the greatest problems of humanity. And maybe have them choose, together, what the most important one would be! Do you think what I've suggested is even possible???
@FloydCotton-hx4jh4 ай бұрын
It is funny to me how people do comparisons like this. Yours was very well done and quite entertaining. However, the speed at which these models leap frog one another makes this kind of a moot point.
@PatrickStorm_4 ай бұрын
You’re right man. I was afraid a new, better model would come out before I even released the video. But I do think it’s worth evaluating new models to know what it is good at, and determine which model to use for different scenarios. Either way, I’m glad you got something out of the vid!
@xwuzantilelij52154 ай бұрын
Forget all of those. I like ChatGPT, but Claude is my friend. If you ask why, for example, ChatGPT does the best job of interpreting an MRI result or a laboratory result - I've tried this. ChatGPT is a Prof. Dr., an expert in every disease. However, when I told both ChatGPT and Claude about the problems I had with my ex-girlfriend, while ChatGPT gave quite superficial, mechanical answers, Claude responded and gave me suggestions in a language like a psychologist, like a close friend.
@PatrickStorm_3 ай бұрын
You can coax ChatGPT into being more human-like, but it's often times a struggle. I much prefer Claudes baseline tone too
@truecuckoo4 ай бұрын
Isn’t the biggest improvement with GPT-4O the audio conversational voice chat skills? Not converting audio to text prompts, but actually understanding the tone of the voice itself etc. It comes across as pretty nuanced in the demos I’ve seen.
@PatrickStorm_4 ай бұрын
Absolutely! But that isn't available to anyone but insiders yet. But even without the audio, when GPT-4o came out, it was top of the leaderboards for pretty much everything.
@journees43004 ай бұрын
This is not a well-thought-out test. There are actually standardized methods to compare AI models. Look for GLUE, COCO, MS MARCO, SQuAD, etc. It depends on what aspects you want to compare.
@Imperial_Dynamics4 ай бұрын
You did not increase the width of the menu bar in Claude's case to see if the buttons re-appear
@photelegy4 ай бұрын
7:15 In the image recognition tests, I'm not sure if they really worked this out by themselves or if they had seen the image on the internet, e.g. in a forum where it was already explained. So they "just" had to rephrase it.
@byrnemeister20084 ай бұрын
Well, Claude doesn’t have internet access, so that’s not really an option.
@photelegy4 ай бұрын
@@byrnemeister2008 Ok. Thanks for the notice. So how up to date is the model (after what date does Claude not know anything)?
@hypheng4 ай бұрын
I tried the question about the 11th person on moon, GPT4o is correct now, looks like GPT keeps learning quickly
@PatrickStorm_3 ай бұрын
That's interesting. I wonder if I just asked it a few times in a row if it would have gotten it right.
@robwin00723 ай бұрын
Hello, Good video. I liked and subscribed. First, I think you stiffed GPT4o on the R:8 summary question. Yes, it was more than 300 words-however, since it hit all the aspects of the dense article, GPT4o should have received a point. Also, the prompt scrolled fast-I was unable to read it-so I don’t know if you asked to limit it to 300 words. Second, I have to write a production program for a small operation insurance company. I will use GnuCOBOL; which of the two would you use to assist in that project?
@PatrickStorm_3 ай бұрын
I would use Claude. In every benchmark I’ve seen, Claude is the leader for coding.
@sckarzzz99734 ай бұрын
You should do a Blind Review of the answers so you remove any bias from the reviews
@LikeAPro.19952 ай бұрын
Also, at 19:29 you gave a point to GPT but you didn't increase its score on the screen. Thank you though for your contribution. It was interesting.
@rtos4 ай бұрын
GPT4o has done much better in image interpretation as I've seen.
@hidd3n_4 ай бұрын
imagine if we had their child, Claude Pete 4.5
@codingzen8694 ай бұрын
I just like the fact that Gemini doesn't even count anymore. They are way off the race. Google deserves every bit of it.
@PatrickStorm_3 ай бұрын
I wouldn't count Google out just yet, Gemini has been slowly getting better and better. Who knows which model will be leading in 6 months or a year.
@cesarsfalcao4 ай бұрын
I'm using GPT 4o for free, it's a win.
@arthurfernandes69344 ай бұрын
GPT 4 is massively better in context, writing and everything that needs interpretation: GPT 4 bunny history: Under the pulsing neon of 2147, Detective Eli Ramos adjusted his Temporal Badge. The rain-flickered holograms cast shadows as he stepped into the alley where time had fractured. The chrono-meter beeped-1932, Chicago. Eli's target was Alvin Holloway, a time fugitive masquerading as a bootlegger. Using a purloined time device, Holloway had been jumping decades, altering events for personal gain. Eli slipped through the temporal gate, his trench coat merging seamlessly into the milieu of mobsters and flappers. The smoky speakeasy was loud with jazz, the air thick with conspiracy. He spotted Holloway immediately, charm and danger rolled into one. "You can't keep running, Alvin," Eli said, cornering him by the bar. "And you can't stop me," Holloway smirked, reaching for his concealed device. Quick as a blink, Eli drew his Temporal Disruptor. A beam of blue light, and Holloway froze-the device dropping, useless. "You're out of time," Eli quipped, cuffing Holloway. Back in 2147, the chief commended him. But Eli knew better. Time was a loop; his next case was just a ripple away. Somewhere, sometime, another fracture awaited. But for now, under the eternal city lights, Eli Ramos took a well-deserved breath. GPT 4o: In the bustling heart of Neo-Victorian London, Detective Arlo Sinclair adjusted his bowler hat and checked his pocket watch. The gears inside whirred, signaling it was time. With a flick of his wrist, the watch’s face shifted, revealing a hidden dial. He twisted it, and the world around him shimmered. Moments later, Arlo stood in the shadowy alley of a different era-New York City, 1942. The scent of fresh rain mingled with smoke from the distant war effort. He scanned the street and spotted his target: a man in a pinstripe suit carrying a briefcase too important to be left in this time. Arlo followed, his every step silent as he closed in on the man. 
When the moment was right, he sprang, snatching the briefcase. The man barely had time to react before Arlo twisted the dial again. The city melted away, replaced by the futuristic skyline of 2147. In his hidden office, Arlo placed the briefcase on a sleek table. Opening it, he smiled at the sight of the stolen schematics, safe from those who would misuse them. Time could be twisted, but justice always straightened its course. Arlo leaned back, ready for the next case. "Ai que tudooo," he murmured.
@tekratek40774 ай бұрын
I was always using GPT-4o for my tasks until I discovered Claude. I code with Claude and let GPT-4o debug it. Perfect combo for me. Claude understands Python and web design way better than GPT-4o.
@PatrickStorm_4 ай бұрын
That is a really interesting approach. The two models obviously have different data sets, so they might notice different things in the code. Makes me really curious how the two models, combined the way you use them, would perform on benchmarks.
@kaicex3 ай бұрын
I understand that in the free version of Claude, you get 5 free queries with Claude Sonnet. How many free queries will I get with Claude Sonnet if I buy the Pro plan?
@PatrickStorm_3 ай бұрын
I chat with it all day and run out most days by about 3pm. It's way more than 5, but it depends on how long the chat is. I would say that I easily get 50 messages in before it tells me to wait a couple of hours. But then I just switch back to ChatGPT!
@mingistech4 ай бұрын
I agree, Claude really needs voice chat added.
@julianvillaquira41274 ай бұрын
Where are you taking your answers from? (for example, the GDP one I think Germany came fourth, not fifth)
@PatrickStorm_3 ай бұрын
I tried to get questions with clear answers that I found from multiple sources online. That specific question was actually wrong though, or at least not entirely correct. This got called out in another comment, but I was using a different calculation of GDP than the common one that both LLMs answered with.
@Kutsushita_yukino5 ай бұрын
Claude 3.5 Sonnet lost the EQ that Opus had, though, so it's not reliable as a conversational model. Opus is still worth it if you get tired of Sonnet 3.5's robotic, bland responses. Try comparing them if you don't believe me. This is not subjective; it's a fact that Sonnet 3.5 sacrificed emotional intelligence for more specs.
@229Mike4 ай бұрын
I don't know if I can agree with this fellow. I tried using Claude 3.5 Sonnet and it got a picture breakdown wrong: its answer didn't match the timeline shown in the image. No problems with ChatGPT. And at that point, I just didn't want to trust Claude.
@Dannnneh4 ай бұрын
How do I access Claude 3.5 Sonnet as a European?
@cbnewham56335 ай бұрын
While you are correct about image generation being only available in GPT-4o, it would have been good if you had a section on image generation using code. I quite often get GPT to create graphs or simple line graphics, which can be done easily in code. E.g.: draw a graph of x to the power of 1.7, or draw an alien sprite from Space Invaders.
@cbnewham56335 ай бұрын
PS: for GPT you of course have to tell it to do so using its code capabilities, otherwise it tries to do it using DALL-E, with amusing results.
@PatrickStorm_5 ай бұрын
That's a good point. I could have done that in either the image generation or the code section. The new Artifacts feature is actually really cool when it comes to this sort of thing - I haven't tried using it for graphs yet though, I wonder how well that works. Now I'm curious!
@PatrickStorm_5 ай бұрын
Alright, I tested it. It does show the graphs, but it really wants to use React and include libraries, which just shows a blank screen in Artifacts. I told it to just use plain HTML with a canvas element, and it did show a graph, but it was wonky. Like I said in the video, I think Artifacts just isn't there yet to be truly useful. However, I bet the original React code that it gave me would have looked great. I think I'll add this test into future comparison videos like this. Thanks for the feedback!
@cbnewham56335 ай бұрын
@@PatrickStorm_ GPT uses Python to create its graphs. As I recall it uses some libraries to do the graphing, although as I'm not a Python coder I have to admit I've never bothered to look at the code it produces. I've not tried Claude as yet for graphs, so I wonder what happens if you specify what language it should use for the graphing or icon generation.
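For readers wondering what "drawing a graph in code" amounts to, here is a minimal sketch of the "x to the power of 1.7" example from this thread. It is not what either model actually generates; it assumes only the Python standard library and emits an SVG string rather than using a plotting library. The function name `power_curve_svg` and all of its parameters are made up for illustration.

```python
def power_curve_svg(exponent=1.7, x_max=10.0, samples=50, width=400, height=300):
    """Return an SVG document that plots y = x ** exponent for 0 <= x <= x_max."""
    y_max = x_max ** exponent  # largest y value, used to scale the curve to the canvas
    points = []
    for i in range(samples + 1):
        x = x_max * i / samples
        y = x ** exponent
        px = width * x / x_max
        py = height - height * y / y_max  # SVG's y axis points down, so flip it
        points.append(f"{px:.1f},{py:.1f}")
    joined = " ".join(points)
    return (
        f'<svg xmlns="http://www.w3.org/2000/svg" width="{width}" height="{height}" '
        f'viewBox="0 0 {width} {height}">'
        f'<polyline fill="none" stroke="black" points="{joined}"/>'
        "</svg>"
    )


if __name__ == "__main__":
    # Write the curve to a file that any browser can open.
    with open("power_curve.svg", "w", encoding="utf-8") as handle:
        handle.write(power_curve_svg())
```

Saving the output and opening it in a browser shows the curve; axes and labels are left out to keep the sketch short.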
@jasonk1254 ай бұрын
The question is, what are the gotchas in the JavaScript timer coding problem?
@hondacrxmk254 ай бұрын
You need to deduct the time between function calls from the delay value :)
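The reply above can be read as describing timer drift: if every tick waits the full interval after the previous callback finishes, small delays pile up. A minimal sketch of the correction, assuming that reading is right. It is written in Python for illustration rather than the JavaScript from the video, and the names `next_delay` and `run_timer` are hypothetical. The same arithmetic would apply to the delay passed to `setTimeout`.

```python
import time


def next_delay(start, interval, ticks_done, now):
    """Seconds to wait so the next tick lands on start + (ticks_done + 1) * interval."""
    target = start + (ticks_done + 1) * interval
    # Deduct the time that has already passed; never return a negative delay.
    return max(0.0, target - now)


def run_timer(interval, ticks, callback):
    """Call callback(1), callback(2), ... on a schedule that does not accumulate drift."""
    start = time.monotonic()
    for done in range(ticks):
        time.sleep(next_delay(start, interval, done, time.monotonic()))
        callback(done + 1)


if __name__ == "__main__":
    run_timer(0.5, 3, lambda n: print("tick", n))
```

Each delay is measured against the original start time, so a late callback shortens the following wait instead of pushing every later tick back.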
@rachelsnijders8174 ай бұрын
Claude was better at writing the short scene with the bunny, but still made a lot of mistakes. For example: the smell of revolution. What? (also, a smell does nothing to cover up theft) You can use AI when brainstorming ideas in creative writing, but do the actual writing yourself 😉
@OverLordGoldDragon5 ай бұрын
The link in "Link to text responses from the video" seems to point to something else.
@PatrickStorm_5 ай бұрын
Apologies! That was a copy/paste issue. It's been updated with the correct link. Let me know if that works for you now!
@Sindigo-ic6xq5 ай бұрын
Although yes, reasoning should absolutely be in.
@PatrickStorm_5 ай бұрын
Agreed! I’ll make sure to add reasoning into future comparisons. Thanks for the feedback :)
@Sindigo-ic6xq5 ай бұрын
@@PatrickStorm_ math, iq problems (nonvisual) and understanding of the world
@eburgwedel5 ай бұрын
Could have been a good comparison, but wasn’t - image gen and facts make absolutely no sense; reasoning was missing entirely. It did point out a few important things, though, so thank you.
@PatrickStorm_5 ай бұрын
I appreciate the feedback! I hear you about the reasoning not being in there. That was a miss, future comparison videos will definitely have that section… and probably won’t have the facts section. Glad you got some stuff out of the video though!
@barafwal2533 ай бұрын
Hi, I need to subscribe to one of the paid versions of these AI chatbots (Claude 3.5 Sonnet, ChatGPT 4o, etc.), mainly for coding. I frequently need to upload files and images, and sometimes refer to web links. I have very long code and other files to analyze. Will it be exactly the same if I subscribe to ChatGPT 4o directly from OpenAI, or subscribe to Perplexity and select the ChatGPT 4o model in the settings? And is it the same for the other AI models? With Perplexity I would be getting multiple AI models in just one plan. Is that really true and practical?
@_fuji_studio_3 ай бұрын
Claude 3.5 Sonnet is better than any version of GPT. I have used it as my coding assistant from the moment I discovered it. It's amazing: it can understand very complex code, and the code runs without errors.
@nicholasfabris1304 ай бұрын
In round 5 UK was the correct answer
@vuhoang59034 ай бұрын
My experience is GPT is still better than Claude in logic, reasoning and problem solving (for coding, math, data analysis,...)
@SharvindRao4 ай бұрын
Blah blah blah
@johnick4514 ай бұрын
Please give us a reasoning engine use case. Is it practical, usable?
@PatrickStorm_3 ай бұрын
I'll add that into future comparison videos. That is a really tough question to answer without knowing the exact use case. They can definitely reason, but it greatly depends on how accurate you need the results to be.
@Monawwar4 ай бұрын
10:31 - what's that extension 😆
@Earth2Ross26 күн бұрын
why does a tie get zero for both models, just saying, if they both did well, don't they both get a point?
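The totals quoted earlier in this thread (17 vs 19 with one point per tie, 6.11 vs 8.11 with 0.01 per tie) are consistent with 6 outright wins for GPT-4o, 8 for Claude 3.5 Sonnet, and 11 ties. That split is inferred from those totals, not stated in the video. A small sketch of how the tie rule changes the picture, with the hypothetical helper `tally`:

```python
def tally(wins_a, wins_b, ties, tie_points):
    """Return (score_a, score_b) when each tie is worth tie_points to both sides."""
    score_a = round(wins_a + ties * tie_points, 2)
    score_b = round(wins_b + ties * tie_points, 2)
    return score_a, score_b


if __name__ == "__main__":
    # 6 / 8 / 11 is the win/win/tie split inferred from the thread's totals.
    for tie_points in (0, 0.01, 0.5, 1):
        gpt_4o, claude = tally(6, 8, 11, tie_points)
        print(f"tie = {tie_points}: GPT-4o {gpt_4o}, Claude 3.5 Sonnet {claude}")
```

The gap is two points under every rule; only the ratio between the scores changes, which is why the result looks much closer once ties are counted.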
@fast45494 ай бұрын
You can actually tell Claude to generate an image and it will make an SVG.
@farleylai11024 ай бұрын
Find yet another LLM to judge the two output responses.
@PatrickStorm_3 ай бұрын
I know you're joking, but that is how some of the benchmarks actually work! At my last company, I even built a test suite that did just that. There is a curious feature of LLMs that they are better judges than they are creators, so it actually sort of works. The future is weird.
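The shape of such a judge-based comparison can be sketched in a few lines. This is only an illustration, not the test suite mentioned above or any real benchmark. The judge is passed in as a plain function, and `longer_answer_judge` is a deliberately silly placeholder standing in for a call to a third model.

```python
from collections import Counter


def tally_verdicts(cases, judge):
    """Run judge(prompt, answer_a, answer_b) on every case and count the verdicts.

    The judge must return "A", "B", or "tie".
    """
    counts = Counter({"A": 0, "B": 0, "tie": 0})
    for prompt, answer_a, answer_b in cases:
        verdict = judge(prompt, answer_a, answer_b)
        if verdict not in counts:
            raise ValueError(f"unexpected verdict: {verdict!r}")
        counts[verdict] += 1
    return dict(counts)


def longer_answer_judge(prompt, answer_a, answer_b):
    """Placeholder judge (an assumption for this sketch): prefers the longer answer."""
    if len(answer_a) == len(answer_b):
        return "tie"
    return "A" if len(answer_a) > len(answer_b) else "B"


if __name__ == "__main__":
    cases = [
        ("Capital of France?", "Paris", "Paris, France"),
        ("2 + 2?", "4", "4"),
    ]
    print(tally_verdicts(cases, longer_answer_judge))
```

In a real setup the placeholder would be replaced by a function that sends both answers to a judging model and parses its reply.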
@Zealotux4 ай бұрын
I've tried both for non-trivial coding and Claude is MUCH better at complex tasks, GPT-4o didn't stand a chance.
@hanfo4204 ай бұрын
Claude didn't even get that it was Obama.
@PatrickStorm_4 ай бұрын
Lol I missed that. That should have been a negative point even!