I gave three AI models a CSS quiz

Views: 14,289

Kevin Powell

1 day ago

Comments: 78

@only_._gaming 2 months ago

This is a great video, and it could be made into a series of sorts.

@tgd-613 2 months ago

Great Video. I NEVER trust AI for coding questions. I do use them to point me in a direction where I can search elsewhere to find an answer to a tough question, but I don't ever trust exactly what they give me.

@tgd-613 2 months ago

... and cheers from Ottawa, from a NS Acadian!

@spaceowl5957 2 months ago

That's how I treat it as well; o1 does seem to be quite a bit more accurate though.

@kingoffongpei 2 months ago

I haven't tried any of these other AIs, but I used llama on a whim to help me learn D3.js and it was immensely helpful. The docs were really hard to find information with and not very good at breaking down how it worked. Asking llama (idr if it was 70B or 450B) questions about things I just couldn't figure out really helped take some of the pain out of the process. I could ask questions about the answers it provided and it felt a lot like a conversation with someone who knew and understood D3. There were a few instances where the solutions it provided didn't seem to work but when I pointed it out, it did fix it. I thought that was pretty understandable since I asked it questions about a project that it never saw, only basing it on the context I provided in my prompt, and D3 has changed a lot over the years so it may have based some answers on stuff on the internet written about previous versions.

@mkLee970 2 months ago

On Question 6, ~17:40 in, Claude made up an answer. It selected "B) 1.5rem + 1.5vmin", but in the question answer B is "B) 1.5rem + 1vmin". They are not the same. The AI made up a new answer; how fun is that.

@FenrirRobu 2 months ago

It's even funnier because AI then goes on to explain that answer B is incorrect because it is 1.5rem + 1vmin

@HITO-nv4cg 2 months ago

The best free ad for Claude 😃

@321sas 2 months ago

At 7:45 you got the question wrong. You said, "as long as my selector has equal specificity, it will not work" but the text said "As long as my selector has equal specificity, it will work." That's why Gemini did it so terribly and flipped the specificity because you told it something that was not true and it thought that equal specificity was needed.
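
A minimal sketch of the rule quoted in the comment above, assuming hypothetical selectors (`.card`, `.promo`, and `#featured` are illustrative names, not taken from the video's quiz): when two selectors have equal specificity, the declaration that appears later in the stylesheet wins.

```css
/* Both selectors are a single class: specificity (0, 1, 0). */
.card {
  color: black;
}

/* Same specificity, declared later, so this one applies
   to an element that carries both classes. */
.promo {
  color: crimson;
}

/* An ID selector has higher specificity (1, 0, 0) and would win
   over both class rules regardless of source order. */
#featured {
  color: navy;
}
```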

@msclrhd 2 months ago

What's interesting with Claude is that one of its answers here said something like "as defined in the CSS specifications". I wonder if that means that it was trained using the CSS specs as one of its data sources. That would also explain why it nailed things like the unit definitions.

@chriswalker4636 2 months ago

Kevin, thank you for producing the very educational content you publish. I have been a subscriber for some time, but this is my first comment. I am very interested in learning more about large design models (LDMs). Your expertise in reviewing LDMs would be very interesting and worthwhile.🎉

@clevermissfox 2 months ago

Yay, KPow had mentioned this video on his Discord a while ago and I’ve been [im]patiently waiting 😂 Loving the long format again, and this is a very interesting comparison! I’ve experimented with Copilot, ChatGPT and Claude for JavaScript refactoring (not as much CSS), and in my tests Claude came out on top; Copilot and ChatGPT were dismal and laughable. To be fair, a lot of this was in February or March, and these things change and learn so quickly that the same experiments would most likely yield very different results now in September 😂

@Nova_BG 2 months ago

Always amazing videos!

@SkamanSamTyler 2 months ago

I really like this! I keep telling my friends not to trust the AI stuff, but some of them insist on "learning" by asking the LLMs. I have always preferred the source materials; W3C and MDN are still the greatest resources on the web. That said, I wonder how Codeium stacks up? (I do use AI assistants in my editors, but I always double- and triple-check their code. Remember: never use someone else's code without understanding it yourself!)

@KevinPowell 2 months ago

Codium uses GPT-4o, which is the same as copilot. Not sure if the paid version uses something else though

@Dunc4n1d4h0 2 months ago

Great stuff. I use it most often as a syntax reminder or template generator for programming. Although sometimes they work okay, sometimes when I use GPT or Copilot they either don't understand the question at all or have to be guided in the right direction. IMO, in the case of CSS these differences result from when the model training was completed. For example, you can see a big difference between GPT-4 and 4o. By the way, it's time for me to check out Claude.

@franckdervaux792 2 months ago

Thank you for this! It gave me the idea to ask Claude a technical question (not CSS but Docker related) and it nailed it on the first try! That was not my experience previously with Copilot…

@anthonybarnes 2 months ago

See this is a great topic to make a video about - hitting multiple relevant modern technologies at once 👏

@compton8301 2 months ago

Wow, your knowledge is remarkable!

@sandy_knight 2 months ago

24:24 My guess why Copilot got this wrong is that, like screen readers used by the visually impaired, it read INSIDE as an initialism and didn't understand its meaning. You should have used text-transform: uppercase; PEBKAC 😜

@SavanSanandiya-p5y 2 months ago

Kevin, this is a great video. Please make a series on PDF development and email design.

@scragar 2 months ago

17:40 Not sure if you should count that as correct. It said B, which was wrong, but it also rewrote the answer text to be correct (adding .5 to the vmin). IMO that should be half a point. Inventing an answer not in the options is basically cheating, and if it wasn't for you knowing the answers going into the test, it could very easily have convinced you D was wrong and B was right, using the explanation as to why D was right and B was wrong.

@joshuamitchell6204 2 months ago

Custom defined units would be kinda cool...didn't know I needed that until now 😂

@viccc.n 2 months ago

You should try with OpenAI o1, it takes more time for "thinking" before answering

@CodingwithNephi-c6r 2 months ago

You accidentally gave a point to Copilot instead of Gemini on question 9 :)

@myartikool 2 months ago

Judging by the overall performance of both, I don't think that matters that much :}

@VaebnKenh 2 months ago

Copilot is based on an older GPT model, would be good to also compare against the latest ChatGPT 4o. Also: LLMs can get confused by large amounts of extraneous chat context. It's better to ask new, unrelated questions in a new chat.

@PeterWarholm 2 months ago

Great video Kevin! I recently read that Claude 3.5 Sonnet had shipped with a lot of coding (and math) in mind. Would love to see a rematch when/if the others make an update. Regarding Q, afaik it is not really part of the metric system but seems to be a unique unit for the web?
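
For reference, a small sketch of the `Q` unit as CSS defines it: a quarter of a millimetre, so 40Q equals 1cm. The class names are illustrative assumptions. CSS unit names are case-insensitive, which is why the lowercase `q` is also accepted.

```css
.ruler-a { width: 40Q; }  /* 40 quarter-millimetres = 10mm */
.ruler-b { width: 1cm; }  /* same computed width as .ruler-a */
.ruler-c { width: 40q; }  /* lowercase form; unit names are case-insensitive */
```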

@icepuddin168 2 months ago

what font is he using at 00:03 ??

@albedesigns 2 months ago

I have never clicked on a notification so fast 😂 Great topic for a video!

@KevinPowell 2 months ago

Glad to hear that! Was very curious if people would be into this type of thing or not!

@다루루 2 months ago

Finally, an AI model has appeared on this channel!! I also felt that Claude was good!

@daveturnbull7221 2 months ago

Vertical Media is defined by wikipedia as Trade Magazines/Journals.

@samhenrigold 2 months ago

The funny thing with these models is that they’re always, like, two years behind on web technologies. Like I cannot for the life of me get any LLM to touch subgrid with a 10 foot pole
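
For readers unfamiliar with the feature mentioned above, a minimal subgrid sketch. The `.gallery` and `.card` class names and the three-column layout are assumptions made for illustration only.

```css
/* Parent grid defines the column tracks. */
.gallery {
  display: grid;
  grid-template-columns: repeat(3, 1fr);
  gap: 1rem;
}

/* The child spans all three columns and reuses the parent's
   column tracks instead of defining its own. */
.card {
  grid-column: span 3;
  display: grid;
  grid-template-columns: subgrid;
}
```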

@mendoso 2 months ago

A lot of these fails come from not being strict enough, which in normal cases would be the correct approach. But in the world of software development one should distinguish a typo from some weird token that could be a CSS unit or part of a new spec. Funny how they can pretend that 1vmin is the same as 1.5vmin but at the same time do not even bother to ignore case when it comes to the unit q/Q. You could probably require them to perform strict syntax checking, but then you should also avoid errors like omitting NOT in the WordPress question.

@SianDoherty 2 months ago

Copilot is only trained to December 2023, I read today, and that's GPT-4o or whatever it's called. Not sure which ChatGPT Copilot is based on, but it's not that great. Claude has definitely been the best for coding since I've been using it, at least for Python, but it seems like for CSS too.

@TheThirdWorldCitizen 2 months ago

Would the media query behavior change when using nesting?

@weisj 2 months ago

Gemini got the first questions wrong. It says that the rem in the media query would be relative to the html font size. In particular it says that 1rem would be 32px in this case.
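
A sketch of the behaviour this comment refers to, assuming the quiz set the root font size to 32px (an inference from the comment, not confirmed here) and a browser default font size of 16px: relative units in a media query condition are resolved against the initial font size, not against the `font-size` declared on `html`. The `.sidebar` selector is hypothetical.

```css
html {
  font-size: 32px; /* affects rem inside declarations, not in media conditions */
}

/* With a 16px browser default, 40rem here resolves to 640px, not 1280px. */
@media (min-width: 40rem) {
  .sidebar {
    padding: 1rem; /* inside the rule, 1rem is 32px */
  }
}
```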

@nustaniel 2 months ago

Oh the struggles of trying to get LLMs to do anything correct in CSS. Not only is it terrible at CSS shapes and such, which is basically just math, but it keeps using rgba(0, 0, 0, 0.5) instead of just rgb(0 0 0 / 0.5) and other outdated approaches to things. I have completely given up on asking an LLM for CSS things outside of "Is there a way to do.." and using the response to figure it out myself.
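
The two color notations contrasted in the comment above, side by side. Both are valid CSS and produce the same half-transparent black; the class names are illustrative assumptions.

```css
/* Legacy comma-separated syntax with the rgba() alias. */
.overlay-legacy {
  background-color: rgba(0, 0, 0, 0.5);
}

/* Modern space-separated syntax: alpha follows a slash. */
.overlay-modern {
  background-color: rgb(0 0 0 / 0.5);
}

/* The alpha value may also be written as a percentage. */
.overlay-percent {
  background-color: rgb(0 0 0 / 50%);
}
```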

@ElectricKota 2 months ago

Thx, I actually tried it only several times on some simple CSS tasks, and it got them wrong every time, so I stopped using it for CSS; it was full of nonsense. For JS it looks like it can be more correct; however, it needs to be checked as well. I like using these tools for stupid questions; they can help to find the correct learning materials.

@darwinmanalo5436 2 months ago

Claude it is. ❤ Can you publish those questions so we can try it?

@karthicc7298 2 months ago

@kevin powel, I have been working as a CSS developer for 12 years. Recently my company laid me off due to insufficient projects. I am actively looking for a role where I can fit. Kindly help me.

@TimeFlyBy 2 months ago

Is there a particular reason you didn't include the very popular chat-gpt? (Sorry I just kind of watched only the first 5 minutes due to time constraint...)

@denisds130 2 months ago

I'm curious how Perplexity would do in this challenge.

@xilliman 2 months ago

I love AI but when it comes to coding, it has lots of flaws. I just asked ChatGPT about how to display a unit inside an input but it got it absolutely wrong. Even after I told ChatGPT, that it was wrong, the answer was still wrong :) It’s great for asking short questions like: what’s the + selector doing? The question would be: can an AI (in the future) understand new CSS features and apply them correctly?

@KevinPowell 2 months ago

The problem with CSS specifically is that nothing really lives in isolation. Context is key, but even giving it access to your entire project, I don't see a time where it'll properly infer all the different things that are going on. Like you said, simple things it might be able to explain, but I mean, two of the three had no idea how specificity of simple selectors worked, so I have my doubts there as well, lol.

@andreilucasgoncalves1416 2 months ago

@KevinPowell Yeah, and because of that LLMs are really bad with CSS in general, but a little bit better with Tailwind because it is more isolated.

@nomadshiba 2 months ago

Also, html isn't the root of everything; it's the document. Instead of html I could have just used svg or mathml.

@Dekutard 2 months ago

copilot and gemini? why not chatgpt and llama?

@KevinPowell 2 months ago

Copilot uses the same model as chat gpt. As for llama, I could have... Maybe next time?

@Dekutard 2 months ago

@KevinPowell I feel like the responses from Copilot are different though too; idk what Microsoftness they add to it. I would assume raw ChatGPT would be more optimal, but I could be wrong. And Claude! Claude has to be a contender too. I never hear about Gemini or Copilot for coding 🌚 js

@kaslmineer7999 2 months ago

Oh, I clicked on the video 1 minute after it was published

@kaslmineer7999 2 months ago

Cool, Kevin Powell gave me a heart on my comment :)

@shyamfx 2 months ago

Wow

@webschool4780 2 months ago

4 one

@DampeS8N 2 months ago

At about 3 minutes you use the phrase "Maybe it knows how it works but got the explanation wrong." This is a mental mistake. LLMs don't know how anything _works._ That isn't how they function. Your earlier assessment that they regurgitate the general consensus of the internet is more correct. But even then, that's not how they work. They are big autocompletes. They merely predict the most likely next words, right or wrong. Nonsense or sense. And they are not consistent. Every time I see people testing LLMs like you are, I see them asking _once_ and that's not how these tests should work. That's not how you should use LLMs, either. You should be asking multiple times because it WILL give different answers each time.

@Killyspudful 2 months ago

Absolutely agree.

@AntiAtheismIsUnstoppable 2 months ago

I always hate it when people use words like _intelligence_ and _think_ about AIs. They're advanced calculators, nothing else. The problem here seems to be that, because the algorithms are too advanced for most humans to understand, people assume they must be conscious. No. A calculator is a calculator, no matter if it solves 2+2 or advanced algebra. I could also say, well, it is magic to me how the calculator calculates 2+2 and gets 4, therefore the calculator must be conscious.

@rujor 2 months ago

After KZbin's horrific price increase, many people are talking about leaving the platform. Please share if your content is available somewhere else--otherwise, I'm going to miss the channel very much ❤️!

@bob-p7x6j 2 months ago

I'm confused, are you paying for this?

@chychywoohoo 2 months ago

It's not "clode" lol

@fatema8eee 2 months ago

32*35

@a1white 2 months ago

I'll stick to W3C Schools and Stack Overflow

@svivian 21 days ago

LOL w3schools is one of the worst sites out there. It’s probably where these AI bots got all their wrong answers from!

@DxBang3D 2 months ago

!important is !correct... in most programming languages, putting an exclamation mark in front makes it a NOT operator...

@daedaluxe 2 months ago

I can't tell if you're trying to tell us that important isn't correct or isn't incorrect

@DxBang3D 2 months ago

@daedaluxe It is really !important what I am trying to say. ;)

@daedaluxe 2 months ago

@DxBang3D It's not important, got it

@KevinPowell 2 months ago

It's one of the several mistakes the working group has listed, but can't change it now 😊

@st8113 2 months ago

openai leaping in to invalidate this video with the new model mere days before release

@andreilucasgoncalves1416 2 months ago

O1 probably would get most of them correct

@5alidshammout 2 months ago

discord communities ftw

@ibrahimharchiche3590 2 months ago

I don't have time to watch the video, but I just want to say that AI is terrible at CSS because it's so visual and implicit, unlike programming languages, which are based on logic.