What I would personally like to see is the AIs tested on real professional use cases, using the same long and detailed prompts. The mail test seems the closest thing to that, but I had in mind things like a full website, a tiny e-commerce store, a food delivery app, a setup of multiple Docker images orchestrated with Swarm or Kubernetes… The benchmarks with standardized prompts like a to-do app or a Tetris game are interesting for comparing theoretical intelligence, but I can't help thinking that, apart from benchmarking for its own sake, the use cases we generally see (not only in this video) are not meant for the professional world. But that's just my personal opinion.
@Phonognomiks 1 day ago
Exactly
@andreinikiforov2671 3 days ago
You are doing cutting-edge content, as always! By the way, 93.7% is 123 IQ points, pretty good!
@John-il4mp 2 days ago
The real number is 127 ;) even better.
@John-il4mp 2 days ago
Being in the top 6.3% corresponds to about the 93.7th percentile. Using a normal distribution table or calculator, this percentile roughly aligns with an IQ score of 127.
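The percentile-to-IQ conversion discussed in this thread can be checked with a short sketch. It assumes IQ scores are normally distributed with a mean of 100 and a standard deviation of 15, which is a common convention but not the only one (some scales use 16 or 24, and those give different numbers). The function name `percentile_to_iq` is made up for illustration.

```python
from statistics import NormalDist


def percentile_to_iq(percentile: float, mean: float = 100.0, sd: float = 15.0) -> float:
    """Map a percentile (0-100, exclusive) to an IQ score.

    Assumption: scores follow a normal distribution with the given
    mean and standard deviation (100 and 15 by default).
    """
    if not 0.0 < percentile < 100.0:
        raise ValueError("percentile must be strictly between 0 and 100")
    # inv_cdf returns the score below which the given fraction of people fall.
    return NormalDist(mu=mean, sigma=sd).inv_cdf(percentile / 100.0)


if __name__ == "__main__":
    # Top 6.3% means the 93.7th percentile.
    print(round(percentile_to_iq(93.7)))  # about 123 with sd=15
```

Under these assumptions the 93.7th percentile comes out at roughly 123; passing a different `sd` shows how much the answer depends on the scale being used.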
@nathank5140 2 days ago
Amazing. Love to see someone showing what’s possible.
@herramientak 2 days ago
What was the total cost of the four tests?
@rjackstheartofwealth6152 17 hours ago
How much did it cost??????
@lyeln 3 days ago
"AI has no owner" Jokes aside, impressive quality. Thank you for sharing this experiment!
@sirrobinofloxley7156 3 days ago
Amazing stuff, really nailed it there, though I'm surprised Firefox doesn't have a dark theme?
@godonholiday 2 days ago
You should test whether it can pass the Google 'not a robot' security tests, where you have to select all the pictures of cars, etc.
@derrelecte 1 day ago
I am trying to experiment with Claude using your tutorials. I want Claude to create a video snippet for me. I'm trying to get Claude to download and install Lightworks, but I'm running into tons of issues. Do you have any advice?
@JNET_Reloaded 2 days ago
You didn't show the cost at the end :/
@fearhand 3 days ago
Couldn't something like Make or Zapier do this more efficiently through API calls? Or the agent itself could use API calls instead of GUI-based web interactions.
@Mookummockup 3 days ago
Yes, but you don't have to deal with as many syntax issues this way. It's probably less efficient if Zapier etc. can handle the task, but way more flexible.
@Newsinrealestate 2 days ago
Are you using something added on top of Claude??
@Phonognomiks 1 day ago
“Computer Use”?
@JNET_Reloaded 2 days ago
Good job it's not Outlook, it would be doing spammy stuff rn lol
@dewijones92 3 days ago
awesome
@noviceartisan 2 days ago
That percentile number = an IQ of 123
@noviceartisan 2 days ago
For your next challenge, use ONSHAPE (browser-based 3D modelling program) to create a 3D model of some complexity ;)