Sorry Sam - gemini-exp-1121 !!!

5,824 views

1littlecoder

A day ago

Comments: 31

@1littlecoder 5 days ago

New Video Generation AI model - kzbin.info/www/bejne/eajZf2V5mMmXnc0

@pixelperfectpravin 6 days ago

I have noticed this: OpenAI just tries to steal the attention whenever Google does something.

@pixelperfectpravin 6 days ago

attention is all you need maybe

@therealyash.sharma 6 days ago

always

@_Levi2589 4 days ago

@@pixelperfectpravin lmao the reference

@theJatak 6 days ago

I think the smaller context window, for now, is meant to let people know about the model, let them use it, and get their feedback. Later they'll turn it into another pro model, like Gemini 2 Pro. So this could be a testing model (as the name already suggests) that will be ready to go commercial very soon. Releasing another model just one week later means they are speeding up their work.

@MattRodriguez-h7j 6 days ago

The more awesome thing is that Google uses its own TPU chips. Sam and OpenAI will just burn through MSFT's cash flow on capex.

@idea_list 5 days ago

Interestingly enough, Magnus is actually mentioned 3 times on that page (the 3rd time it's just 'Carlsen'). But, again, the 3rd mention is not in the chunk of text you copy-pasted into the prompt; it came later, in the sources. Still, I wonder whether he was mentioned a third time in that chunk as well, just not by name...? An LLM should probably be able to catch that. I'm too lazy to check, though, so I'll just leave this thought here.

@1littlecoder 5 days ago

Woah. That's very interesting. Let me check it again.

@Shaunmcdonogh-shaunsurfing 6 days ago

Really appreciate you covering this

@TheReferrer72 5 days ago

Chatbot Arena must be horribly broken if Claude models are not at the top in coding.

@dkgrinder 6 days ago

The McKinsey comment is hilarious and so true

@AarreLisakki-s5e 6 days ago

The one result where it's not (tied for) leading is not, as you say, "for style"; it's rather the leaderboard's attempt at _controlling_ for style, so that only substance counts. So it's essentially the exact reverse: it says that when you discount human preferences for, say, longer responses (even when they aren't saying anything more) or for the way markdown is used, etc., Gemini drops a rank to #2 in the overall category. Still pretty impressive, of course. This is how Chatbot Arena describes the criterion in their blog post: " The goal here is to understand the effect of style vs substance on the Arena Score. Consider models A and B. Model A is great at producing code, factual and unbiased answers, etc., but it outputs short and terse responses. Model B is not so great on substance (e.g., correctness), but it outputs great markdown, and gives long, detailed, flowery responses. Which is better, model A, or model B? The answer is not one dimensional. Model A is better on substance, and Model B is better on style. Ideally, we would have a way of teasing apart this distinction: capturing how much of the model’s Arena Score is due to substance or style."

@Ann-yo5sb 6 days ago

I follow your channel regularly, keep it up. But do your thumbnails always have to feature Elon Musk and Sam Altman when talking about their businesses? There are a lot of hard-working people behind them who are building these companies.

@Nick-h7f 5 days ago

Your favorite Indian chess player, apart from Vishy?

@1littlecoder 5 days ago

Arjun Erigaisi, mostly, at this point! But nothing is that solid tbh. I used to root a lot for Nepo (RUS), but I think his time for championships is gone! Wesley is another favorite player, very humble!

@twobob 6 days ago

So, I could make a bot that runs my model locally, queries the arena remotely, and, regardless of which answer is "best", simply picks the answer that matches my local output, thereby skewing the results to make my model look like the best... And this is why clever people can't have nice things.

@Zbezt 6 days ago

Gemini was "refined" into oblivion; it has little to nothing to offer compared to other models. I tried it myself, and sadly it was reduced to nothing more than a lazy person's prompt gadget.

@vaibhavgeek 6 days ago

McKinsey employees: "Mujhe kyu toda?" (Why did you break me?) 😂😂

@alx8439 5 days ago

Gemini is leading only within the margin of error, but it is leading nevertheless.

@1voice4all 4 days ago

Arena votes are not really a good way to assess models. It's subjective.

@AbuBakr1 6 days ago

Many of the new models are now simply trained to pass benchmark questions. For example, the new Qwen 2.5 model (which was once a favorite for coding) passed all the benchmark questions; you'd think it's better than Anthropic's Claude, but it's complete trash when used in real life 😅

@pritamadakofficial 5 days ago

Absolutely right 😂

@BrianMosleyUK 6 days ago

Try this prompt... Watch it fail dismally. Only o1-preview comes close to success.
Find pairs of words where:
1. The first and last letters of the first word are different from the first and last letters of the second word. For example, "TeacH" and "PeacE" are valid because: The first letters are "T" and "P" (different). The last letters are "H" and "E" (different).
2. The central sequence of letters in both words is identical and unbroken. For example, the central sequence in "TeacH" and "PeacE" is "eac".
3. The words should be meaningful and, where possible, evoke powerful, inspiring, or thought-provoking concepts. Focus on finding longer words for a more varied and extensive list.
Examples:
1. Banged Danger
2. Bated Gates
3. Beached Reaches
4. Belief Relied
5. Blamed Flames
6. Blamed Flamer
7. Blazed Glazer
8. Blended Slender
9. Bolted Jolter
10. Boned Toner
11. Braced Traces
12. Branded Grander
13. Braved Craves
14. Braved Graves
15. Braver Craved
16. Brushed Crusher
17. Busted Luster
18. Busted Muster
19. Causes Paused
20. Chased Phases
21. Chaser Phased
22. Cracked Tracker
23. Craved Graves
24. Crated Grates
25. Creamy Dreams
26. Created Greater
27. Dared Bares
28. Dancer Lanced
29. Dreamed Creamer
30. Fabled Tables
31. Faith Baits
32. Fallen Baller
33. Favoured Savourer
34. Famed Gamer
35. Famed Cameo
36. Fared Cares
37. Fasten Master
38. Fated Gates
39. Faved Caves
40. Feared Bearer
41. Fiery Piers
42. Fired Tires
43. Flared Glares
44. Flashed Clashes
45. Flipped Slipper
46. Foamed Roamer
47. Folded Bolder
48. Founder Sounded
49. Gifted Lifter
50. Gleaned Cleaner
51. Graced Traces
52. Hades Wader
53. Hardened Gardener
54. Hated Fates
55. Laced Racer
56. Laced Races
57. Lasted Faster
58. Leader Beaded
59. Leaves Heaved
60. Lighted Fighter
61. Lives Given
62. Manned Banner
63. Mailer Sailed
64. Mended Bender
65. Missed Kisses
66. Mounted Counter
67. Moved Lover
68. Named Games
69. Paced Laces
70. Paced Racer
71. Paced Races
72. Pained Gaines
73. Painted Fainter
74. Parched Marches
75. Placed Glaces
76. Plates Slated
77. Popes Roped
78. Races Faced
79. Racer Laced
80. Rarer Cares
81. Rated Dates
82. Raver Waves
83. Rested Tester
84. Saved Waver
85. Seated Beater
86. Sailer Wailed
87. Sainted Painter
88. Seeder Needed
89. Slayer Played
90. Tainted Painter
91. Tamed Games
92. Tailed Raider
93. Teach Peace
94. Tested Fester
95. Tinker Linked
96. Tired Siren
97. Traced Graces
98. Treated Greater
99. Warmed Farmer
100. Wasted Baster
101. Watched Catcher
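Rules 1 and 2 of the prompt above are purely mechanical, so a few lines of code can check candidate pairs. This is a minimal Python sketch that assumes the "central sequence" means every letter except the first and the last. `central`, `is_valid_pair` and `find_pairs` are hypothetical helper names. Rule 3 (meaningful, inspiring words) is a judgment call and is not checked.

```python
# Hypothetical checker for rules 1 and 2 of the word-pair prompt.
# Assumption: the "central sequence" is the word minus its first and last letters.
# Rule 3 (meaningful / inspiring words) is left to a human and not checked here.


def central(word: str) -> str:
    """Return the word without its first and last letters, lowercased."""
    return word[1:-1].lower()


def is_valid_pair(first: str, second: str) -> bool:
    """Check rules 1 and 2 for a single pair of words."""
    a, b = first.lower(), second.lower()
    if len(a) < 3 or len(b) < 3:
        # Need at least one central letter for the rule to make sense.
        return False
    # Rule 1: first letters differ AND last letters differ.
    different_ends = a[0] != b[0] and a[-1] != b[-1]
    # Rule 2: identical, unbroken central sequence (implies equal length).
    same_middle = central(a) == central(b)
    return different_ends and same_middle


def find_pairs(words: list[str]) -> list[tuple[str, str]]:
    """Brute-force every unordered pair in a word list and keep the valid ones."""
    unique = sorted(set(w.lower() for w in words))
    pairs = []
    for i, x in enumerate(unique):
        for y in unique[i + 1:]:
            if is_valid_pair(x, y):
                pairs.append((x, y))
    return pairs


if __name__ == "__main__":
    print(is_valid_pair("TeacH", "PeacE"))    # True
    print(is_valid_pair("Tailed", "Raider"))  # False: "aile" != "aide"
    print(find_pairs(["teach", "peace", "reach", "beached", "reaches"]))
    # [('beached', 'reaches'), ('peace', 'reach'), ('peace', 'teach')]
```

Run over the example list in the prompt, this check rejects at least one entry, "Tailed Raider", because its central sequences ("aile" vs "aide") differ. Brute force is quadratic in the word-list size. For a full dictionary, grouping words by `central(word)` first and comparing only within each group would be the obvious next step.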

@alvarobyrne 6 days ago

attention is all you need

@Fatman305 6 days ago

Gave it a shot. Sucked as usual. I was hoping deep thinking with web access would give it the edge, but it sure didn't... 4o (not o1) did a much better job...