Open Reasoning vs OpenAI

Views: 29,783

Sam Witteveen

Days ago

Comments: 73
@UCs6ktlulE5BEeb3vBBOu6DQ 15 days ago
The devs were very clear that they shared QwQ with us so we can see the progress, but it's just the basic mechanics of what's to come.
@toadlguy 15 days ago
It is very disappointing that OpenAI doesn’t even say in a paper exactly what they are doing in o1. I am sure it is a variety of techniques, some of which are being deployed in these open models. They no longer give any credence to their argument that this is because of safety. In fact, all the effort they have put into preventing “jailbreaking” is meant to keep us from seeing the raw tokens (that you pay for, btw), because those would give an idea of what they are actually doing. I’m sure there are some interesting ideas there, but this idea of siloing science for competitive reasons is so far from where (they at least said) they came from that it is pretty repugnant.
@MudroZvon 12 days ago
The argument that secrecy ensures AI safety is flawed. While preventing malicious use is crucial, relying solely on internal processes to identify and fix vulnerabilities is insufficient. Independent audits and robust "red teaming" offer a more effective and accountable approach to safety, allowing for external scrutiny without compromising core algorithms. The current model of AI development creates a significant power imbalance. Users pay for access to powerful tools without understanding how they work. This lack of transparency undermines trust and accountability. In situations where AI provides inaccurate or harmful information, tracing the source of the error becomes nearly impossible, hindering rectification and accountability. The secrecy surrounding AI development contradicts the fundamental principles of scientific progress. The "siloing" of knowledge not only limits competition but also slows down the overall advancement of the field. Furthermore, this opacity makes it difficult to identify and mitigate biases within AI models, potentially exacerbating existing societal inequalities. The legal framework surrounding AI intellectual property is still evolving. Finding the right balance between protecting trade secrets and promoting transparency is crucial. Compliance with data privacy regulations like GDPR is essential, and a more nuanced legal approach may be needed to address the unique challenges of AI development while fostering collaboration. A tiered approach to transparency offers a practical solution. Sharing high-level information about model architectures and training methods, while protecting specific algorithms, allows for broader participation in the development process. This fosters collaboration, accelerates innovation, and enhances the overall robustness and safety of AI systems.
@user-pt1kj5uw3b 10 days ago
Yeah, it's extremely deceptive. They are effectively training their AI to lie to us. One thing I have noticed in using QwQ and reading through the thought process is that the thoughts themselves are very important, and being able to stop and edit the raw thoughts is extremely powerful.
@moormanjean5636 9 days ago
Open AI is disgusting, Sam Altman is a toad
@DarrenReidAu 15 days ago
Great video, the broken loop reminded me of the “There are 4 lights” Picard meme, which then made me realize the episode that is from is called “Chain of Command” 😂
@tornyu 15 days ago
Interesting that the QwQ and R1 models use similar expressions in their thought processes, like "Wait a minute, letters can be tricky, especially if there are repeating letters". I wonder why? See 11:57 and 18:35
@tantzer6113 15 days ago
My guess: the approach that works is to generate multiple attempted solutions (conjectures) and then evaluate them (refutations) and pick one. Now, the first step, the generation of multiple candidates, can take place by doubting whatever you have done so far. Self-doubt is key in critical thinking.
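The generate-then-refute idea in the comment above can be sketched in a few lines. This is only an illustration of the commenter's guess, not a description of how QwQ or R1 actually work; the function name `generate_and_test` and the toy conjecture list are assumptions made up for the example.

```python
from typing import Callable, Iterable, Optional, TypeVar

T = TypeVar("T")


def generate_and_test(
    candidates: Iterable[T], is_valid: Callable[[T], bool]
) -> Optional[T]:
    """Return the first candidate that survives the check, or None."""
    for candidate in candidates:
        # "Refutation" step: discard any conjecture that fails the check.
        if is_valid(candidate):
            return candidate
    return None


# Toy problem: several conjectured answers for the number of r's in a word.
word = "strrawberry"
conjectures = [2, 3, 4]  # hypothetical guesses, e.g. from several samples
answer = generate_and_test(conjectures, lambda n: n == word.count("r"))
print(answer)  # 4
```

In a real system the check would itself be uncertain (another model call or a learned verifier), which is presumably where the repeated "Wait a minute" style of self-doubt comes in.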
@tornyu 15 days ago
@tantzer6113 agreed, but even so the specific expressions used are surprisingly similar. Maybe they trained on a common dataset? Or maybe this is the kind of thing you get if you use GPT-4 to generate synthetic data.
@五爱热米 14 days ago
Chinese don't cock block each other I guess
@TomGally 15 days ago
Thanks for the video. Very timely. One thing I find very interesting about the exposed chains of thought is that they enable us to see where the reasoning might have gone wrong. Over on AI Explained, Philip has developed a set of common-sense reasoning problems that humans do much better on than any current AI models. When I tried a few of his publicly available prompts with DeepSeek, the model did not get the “right” answer, but I could see that it had decent reasons for coming up with a “wrong” answer. The exposed reasoning thus helped to reveal ambiguities and other flaws in the reasoning problems themselves. I imagine that careful examination of those chains of thought, both by humans and by AI, will also be a very useful way to improve the reasoning ability of these models.
@kunwar_divyanshu 15 days ago
Back in December?
@ildaryakupov903 15 days ago
😂
@ivarborthen7320 15 days ago
Dude, we're in the future now
@derekw6811 15 days ago
What local front end are you using with Qwen coder at 16:25?
@ronnetgrazer362 15 days ago
Look ma, no moat!
@rascalwind 15 days ago
Question for you: when you're doing your strawberry test, are you using strings or string literals? A "strrawberry" might be interpreted as something that the AI should spell-check against a dictionary, whereas a 'strrawberrrrry' might be something that it takes as a literal, so that it attempts the task differently.
@grabani 15 days ago
What are your thoughts on why Anthropic has not released a similar chain-of-thought model or architecture?
@TomGally 15 days ago
Are we sure they haven’t? Lately, when I’m using Claude, it will sometimes pause for a while and give a “Thinking…“ message before responding five or 10 seconds later with the answer. It might be going through a multistep chain of thought, though none of it is disclosed. I wonder if that’s why the latest version of Claude scores close to o1 on some of the reasoning benchmarks.
@elawchess 14 days ago
@@TomGally I hope for Anthropic's sake that they haven't deployed a thinking model secretly. Because that would mean it's not good.
@elawchess 14 days ago
@@TomGally I think it's plausible that they've baked in chain-of-thought reasoning; what they probably haven't released yet is the Monte Carlo search that consumes lots of tokens. If they did that without increasing the price of their API, they would probably lose a lot of money. That's how to reliably infer they didn't secretly deploy such a model.
@niter43 11 days ago
@@TomGally About 5 months ago they added a hidden tag to the output. It's not actually used for chain of thought, though, but just for the model to evaluate whether it's necessary to add an artifact (code block, rich text, HTML/SVG with preview, diagram, etc.) to the response.
@Charles-Darwin 15 days ago
Has anyone else suspected the whole 'strawberry' thing comes from the shape of the Monte Carlo tree graph? It would be painfully on the nose if this were true... Maybe red for filtering by logical-nots/falsey values that make up the initial nodes, then green for the truthy final leaves. A lot of our own reasoning is first stating what x definitely is not, then picking from the likely candidates that remain, if we really don't know something.
@lucasjans 15 days ago
The speed of open source development is promising, but the traditional accuracy benchmarks hide the importance of speed, which is more critical for these inference-bound models. Nice video highlighting the current ecosystem.
@NowayJose14 11 days ago
The progress on these models is insane, people have no clue what's coming.
@RaitisPetrovs-nb9kz 15 days ago
We just want to know if Qwen is still looping. It reminds me of Asimov's story about SPD-13.
@MeinDeutschkurs 13 days ago
Sam, I can’t help myself, but it seems to work better if I dynamically prompt for exactly what I need. The first iteration determines the “reply mode/format” and the second iteration brings the reply. The agentic “flow” is way cheaper as well.
@mkstowegnv 12 days ago
Please give an example
@MeinDeutschkurs 12 days ago
@@mkstowegnv I already gave an example. See the workflow: the first iteration and the second iteration.
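One possible reading of the two-iteration workflow described in this thread, as a minimal sketch. Everything here is an assumption for illustration: `two_pass_reply`, the prompt wording, and the stand-in `fake_model` are hypothetical, and `call_model` is a placeholder for whatever LLM client is actually used.

```python
from typing import Callable


def two_pass_reply(question: str, call_model: Callable[[str], str]) -> str:
    """Ask for the reply format first, then ask for the reply in that format."""
    # Iteration 1: determine only the reply mode/format.
    mode = call_model(
        f"Name the best reply format for this request in one word: {question}"
    ).strip()
    # Iteration 2: request the actual reply, constrained to that format.
    return call_model(f"Answer in this format: {mode}\n\nRequest: {question}")


def fake_model(prompt: str) -> str:
    """Stand-in for a real model so the sketch runs without any API."""
    if prompt.startswith("Name the best reply format"):
        return "list"
    return "(model reply would go here)"


print(two_pass_reply("Compare QwQ and R1 on letter counting", fake_model))
```

The first call can be short and cheap, which may be part of why the commenter finds this flow less expensive than a long reasoning trace.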
@bokuboke482 14 days ago
Cool comparison! However, there's no way I'm paying for overseas reasoning models when homegrown Open AI's superb o1 exists. Which I do pay for! And it's impressive even in its pre-release versions (mini & preview).
@stevenpham6734 10 days ago
Joke, right?
@menglilingsha 14 days ago
My question is: I have downloaded the open-source models, but without their secret prompt the accuracy is nowhere near that of the OpenAI models; it is not even as good as a regular Llama 3.1 70B Instruct model. Can anyone tell me where the secret prompt is? Using the DeepSeek or QwQ website is not open source at all. No one knows what model is running in the backend.
@geelws8880 12 days ago
“The secret” prompt is the training… you basically train it on example prompts paired with already correct answers, by giving it examples… I think GPT-3.5 or 4 (I don’t remember) took like 100k prompt examples at initial training lol
@devilsolution9781 11 days ago
@@geelws8880 You have to do the RLHF manually?
@undefined6512 14 days ago
You decide to break up with your AI waifu. AI: QwQ what's this???
@jtjames79 14 days ago
Need an agent to assign inference time per query.
@pallharaldsson9015 10 days ago
11:54 How many r's in "strrawberry" (typos intentional): "thought for 9 seconds"... This is progress in AI :) I so wish for tokenization to go away, then at least this would be a non-issue.
@hqcart1 15 days ago
Do we now get impressed when the model counts R's in stawberrrry?
@DrHanes 14 days ago
Yes, it is impressive from a technical perspective because it demonstrates advancements in step-by-step reasoning capabilities in language models. Moreover, those who question its significance may lack understanding of how LLMs function and could benefit from further education on the topic ;-)
7 days ago
Excellent! Thanks
@tantzer6113 15 days ago
I just called the mental asylum to check on QwQ. They said it is still looking straight ahead with that distant gaze, rocking back and forth, and muttering, “I should finalize my answer, I should stop, I should go with my first result, I should end, …”
@idck5531 15 days ago
The next big thing is test-time training; I wonder if those open source models will be out.
@AmrAbdeen 10 days ago
poor qwq. you broke the little guy?
@ZenBen_the_Elder 10 days ago
'Chinese AI models like QwenQ are keeping pace with American AI enterprise models' [16:04-17:46]
@Bill-bc5bg 14 days ago
Am I the only one who thinks these models closely followed OpenAI releases because they are built using intel that has been lifted from OpenAI?
@Dackel1972 14 days ago
qwen is sure to develop a depression with that amount of overthinking
@cookiesInChocolate 15 days ago
What is the point of a “strawberry”-like question if we know that an LLM doesn’t recognize letters? How is the model supposed to count those letters?
@jasonfilby9648 15 days ago
Some LLM-based systems could write and run basic Python programs to perform the logic. But by now the answer to the "count the r's in strawberry" question is likely in the training data.
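The kind of basic Python program mentioned above could look like the following. This is a sketch of what a tool-using model might plausibly generate, not code from any particular system; the helper name `count_letter` is an assumption.

```python
def count_letter(word: str, letter: str) -> int:
    """Count case-insensitive occurrences of a single letter in a word."""
    return word.lower().count(letter.lower())


print(count_letter("strawberry", "r"))   # 3
print(count_letter("strrawberry", "r"))  # 4
```

Because the program works on characters rather than tokens, the deliberate typo from the video is counted literally instead of being autocorrected.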
@orterves 15 days ago
Well, the chain of thought did solve it despite the challenges of tokenisation
@cookiesInChocolate 15 days ago
@@jasonfilby9648 So, LLMs should be aware of their own limitations and write programs to count things instead of 'guessing'? I guess it's similar to spoken words for people: we can't really count letters unless the word is written and we can point to each letter and count.
@jasonfilby9648 15 days ago
@@cookiesInChocolate It's possible, not sure what's going on behind the scenes though.
@cappiels 15 days ago
There is no word "strrawberry", so the correct word you were reasonably referring to has 3 'r's.
@elawchess 14 days ago
I thought that as well. In a way it's ambiguous to ask that, because LLMs typically correct mistakes. If you ask "true or false: my meighbor means someone who is next to me", it would usually take meighbor to be neighbor with a typo and then answer. Perhaps where the model could have improved is to make it clear that it corrected a typo before answering, and not just say "3". It could have said "There are 3 r's in strawberry", and then it would be clear it corrected the word before answering. Of course, a better answer might be to give an answer for both cases and state those two cases clearly. You don't want a model just saying "4" if it could have been a typo, except perhaps if it was constrained to a one-word answer.
@jasonrhtx 14 days ago
He specified the word (literal string) in quotes, ‘strrawberry’, and did not request autocorrect.
@elawchess 14 days ago
@@jasonrhtx It corrects without looking for a request for autocorrect. I don't think that that would be a reasonable way for an LLM to behave in general. When you mistype a word unless you request autocorrect it should just start complaining about the spelling errors in your work and not answer? What that's good for are these trick questions that have little bearing on the LLM's practical use.
@hrmanager6883 15 days ago
Great job 👏
@dadamaldasstuff1816 10 days ago
The models will start to learn that strrawberry has 4 r's now.
@henriqueabreu4436 15 days ago
Athene V2 looks good
@mircorichter1375 15 days ago
Has anyone tried test-time training on these new Large Reasoning Models to see how much that improves them even further?
@menglilingsha 14 days ago
TTT is just LoRA, nothing special
@NostraDavid2 10 days ago
@@menglilingsha No? TTT is used at runtime, LoRA during training.
@Pxi145 10 days ago
Do we really need advanced models just to count the number of r's in strawberry? Like, I just prompted this: "Break down the word anticonstitutionnellement in token-like form for each letter. After that, circle each letter -n- and count one for each circle. The number of circles is the number of n's in the word." And it worked!!! What an amazing answer 🤡 People need to stop comparing advanced models using dumb prompting.
@五爱热米 14 days ago
Interesting that they are all Chinese companies.
@GrowStackAi 11 days ago
If innovation had a face, it’d look like AI 💫
@thiagomaia8902 14 days ago
Dude, the o1 models were not released in December… 😂 It’s much more recent
@NostraDavid2 10 days ago
o1 (non-preview) is going to be released this month, so he's not wrong. Accidentally not wrong, but not wrong nonetheless.
@michaeltse321 15 days ago
LLMs cannot solve problems they have never been trained on. They give a best guess.
@reza2kn 14 days ago
I felt bad for the model that was stuck in the loop! 🥲 I've been stuck in such loops myself and it is not fun.
@xlretard 15 days ago
neti neti
@jsward17 15 days ago
Back in December? What?