Exactly, no need for 5 PhDs; 2 or 3 and a bit of genius is absolutely sufficient ^^ Great work, you're amazing :wink:
@diga4696 4 days ago
I have started a small local nonprofit that uses CSI concepts and agentic workflows to help evolve local communities and businesses; the objective is reward-driven knowledge sharing for optimization of processes and task allocation of internal and external operations.
@s4kupfers 3 days ago
Really fascinating, I'm looking forward to you sharing some of your real world findings and learnings.
@OumarDicko-c5i 5 days ago
If you do an implementation I would love to see it, so beautiful.
@Summersault666 5 days ago
One thing I didn't understand: after you generate the "reasoning" text, how do you incorporate the knowledge?
@thingX1x 5 days ago
I love learning these new ideas. I can just ask Bolt if it can add in the new concept :D
@tk0150 5 days ago
Please share your experience after you play with it!
@tspis 1 day ago
Very cool stuff - thanks for sharing and covering, love your content, as usual! However, what is a bit unfortunate is that the paper's authors frame an iterative heuristic as a gradient-based optimization method. They use optimization equations when no optimization calculation is actually happening. While this doesn't at all diminish their results and achievements, it kind of leaves a bad taste in my mouth. And just a week after the TPO paper's release, another one appeared that does heuristic refinement (though this time prior to inference) but tries to pass it off as backprop/differentiation. This latter one is an even worse offender, as it goes a step further and actually uses differentiation equations when no backprop or optimization calculation is being performed ("LLM-AutoDiff: Auto-Differentiate Any LLM Workflow", 2501.16673). Again, very cool work, and the results are there - but why the math-washing? Sigh.
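For readers unfamiliar with the distinction being drawn here, a minimal sketch of the contrast (the `llm` helper and `params` are hypothetical placeholders, not code from either paper): in a real optimizer, numeric parameters move along a numerically computed gradient; in the "textual gradient" framing, the "gradient" is just another generated string that gets folded back into the next prompt.

```python
# Hypothetical illustration only -- not code from the TPO or LLM-AutoDiff papers.
# `llm(prompt)` stands in for any function that returns a single completion string.

def sgd_step(params, grads, lr=1e-4):
    """Actual gradient-based optimization: numbers move along a computed gradient."""
    return [p - lr * g for p, g in zip(params, grads)]

def textual_gradient_step(llm, prompt, response):
    """The 'textual gradient' analogue: nothing is differentiated.
    The 'gradient' is a critique string that is appended to the next prompt."""
    critique = llm(f"Critique this response and suggest improvements:\n{response}")
    revised = llm(f"{prompt}\n\nRevise your answer using this feedback:\n{critique}")
    return revised
```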
@Karthikprath 4 days ago
Thanks for this video. Can you tell me the formula for calculating FLOPS during inference on an H100?
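A common back-of-the-envelope estimate (an approximation, not an exact formula; the model size, throughput, and peak figure below are assumptions you should replace with your own numbers and NVIDIA's spec sheet for your exact SKU and precision):

```python
# Rough inference-FLOPS estimate for a dense decoder-only transformer.
# Rule of thumb: ~2 FLOPs per parameter per generated token
# (ignores attention-over-KV-cache terms, which grow with context length).

n_params = 70e9        # example: a 70B-parameter model (assumption)
tokens_per_sec = 50    # measured decode throughput (assumption)

flops_per_token = 2 * n_params
achieved_flops = flops_per_token * tokens_per_sec   # FLOPs/s actually performed

# H100 SXM dense BF16 tensor-core peak is roughly ~989 TFLOPS (check the spec sheet).
peak_flops = 989e12
mfu = achieved_flops / peak_flops                    # model FLOPs utilization

print(f"{achieved_flops / 1e12:.1f} TFLOPS achieved, MFU = {mfu:.1%}")
```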
@virendraashiwal8311 2 days ago
But can domain-specific reasoning be done here? Do we need a domain-specific reward function?
@profcelsofontes 5 days ago
And how about GRPO, used by DeepSeek R1?
@FelheartX 5 days ago
The reward model in this case is just "this answer got picked by more people than the other answer(s)", or what? But how does this help in a chat system like ChatGPT? The LLM internally does all this "text loss" and "text gradient" stuff, and then what? The next response to the next message will then be better adjusted to the user's preferences? Essentially this is an elaborate way of saying "this is the answer the user picked; now let's try to infer how they prefer their answers and continue doing that", or am I wrong?
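For what it's worth, a rough sketch of how such a test-time loop is usually described (the `llm` and `reward_model` helpers are hypothetical stand-ins, not the paper's code): a reward model scores sampled responses, the LLM writes a "textual loss" comparing the best and worst, turns it into "textual gradient" suggestions, and the next round of responses is sampled with those suggestions in the prompt. Nothing is written back into the weights; the payoff is simply that the response finally returned is the highest-scoring one after a few rounds.

```python
# Hypothetical sketch of a test-time preference-optimization loop (illustration only).
# `llm(prompt, n)` returns a list of n sampled completions;
# `reward_model(query, response)` returns a scalar score.

def test_time_po(llm, reward_model, query, n_samples=4, n_iters=3):
    responses = llm(query, n=n_samples)
    for _ in range(n_iters):
        ranked = sorted(responses, key=lambda r: reward_model(query, r))
        worst, best = ranked[0], ranked[-1]

        # "Textual loss": a critique explaining why the best answer beats the worst.
        loss = llm(f"Query: {query}\nChosen: {best}\nRejected: {worst}\n"
                   f"Explain what makes the chosen answer better.", n=1)[0]

        # "Textual gradient": concrete editing suggestions derived from the critique.
        grad = llm(f"Based on this critique, list concrete ways to improve the answer:\n{loss}",
                   n=1)[0]

        # "Update": resample with the suggestions folded into the prompt.
        responses = llm(f"{query}\n\nWhen answering, follow these suggestions:\n{grad}",
                        n=n_samples)

    # Return the highest-scoring response; model weights are never touched.
    return max(responses, key=lambda r: reward_model(query, r))
```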
@user-qw1rx1dq6n 5 days ago
Fuck, I was working on this for like a year and a half now; I was so close to getting it to work.
@TheDoomerBlox 5 days ago
*insert obligatory derogatory statement here* Well, that puts you in a prime position to reimplement something similar for different purposes, no?
@user-qw1rx1dq6n 5 days ago
@ Yeah, I'm gonna go take a look at how they implemented the reward model; maybe that can solve my problems.