Ground Truth #7: Live from AI Fest
28:54
Comments
@TheUniversalAxiom 3 months ago
It was nicknamed Strawberry because it's an organic model, not a static GPT. What's buried in the straw is their new jam. 🍎
@JohnPDickerson 3 months ago
Buried in the straw, or strraw, or strrraw, or ...
@TheUniversalAxiom 3 months ago
@JohnPDickerson It's funny to watch people test this preview model, because it's not actually a test of o1's reasoning; it's a test of the users' reasoning. Those with imagination who can process information in creative ways will see how different this model is from everything before it. And those who learned to just memorize information see it as an auto-complete, or as harder to use. They ask it to count letters in a word LMAO
@chrisogonas 3 months ago
I missed the live presentation, but this is just as good. Thanks for this incredible material. I care deeply about the problem of disinformation and misinformation in an age of information overload, when powerful tools are at the disposal of everyday folks. It matters how these tools are deployed and how the unsuspecting, vulnerable majority is protected from potential harm. Thanks, Cherie and team!
@benny4013 3 months ago
There was nothing about finance.
@Andrewthecommentator 3 months ago
Subscribed
@ohmkaark 6 months ago
I was looking for a good summary of LLM evaluation metrics. I see a lot of them captured well here.
@ethanshub501 11 months ago
Who is Diego M. Oppenheimer?
@rezgar482 11 months ago
Interesting
@bitcode_ 1 year ago
Awesome talk! So much cool info
@vincentkaranja7062 1 year ago
Fantastic presentation, Max and Rowan! The depth of your analysis and the clarity with which you presented the complexities of evaluating LLMs is truly commendable. It's evident that a lot of thought and effort went into this research. I'm particularly intrigued by your approach to using LLMs as evaluators. It opens up a plethora of possibilities but also brings forth some ethical considerations. How do you account for systemic biases in evaluation metrics when using LLMs as evaluators? Given that traditional metrics might not capture the fairness aspect adequately, have you considered incorporating fairness metrics or mitigation methods in your evaluation process?
@vincentkaranja7062 1 year ago
Excellent overview, Terry. The part about identifying age discrimination within machine learning models caught my attention. Could you share more about how Arthur.AI's platform sets the acceptable range for performance metrics in this context? Is it customizable based on industry or legal standards?
@benwilson1952 1 year ago
Any way to boost the audio on this? Barely audible in some parts
@chrisogonas 1 year ago
That's a resourceful conversation, folks. Thanks for hosting.