The Alberta Plan for AI Research: Tea Time Talk with Richard S. Sutton

6,589 views

Amii

Days ago

Comments: 13

@borntobemild- 2 years ago

When you found the gizmo, it was a good metaphor for how you are freeing up the agent in the world with technology.

@Gabriel-oi6zb 2 years ago

Minute 11: Our interaction with the environment is not continual. There are special training periods: sleep, a crucial step in all mammals that might even extend back to all vertebrates.

@erkinalp 1 year ago

There are total insomniacs who cannot sleep for years, but they do not exhibit significant learning-related disabilities. Hence sleep should not be considered the only factor in unlearning falsehoods.

@Gabriel-oi6zb 1 year ago

@erkinalp You might want to look that up on Wikipedia: total insomnia (also called fatal insomnia, because you die from it) causes hallucinations.

@schok51 1 year ago

@erkinalp Sources? I thought sleep deprivation and sleep disorders were pretty universally harmful to cognitive abilities. You cannot simply not sleep and be healthy and functional.

@howtobe7460 1 year ago

This entire comment section looks AI-generated 😂😂

@judgeomega 2 years ago

It seems a contradiction to say you want a model with no domain knowledge yet still have a reward function. Doesn't knowledge of a reward imply knowledge of the domain of that reward? The amount of knowledge in the universe is nigh infinite, and we need that reward to anchor our focus on just what has utility with respect to our goals (rewards).

@schok51 1 year ago

I guess that's just semantics, and the point is that the reward function should encode everything that is relevant about the domain?

@LionKimbro 1 year ago

I was wondering the very same thing. What's your reward function? With ChatGPT, the score comes from "did I predict the next word accurately?" I have no idea what this system is going to use. One possibility: is it going to be an auto-decoder? I don't know.

@ArtOfTheProblem 11 months ago

Here I believe he means the "value function" defines the reward, specifically whether it is getting better or worse. It's not inputting an external reward. Reward is part of perception and is learned by the value function (if you understand TD learning).
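For readers who want to see what "TD learning" refers to, here is a minimal sketch of textbook tabular TD(0), not anything proposed in the talk. In this standard form the reward arrives as a signal from the environment, and the value estimate V(s) is nudged toward the reward plus the discounted estimate of the next state. The toy three-state chain, the step size alpha, the discount gamma, and the helper names run_chain_episodes and td0_update are all hypothetical choices made for illustration.

```python
from collections import defaultdict


def td0_update(V, state, reward, next_state, done, alpha=0.1, gamma=0.9):
    """One tabular TD(0) update: V(s) += alpha * (r + gamma * V(s') - V(s))."""
    # On the terminal step there is no next state to bootstrap from.
    target = reward if done else reward + gamma * V[next_state]
    td_error = target - V[state]
    V[state] += alpha * td_error
    return td_error


def run_chain_episodes(num_episodes=500, alpha=0.1, gamma=0.9):
    """Hypothetical toy task: states 0 -> 1 -> 2 -> terminal.

    Reward is 1.0 only on the transition out of state 2, 0.0 otherwise.
    """
    V = defaultdict(float)  # unseen states start with value 0.0
    for _ in range(num_episodes):
        for state in range(3):
            done = state == 2
            reward = 1.0 if done else 0.0
            next_state = None if done else state + 1
            td0_update(V, state, reward, next_state, done, alpha, gamma)
    return V


V = run_chain_episodes()
print({s: round(V[s], 3) for s in range(3)})
```

With gamma = 0.9 the estimates approach 1.0, 0.9, and 0.81 for states 2, 1, and 0: each state's value is learned from the reward that eventually follows it, discounted once per step.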