Phenomenal content. Great work, very easy to understand.
@zuloo37 · 3 years ago
The Gaussian shape of the output makes sense if the output can be any real number, but if it's on a finite continuous range like [0, 1], wouldn't that make the probability densities at the endpoints unusually high and discontinuous (assuming you clip the output)? Would it make more sense to use something like a beta distribution for that kind of space?
@EdanMeyer · 3 years ago
This is a really great question! You absolutely could use something like a Beta distribution. As far as I'm aware, there isn't any particular mathematical reason that OpenAI prefers Gaussian distributions (other than their convenient properties). You could even drop the distribution entirely and just predict a specific value, but predicting a distribution instead can help with exploration.
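To make the comparison concrete, here is a minimal sketch (PyTorch assumed; the module names `GaussianHead` and `BetaHead` are illustrative, not from the thread) of the two policy heads. The Beta head naturally has support on [0, 1], so no clipping is needed at all:

```python
import torch
import torch.nn as nn
from torch.distributions import Normal, Beta

class GaussianHead(nn.Module):
    """Gaussian policy head: supports all of R, so actions may need clipping."""
    def __init__(self, hidden_dim, action_dim):
        super().__init__()
        self.mu = nn.Linear(hidden_dim, action_dim)            # predicted mean
        self.log_std = nn.Parameter(torch.zeros(action_dim))   # learned, state-independent std

    def forward(self, h):
        return Normal(self.mu(h), self.log_std.exp())

class BetaHead(nn.Module):
    """Beta policy head: density lives on [0, 1], so the endpoints are handled natively."""
    def __init__(self, hidden_dim, action_dim):
        super().__init__()
        self.alpha = nn.Linear(hidden_dim, action_dim)
        self.beta = nn.Linear(hidden_dim, action_dim)

    def forward(self, h):
        # softplus + 1 keeps both shape parameters > 1, which makes the
        # density unimodal and finite at the endpoints of [0, 1]
        a = nn.functional.softplus(self.alpha(h)) + 1.0
        b = nn.functional.softplus(self.beta(h)) + 1.0
        return Beta(a, b)
```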
@zuloo37 · 3 years ago
Is there any problem with lack of differentiability at the endpoints if you use a Gaussian on a clamped output range? Or does this system not use backprop?
@EdanMeyer · 3 years ago
@zuloo37 Just thinking off the top of my head here, but if you're using a learning algorithm that requires differentiability, one thing you can do is compute the loss on the unclipped output. You can still clip the action before applying it to the environment, and that should work.
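A minimal sketch of that idea (PyTorch assumed; `policy` and `env` are hypothetical placeholders, not from the thread). The log-probability is taken on the raw sample, so the gradient never sees the flat clipped region; only the environment sees the clamped action:

```python
import torch

dist = policy(obs)                    # e.g. a Normal from a Gaussian policy head
raw_action = dist.rsample()           # unclipped, differentiable (reparameterized) sample
log_prob = dist.log_prob(raw_action)  # loss uses the raw action: no clipping,
                                      # so no zero-gradient regions at the endpoints

env_action = raw_action.clamp(0.0, 1.0)                   # clip only what the env sees
obs, reward, done, info = env.step(env_action.detach().numpy())
```

One caveat worth noting: with this trick the distribution's density no longer exactly matches the distribution of actions the environment actually receives, since probability mass outside [0, 1] piles up at the endpoints, but in practice the optimization still works because the gradient signal stays intact.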