Phenomenal content. Great work, very easy to understand.
@zuloo37 · 3 years ago
The Gaussian shape of the output makes sense if the output can be any real number, but if it's on a finite continuous range like [0, 1], wouldn't that make the probability densities at the endpoints unusually high and discontinuous (assuming you clip the output)? Would it make more sense to use something like a beta distribution for that kind of space?
@EdanMeyer · 3 years ago
This is a really great question! You absolutely could use something like a Beta distribution. As far as I'm aware, there isn't any particular mathematical reason that OpenAI prefers Gaussian distributions (other than their convenient properties). You could even drop the distribution entirely and just predict a specific value, but predicting a distribution instead can help with exploration.
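To make the comparison concrete, here is a minimal sketch (PyTorch assumed; the module names `GaussianHead` and `BetaHead` are illustrative, not from the thread) of the two policy heads. The Beta head naturally has support on [0, 1], so no clipping is needed at all:

```python
import torch
import torch.nn as nn
from torch.distributions import Normal, Beta

class GaussianHead(nn.Module):
    """Gaussian policy head: supports all of R, so actions may need clipping."""
    def __init__(self, hidden_dim, action_dim):
        super().__init__()
        self.mu = nn.Linear(hidden_dim, action_dim)            # predicted mean
        self.log_std = nn.Parameter(torch.zeros(action_dim))   # learned, state-independent std

    def forward(self, h):
        return Normal(self.mu(h), self.log_std.exp())

class BetaHead(nn.Module):
    """Beta policy head: density lives on [0, 1], so the endpoints are handled natively."""
    def __init__(self, hidden_dim, action_dim):
        super().__init__()
        self.alpha = nn.Linear(hidden_dim, action_dim)
        self.beta = nn.Linear(hidden_dim, action_dim)

    def forward(self, h):
        # softplus + 1 keeps both shape parameters > 1, which makes the
        # density unimodal and finite at the endpoints of [0, 1]
        a = nn.functional.softplus(self.alpha(h)) + 1.0
        b = nn.functional.softplus(self.beta(h)) + 1.0
        return Beta(a, b)
```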
@zuloo37 · 3 years ago
Is there any problem with lack of differentiability at the endpoints if you use a Gaussian on a clamped output range? Or does this system not use backprop?
@EdanMeyer · 3 years ago
@zuloo37 Just thinking off the top of my head here, but if you're using a learning algorithm that requires differentiability, one thing you can do is compute the loss on the unclipped output. You can still clip the action before applying it to the environment, and that should work.
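A minimal sketch of that idea (PyTorch assumed; `policy` and `env` are hypothetical placeholders, not from the thread). The log-probability is taken on the raw sample, so the gradient never sees the flat clipped region; only the environment sees the clamped action:

```python
import torch

dist = policy(obs)                    # e.g. a Normal from a Gaussian policy head
raw_action = dist.rsample()           # unclipped, differentiable (reparameterized) sample
log_prob = dist.log_prob(raw_action)  # loss uses the raw action: no clipping,
                                      # so no zero-gradient regions at the endpoints

env_action = raw_action.clamp(0.0, 1.0)                   # clip only what the env sees
obs, reward, done, info = env.step(env_action.detach().numpy())
```

One caveat worth noting: with this trick the distribution's density no longer exactly matches the distribution of actions the environment actually receives, since probability mass outside [0, 1] piles up at the endpoints, but in practice the optimization still works because the gradient signal stays intact.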