You missed the essence of PPO entirely. It exists because collecting rollouts from a simulator is slow, so to make training feasible you want to reuse data sampled with the old policy for multiple gradient updates. But then the importance-sampling ratio between the new and old policy can blow up and destabilize training, so a bound on the ratio is introduced. Work out the policy gradient and look at how the clipping and the min interact: whenever a sample's ratio has already moved too far in the favorable direction, the min selects the clipped term, which is a constant with respect to the policy parameters, so its gradient is exactly zero. That's the genius of it: "bad" training examples simply stop contributing any gradient.
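
Here's a minimal sketch of that clipped surrogate loss in PyTorch, just to make the zero-gradient behavior concrete. The function name, arguments, and the 0.2 clip range are my own choices, assuming log-probabilities and advantages are already computed elsewhere; this isn't any particular library's API.

```python
import torch

def ppo_clip_loss(log_probs_new, log_probs_old, advantages, clip_eps=0.2):
    # Probability ratio r = pi_new(a|s) / pi_old(a|s), computed in log space.
    ratio = torch.exp(log_probs_new - log_probs_old.detach())

    # Unclipped and clipped surrogate terms.
    surr_unclipped = ratio * advantages
    surr_clipped = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages

    # Negate because optimizers minimize. When the min picks the clipped term,
    # that term is constant in the policy parameters, so the sample's gradient is zero.
    return -torch.min(surr_unclipped, surr_clipped).mean()

# Tiny check: a sample whose ratio is already past 1 + eps with a positive
# advantage contributes no gradient at all.
log_old = torch.tensor([0.0])
log_new = torch.tensor([0.5], requires_grad=True)  # ratio ~ 1.65 > 1.2
adv = torch.tensor([1.0])
ppo_clip_loss(log_new, log_old, adv).backward()
print(log_new.grad)  # tensor([0.])
```

The zero comes from two places working together: clamp has zero gradient outside its bounds, and the min routes the gradient to the (smaller) clipped term for exactly those samples that have already overshot.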