Training a Neural Network to play Ratchet & Clank - FAQ in vid description

1,292 views

bordplate


You can also watch on Twitch: / bordplate
This is a neural network learning to play Ratchet & Clank through Reinforcement Learning. The goal is to finish the Fitness Course on Kerwan.
It's built on a technique known as PPO (Proximal Policy Optimization).
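For anyone curious, the core of PPO is its clipped surrogate objective. Here is a minimal NumPy sketch of that objective (illustrative only, not the project's actual training code):

```python
import numpy as np

def ppo_clip_loss(new_logp, old_logp, advantages, clip_eps=0.2):
    """Clipped surrogate loss from PPO, written as a loss to minimize.

    new_logp / old_logp: log-probabilities of the taken actions under the
    current and the data-collecting policy; advantages: advantage estimates.
    """
    ratio = np.exp(new_logp - old_logp)  # pi_new(a|s) / pi_old(a|s)
    unclipped = ratio * advantages
    clipped = np.clip(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    # Elementwise minimum of the two surrogates, negated to get a loss.
    return -np.mean(np.minimum(unclipped, clipped))
```

The clipping keeps a single update from moving the policy too far from the one that collected the data, which is what makes PPO stable enough to train against a live emulator.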
The agent gets information about Ratchet's position, state, and more. Additionally, there are 64 raycasts shooting from the camera to provide depth and terrain information.
The agent can freely move both joysticks and use R1, Square and Cross.
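As a rough illustration, observations like these are typically packed into one flat vector before being fed to the network. This is a hypothetical layout; the real project may order and scale the features differently:

```python
import numpy as np

NUM_RAYS = 64  # raycasts shot from the camera, per the description

def build_observation(position, player_state, ray_depths):
    """Assemble a flat observation vector (hypothetical feature layout)."""
    ray_depths = np.asarray(ray_depths, dtype=np.float32)
    assert ray_depths.shape == (NUM_RAYS,)
    return np.concatenate([
        np.asarray(position, dtype=np.float32),        # Ratchet's x, y, z
        np.asarray([player_state], dtype=np.float32),  # e.g. a state id
        ray_depths,                                    # depth per raycast
    ])
```

The action output would then cover the continuous joystick axes plus the three buttons (R1, Square, Cross).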
You can join the Boltcrate Discord server for discussions about this: / discord
See training stats here: wandb.ai/bordp...
Source code: github.com/bor...
Frequently Asked Questions:
* Has it finished Veldin and Novalis before this?
No, I've made other similar projects to do the hoverboard race on Rilgar and the Vidcomics in Ratchet 3. This one started here with the sole goal of getting to the end of the fitness course.
* Are you going to have it play other levels later?
After this one has learned the fitness course, I'm planning on moving over to Ratchet 3 instead. I'm not sure exactly what I'll have it learn in the game, but I'm considering something with multiplayer or arena challenges.
* Why don't you remove square as an action? Wouldn't that make it easier for it to learn?
Yes, but there are many good reasons to keep it around. Most of the time when you see it spam square it's because of a technical problem with the learning process. It should easily be able to learn not to spam square, so it serves as an indicator of how well the learning process works. There are also some situations where it could be beneficial for it to use square, and so I want to see if it is able to learn that.
* Why is it teleporting around?
There's a 30-second countdown, shown as the purple line at the top of the screen, that resets whenever it reaches the next checkpoint. When its time is up, it has to reset from the start. However, there's a 25% chance it spawns at a random location on Kerwan instead. This adds variety to the training data and helps it generalize more than starting the same way every time would. The background agents spawn randomly across the map 70% of the time.
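The respawn choice described above can be sketched like this. The helper and its arguments are hypothetical; only the probabilities come from the description:

```python
import random

def choose_spawn(course_start, random_points, is_background_agent):
    """Pick a respawn point after a timeout (hypothetical helper).

    The on-stream agent spawns at a random location 25% of the time;
    background agents do so 70% of the time, per the description.
    """
    p_random = 0.70 if is_background_agent else 0.25
    if random.random() < p_random:
        return random.choice(random_points)  # random location on Kerwan
    return course_start                      # back to the course start
```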
* Are there more instances of the game running in the background?
Yes. There are 2 instances running in the background, which run at just below 4x speed. These run with significantly more randomness added to their actions to encourage exploration. The agent we see on the stream does not contribute to learning, due to technical reasons I haven't gotten around to fixing.
* What are the giant transparent spinning crates?
They represent the next checkpoint the agent has to reach.
* Why is it lagging/slowing down occasionally?
The lag you're observing isn't caused by the emulator itself. Training the neural network heavily utilizes the GPU, and the emulator runs on that same GPU. Whenever a training step is running, there aren't enough resources left to run the emulator smoothly, which is why you might see the gameplay stutter or lag during those periods. This is a common issue when running both training and emulation on the same system.
===========================
Rewards Overview
===========================
I may change these in the code and forget to update the description.
Last updated: 27.05.2024
1. Crash Penalty:
Description: Applied when the game crashes.
Reward: -1.0 points.
2. Death Penalty:
Description: Applied when Ratchet dies in the game.
Reward: -1.5 points.
3. Moving Towards Checkpoint:
Description: Given when the agent moves closer to the next checkpoint.
Reward: The distance travelled towards the next checkpoint over 2 game frames, multiplied by 0.5. This is meant to encourage higher speed and to make it angle properly towards the checkpoint on long jumps and similar moves. Basically, it should beeline to the checkpoint whenever possible.
4. Reached Checkpoint Reward:
Description: Given when it reaches a checkpoint.
Reward: 5.0 points.
5. Moving Away from Checkpoint Penalty:
Description: Applied as long as it is more than 10 meters farther from the checkpoint than the closest it has been. So if it has been within 30 meters of the checkpoint and is now more than 40 meters away, it will be penalized until it is back within 40 meters.
Reward: -0.05 points.
6. Timeout Penalty:
Description: Applied if it takes too long to reach a checkpoint.
Reward: -1.0 points.
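Putting the terms above together, a per-step reward could look like this. This is a sketch in Python; the real code may structure and apply these terms differently:

```python
def step_reward(crashed, died, timed_out, reached_checkpoint,
                dist_prev, dist_now, closest_so_far):
    """Combine the reward terms listed above (illustrative sketch).

    dist_prev / dist_now: distance to the next checkpoint two game
    frames apart; closest_so_far: closest the agent has been to it.
    """
    reward = 0.0
    if crashed:
        reward -= 1.0              # 1. Crash Penalty
    if died:
        reward -= 1.5              # 2. Death Penalty
    reward += (dist_prev - dist_now) * 0.5  # 3. progress toward checkpoint
    if reached_checkpoint:
        reward += 5.0              # 4. Reached Checkpoint Reward
    if dist_now > closest_so_far + 10.0:
        reward -= 0.05             # 5. moving-away penalty
    if timed_out:
        reward -= 1.0              # 6. Timeout Penalty
    return reward
```

Note the asymmetry: checkpoints pay out 5.0 while the per-step penalties are small, so the agent is pushed toward steady forward progress rather than paralyzed by punishment.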
===========================
I don't have a problem explaining why I've chosen to do something the way I do it. If you're curious, please feel free to ask questions. However, please avoid phrasing your curiosity as unsolicited advice. Thanks!
