A simple, model-free Q-learning reinforcement learning agent navigating a car on a track. Code for example (4) is available here: github.com/cpe...
The purpose of this exercise was to test whether, and how well, the agent would follow the track given only coarse sensory information.
The agent was trained on tracks different from the one shown (the track shown is 800 m long and 4 m wide, with curve radii of 5.2 to 24 m). Training tracks included: (a) oval tracks with a 10:1 aspect ratio, (b) elliptical circuits, and (c) pentagonal to nonagonal tracks. Tracks of types (b) and (c) had curve radii of 0 m and track widths of 3.8 to 5.0 m. Tracks of type (a) were 7.6 to 10 m wide, with 400 m long straight segments.
Q-values are stored in a table over the set of discrete states and actions.
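As a rough illustration, that table can be thought of as a mapping from (state, action) pairs to Q-values, updated with the standard one-step Q-learning rule. The sketch below is not the code from the repository; the data structure, learning rate, and discount factor are assumed values.

```python
from collections import defaultdict

ALPHA = 0.1   # learning rate (assumed, not stated in the video)
GAMMA = 0.95  # discount factor (assumed, not stated in the video)

# Q-table: (state, action) -> Q-value; missing entries default to 0.0
q_table = defaultdict(float)

def q_update(state, action, reward, next_state, actions):
    """One-step tabular Q-learning update."""
    best_next = max(q_table[(next_state, a)] for a in actions)
    td_target = reward + GAMMA * best_next
    q_table[(state, action)] += ALPHA * (td_target - q_table[(state, action)])
```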
Possible actions are any combination of a speed change and a wheel change: speed up one step (+), slow down one step (-), or maintain speed (.), combined with turning the wheel further left (<), further right (>), or leaving it as-is (|).
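Since each action is a joint choice of one speed change and one wheel change, the action set has 3 × 3 = 9 entries. A minimal, illustrative sketch of how this action set could be enumerated (the symbols follow the description above; this is not the repository's code):

```python
from itertools import product

# The 3 x 3 = 9 discrete actions: every combination of a speed change
# ('+' speed up, '-' slow down, '.' keep speed) and a wheel change
# ('<' more left, '>' more right, '|' keep angle).
SPEED_CHANGES = ('+', '-', '.')
WHEEL_CHANGES = ('<', '>', '|')
ACTIONS = tuple(product(SPEED_CHANGES, WHEEL_CHANGES))
# e.g. ('+', '<') means "speed up one step and turn the wheel further left"
```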
The state of the learning agent is the combination of: speed (-0.1, 0, 0.1, 0.3, 1.0, or 3.0 m per time step), nominal wheel direction (-35, -20, -10, -5, -2.5, 0, 2.5, 5, 10, 20, or 35 degrees), clearance to the rear, front left, rear left, front right, and rear right (<0.1 m or ≥0.1 m), and clearance to the left front and right front (<0.1 m, <0.3 m, <1 m, <2 m, <3 m, <10 m, or ≥10 m).
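One possible way to discretize the observations into such a state is sketched below. The speed and wheel-angle grids and the clearance thresholds are taken from the description; the binning helper and parameter names are illustrative assumptions, not the actual code.

```python
import bisect

SPEEDS = (-0.1, 0.0, 0.1, 0.3, 1.0, 3.0)                         # m per time step
WHEEL_ANGLES = (-35, -20, -10, -5, -2.5, 0, 2.5, 5, 10, 20, 35)  # degrees
COARSE_BINS = (0.1,)                          # 2 buckets: <0.1 m vs >=0.1 m
FINE_BINS = (0.1, 0.3, 1.0, 2.0, 3.0, 10.0)   # 7 buckets for the forward sensors

def bin_clearance(distance_m, edges):
    """Map a continuous clearance reading onto a discrete bucket index."""
    return bisect.bisect_right(edges, distance_m)

def make_state(speed, wheel_angle, coarse_clearances, left_front, right_front):
    """Pack the discrete observations into a hashable Q-table key."""
    return (speed, wheel_angle,
            tuple(bin_clearance(d, COARSE_BINS) for d in coarse_clearances),
            bin_clearance(left_front, FINE_BINS),
            bin_clearance(right_front, FINE_BINS))
```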
Rewards are inversely proportional to speed (note: all rewards are negative), or -3000 in the case of a collision with the track boundaries.
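A hedged sketch of a reward function consistent with this description follows; the scaling constant and the handling of zero or negative speeds are assumptions, not taken from the video.

```python
def reward(speed, collided):
    """Always-negative reward: large penalty on collision, otherwise
    inversely proportional to speed, so faster driving is penalized less."""
    if collided:
        return -3000.0
    # Guard against division by zero; treating reverse speed like forward
    # speed here is an assumption.
    return -1.0 / max(abs(speed), 1e-3)
```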
The discrete state is illustrated by a sketch (e.g., yellow zones mark unobstructed areas), and the table shows the Q-values associated with the possible actions in the current state. Actions are chosen with a greedy policy, i.e., the action with the highest Q-value in the current state is taken.
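Greedy action selection then amounts to a simple argmax over the current state's row of the table, as in this illustrative sketch (assuming the q_table mapping from above):

```python
def greedy_action(state, actions, q_table):
    """Pick the action with the highest Q-value for the current state."""
    return max(actions, key=lambda a: q_table[(state, a)])
```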