A4.3.6 HL · reinforcement learning
No examples, no answers, just a reward at the goal and a penalty in the pit. The agent learns by trying moves and banking what works, and the grid slowly lights up with the learned value of each square. Set how much it explores and watch a path emerge.
Circle = start, green = goal, orange = the pit to avoid. Cells brighten with their learned value and arrows show the chosen move. Click an empty cell (or focus it and press Enter) to drop a wall; training restarts so the agent learns the new maze.
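For readers who want to see the idea in code, here is a minimal sketch of one common way to do this: tabular Q-learning with epsilon-greedy exploration. The page does not say which algorithm it runs, so treat everything here as an assumption for illustration: the 4x4 layout, the +1 and -1 rewards, and the names `step`, `train`, `greedy_path`, `epsilon`, `alpha` and `gamma` are all hypothetical. `epsilon` plays the role of "how much it explores", and `walls` stands in for the cells you click.

```python
import random

# Hypothetical layout: the page does not state its grid size or reward values.
ROWS, COLS = 4, 4
START, GOAL, PIT = (0, 0), (3, 3), (1, 2)
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right


def step(state, action, walls):
    """Move one cell; hitting an edge or a wall leaves the agent where it was."""
    r, c = state[0] + action[0], state[1] + action[1]
    if not (0 <= r < ROWS and 0 <= c < COLS) or (r, c) in walls:
        r, c = state
    if (r, c) == GOAL:
        return (r, c), 1.0, True    # reward at the goal, episode ends
    if (r, c) == PIT:
        return (r, c), -1.0, True   # penalty in the pit, episode ends
    return (r, c), 0.0, False


def train(walls=frozenset(), episodes=2000, epsilon=0.2, alpha=0.5, gamma=0.9, seed=0):
    """Learn a table of action values from scratch by trial and error."""
    rng = random.Random(seed)
    q = {(r, c): [0.0] * len(ACTIONS) for r in range(ROWS) for c in range(COLS)}
    for _ in range(episodes):
        state = START
        for _ in range(100):  # cap the episode length
            if rng.random() < epsilon:
                a = rng.randrange(len(ACTIONS))  # explore: random move
            else:
                best = max(q[state])             # exploit: best known move, ties broken at random
                a = rng.choice([i for i, v in enumerate(q[state]) if v == best])
            nxt, reward, done = step(state, ACTIONS[a], walls)
            target = reward if done else reward + gamma * max(q[nxt])
            q[state][a] += alpha * (target - q[state][a])  # nudge the estimate toward the target
            state = nxt
            if done:
                break
    return q


def greedy_path(q, walls=frozenset(), limit=50):
    """Follow the highest-valued move from the start until the episode ends."""
    state, path = START, [START]
    for _ in range(limit):
        a = max(range(len(ACTIONS)), key=lambda i: q[state][i])
        state, _, done = step(state, ACTIONS[a], walls)
        path.append(state)
        if done:
            break
    return path


if __name__ == "__main__":
    q = train()
    print(greedy_path(q))
    # Dropping a wall means retraining from scratch, as the page does.
    blocked = frozenset({(0, 1)})
    print(greedy_path(train(walls=blocked), walls=blocked))
```

In this sketch, a cell's brightness would correspond to `max(q[cell])` and its arrow to the index of that maximum. Raising `epsilon` makes the agent wander more before it settles; lowering it makes the agent commit sooner to whatever it has already found.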