- Finished the "Donut World" environment. Some small comments to discuss next meeting;
- Agent can only turn within a limited range left or right (default 135.0 degrees); otherwise it could walk back and forth instead of forward over the donut
- If the agent's next step would take it outside the bounds, it is placed where it would touch the bounds. Another option could be: Game Over?
- Several other options are configurable as parameters;
- step limit
- agent's step size
- agent's maximum turn angle (in degrees)
- agent's starting x, y, and angle
- donut's x, y, radius and thickness
- a single-value reward or a gradual reward range when on the donut
- amount of noise (observation and/or transition noise)
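A minimal sketch of one donut-world transition and its reward, following the notes above. All names and defaults here (`donut_step`, `donut_reward`, `bound`, etc.) are hypothetical placeholders, not the actual implementation; in particular, clamping x and y independently is only the simplest reading of "touch the bounds".

```python
import math

def donut_step(x, y, alpha, turn, step_size=1.0, max_turn=135.0, bound=10.0):
    """One transition: turn (clipped to max_turn degrees), then walk one step."""
    # Limit how far the agent may turn left or right per step.
    turn = max(-max_turn, min(max_turn, turn))
    alpha = (alpha + turn) % 360.0
    # Walk one step in the new heading.
    nx = x + step_size * math.cos(math.radians(alpha))
    ny = y + step_size * math.sin(math.radians(alpha))
    # If the step would leave the world, stop where the agent
    # would first touch the bounds (simplest variant: clamp per axis).
    nx = max(-bound, min(bound, nx))
    ny = max(-bound, min(bound, ny))
    return nx, ny, alpha

def donut_reward(x, y, cx=0.0, cy=0.0, radius=5.0, thickness=1.0, on_donut=1.0):
    """Single-value reward variant: on_donut while the agent stands on the ring."""
    d = math.hypot(x - cx, y - cy)
    return on_donut if abs(d - radius) <= thickness / 2.0 else 0.0
```

The gradual-reward variant would replace the indicator in `donut_reward` with a value that decays with distance from the ring's center line.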
- Here again is an overview of the environments that can be used to validate the algorithms. See the references/weblinks for more information.
- Sine function optimization (upcoming; see [1])
- 1 dimensional continuous observation and action space
- step limit = 1
- Six hump camel back function optimization (see [1])
- 2 dimensional continuous observation and action space
- step limit = 1
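Because the step limit is 1, each episode of the function-optimization environments is a single action. One way to read this (a sketch, not the actual environment code) is that the action is the evaluation point and the reward is the negated function value, so maximizing reward minimizes the function. The six-hump camel back function below uses its standard definition; `episode_reward` is a hypothetical name.

```python
def six_hump_camel(x1, x2):
    """Six-hump camel back function (standard form, to be minimized)."""
    return ((4 - 2.1 * x1**2 + x1**4 / 3) * x1**2
            + x1 * x2
            + (-4 + 4 * x2**2) * x2**2)

def episode_reward(action):
    """Step limit 1: one action is one episode; reward = -f(action)."""
    x1, x2 = action
    return -six_hump_camel(x1, x2)
```

The global minimum of this function is about -1.0316, reached near (0.0898, -0.7126) and its mirror image, which gives a known target for evaluating the agents.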
- Donut world
- 3 dimensional continuous observation space (x, y and alpha of the agent)
- 1 dimensional continuous action space (number of degrees to turn left or right)
- Cart pole (discrete actions variant: link)
- 4 dimensional continuous observation space
- 1 dimensional continuous action space
- Helicopter hovering
- 12 dimensional continuous observation space
- 4 dimensional continuous action space
- Octopus arm
- 82 dimensional continuous observation space
- 32 dimensional continuous action space
- Evaluation approach
- All environments can be measured by the (cumulative) reward at each time step. These series of values can be benchmarked against other results; results will be averaged over repeated runs for better approximations.
- These rewards can be compared to those of other agents
- Random agent
- Lukas' agent(s)
- Some environments have been tested in the literature, which can be used for evaluation
- The two function optimization environments are discussed in [1]
- The Cart pole environment with continuous actions is discussed in [2]
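The evaluation approach above can be sketched as follows. This is a hypothetical harness, not the actual evaluation code: `random_agent` is the baseline agent sampling a turn uniformly, and `average_cumulative_reward` averages the per-step cumulative-reward series over repeated runs.

```python
import random

def random_agent(obs, low=-135.0, high=135.0):
    """Baseline agent: sample a turn angle uniformly at random."""
    return random.uniform(low, high)

def average_cumulative_reward(run_episode, n_runs=100):
    """Average the cumulative-reward series over n_runs repeated runs.

    run_episode() must return a list of per-step rewards, one per time step.
    """
    curves = []
    for _ in range(n_runs):
        rewards = run_episode()
        cum, curve = 0.0, []
        for r in rewards:
            cum += r
            curve.append(cum)  # cumulative reward at this time step
        curves.append(curve)
    steps = len(curves[0])
    return [sum(c[t] for c in curves) / n_runs for t in range(steps)]
```

The averaged curve of one agent can then be benchmarked directly against the curves of the random agent, Lukas' agent(s), or results reported in the literature.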

[1] G. Van den Broeck and K. Driessens, “Automatic discretization of actions and states in Monte-Carlo tree search,” in Proceedings of the ECML/PKDD 2011 Workshop on Machine Learning and Data Mining in and around Games (T. Croonenborghs, K. Driessens, and O. Missura, eds.), pp. 1–12, Sep. 2011.

[2] H. Van Hasselt and M. A. Wiering, “Reinforcement learning in continuous action spaces,” in Proceedings of the 2007 IEEE International Symposium on Approximate Dynamic Programming and Reinforcement Learning (ADPRL 2007), pp. 272–279, 2007.