Colin Schepers' Blog: June 2012

Results

Donut World

Step limit: 10 steps per episode
Reward: +1 for being on the donut, 0 otherwise
Gaussian noise added to the rewards: mean = 0, sigma = 0.5
The optimal expected return is 7, because the first 3 of the 10 steps are spent turning
Results averaged over 10,000 episodes
MC is a baseline that randomly samples 10-step trajectories and plays the best one found
LTT is the "Level-based Transposition Tree": a regression tree first discretizes the state space, and each leaf of this tree holds a second regression tree that discretizes the action space for that region of the state space.
MTT is the "Mixed Transposition Tree": a single regression tree that splits on the action space and also on the state space.
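A minimal Python sketch of the reward setup and the MC baseline, under stated assumptions. The Donut World dynamics are not described here, so `toy_simulate` and `toy_sample_action` are hypothetical stand-ins (actions in [-1, 1], "on the donut" from step 3 onward whenever |a| < 0.5), not the real environment. The 3 turning steps of the optimal policy are modeled simply as 3 steps with true reward 0, which is what makes the optimal expected return 10 - 3 = 7.

```python
import random

# Settings from the experiments: 10 steps per episode, true reward +1 on the
# donut and 0 otherwise, plus Gaussian noise with mean 0 and sigma 0.5.
STEPS = 10
NOISE_SIGMA = 0.5


def noisy_reward(on_donut: bool, rng: random.Random) -> float:
    """True reward (+1 on the donut, 0 otherwise) plus N(0, 0.5^2) noise."""
    return (1.0 if on_donut else 0.0) + rng.gauss(0.0, NOISE_SIGMA)


def optimal_episode_return(rng: random.Random) -> float:
    # Assumption: the optimal agent spends steps 0-2 turning (reward 0) and is
    # on the donut for the remaining 7 steps, so its expected return is 7.
    return sum(noisy_reward(t >= 3, rng) for t in range(STEPS))


# Hypothetical toy stand-ins for the Donut World simulator (NOT the real dynamics):
# actions are floats in [-1, 1]; from step 3 onward the agent counts as being
# on the donut whenever |action| < 0.5.
def toy_sample_action(rng: random.Random) -> float:
    return rng.uniform(-1.0, 1.0)


def toy_simulate(actions, rng: random.Random) -> float:
    return sum(noisy_reward(t >= 3 and abs(a) < 0.5, rng)
               for t, a in enumerate(actions))


def mc_plan(simulate, sample_action, n_trajectories: int, rng: random.Random):
    """Random-shooting MC: sample random STEPS-long action sequences, score each
    with a single noisy rollout and keep the best-scoring sequence."""
    best_actions, best_return = None, float("-inf")
    for _ in range(n_trajectories):
        actions = [sample_action(rng) for _ in range(STEPS)]
        ret = simulate(actions, rng)
        if ret > best_return:
            best_actions, best_return = actions, ret
    return best_actions, best_return


if __name__ == "__main__":
    rng = random.Random(0)
    # Averaging over 10,000 episodes (as in the experiments) removes most of
    # the noise, so this mean should land close to the optimal value of 7.
    avg = sum(optimal_episode_return(rng) for _ in range(10_000)) / 10_000
    print(f"average optimal return: {avg:.2f}")
    plan, score = mc_plan(toy_simulate, toy_sample_action, 500, rng)
    print(f"best noisy MC score on the toy simulator: {score:.2f}")
```

Because each candidate trajectory is scored with a single noisy rollout, the best score found by this kind of random sampling is optimistically biased: it tends to favor trajectories that got lucky noise rather than genuinely better ones.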
Random performs worst, followed by MC.
Performance-wise, STL > RTL, followed by HOO > IRTI.
LTT performs about average compared with the four algorithms mentioned above. MTT performs poorly compared with the other algorithms (probably due to the problem described in the previous post), but is still better than MC and Random.
Unsurprisingly, MC and Random are the fastest algorithms.
HOO is the slowest, and the plot shows its O(n^2) running time.
TT is a bit slower than RMTL and RSTL, probably because the latter two re-use the tree built during previous real-world steps.
Initially, MTL is a bit faster than STL, but MTL's computational complexity is slightly higher (due to the perfect recall).
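To illustrate where HOO's quadratic running time comes from, here is a small cost-model sketch. It assumes a straightforward HOO implementation in which each iteration adds one node and then recomputes the B-values of all nodes in the tree (the exploration bonus depends on the total number of iterations, so every node's value changes each round). Iteration t then costs t updates, and n iterations cost n(n+1)/2 = O(n^2). The function `hoo_update_count` is hypothetical and only counts work; it does not implement HOO itself.

```python
def hoo_update_count(n: int) -> int:
    """Count B-value updates under an assumed cost model of naive HOO:
    iteration t adds one node (so the tree has t nodes) and then refreshes
    the B-value of every node in the tree."""
    total = 0
    nodes = 0
    for _ in range(n):
        nodes += 1      # one new node per iteration
        total += nodes  # refresh all B-values in the tree
    return total


for n in (1000, 2000, 4000):
    # Doubling n roughly quadruples the work: n(n+1)/2 = O(n^2).
    print(n, hoo_update_count(n))
# 1000 500500
# 2000 2001000
# 4000 8002000
```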
