Wednesday, May 2, 2012

Feedback 02-05-2012

Multi-step Results

I ran some experiments with the four multi-step algorithms in the CartPole environment:

- I used fairly "easy" settings for the environment (no transition, reward, or observation noise; a small but sufficient action space, i.e. the force applied to the cart; etc.).
- On the other hand, I limited the agent to (only) 100 ms per step.
- The reward of a run equals the number of steps for which the agent is able to balance the pole.
- Per algorithm, 100 runs were performed.
- Note that I have not yet profiled or optimized the agents, nor have I tuned the parameters in detail. Furthermore, I could only do 100 runs because of the time constraints of today's meeting. The results should therefore be considered preliminary.

algorithm | reward
          | <= 100 | > 100 | > 250 | > 500 | >= 1000
RMTL      | 0      | 0     | 0     | 1     | 99
RSTL      | 2      | 2     | 3     | 3     | 90
HOSTL     | 2      | 2     | 3     | 8     | 85
HOMTL     | 0      | 0     | 0     | 0     | 100
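For reference, a minimal sketch of how the per-run rewards could be tallied into the columns of the table above. It assumes the columns are disjoint ranges (so "> 100" means more than 100 and at most 250, and so on), which is my reading of the headers given that every row sums to 100 runs. The function names and the example rewards are hypothetical and are not the measured data.

```python
from collections import Counter

# Column labels as in the table; treating them as disjoint ranges is an assumption.
BINS = ["<= 100", "> 100", "> 250", "> 500", ">= 1000"]


def bin_label(reward):
    """Map the reward of one run (number of balanced steps) to a table column."""
    if reward >= 1000:
        return ">= 1000"
    if reward > 500:
        return "> 500"
    if reward > 250:
        return "> 250"
    if reward > 100:
        return "> 100"
    return "<= 100"


def tally(rewards):
    """Count how many runs fall into each column, in table order."""
    counts = Counter(bin_label(r) for r in rewards)
    return [counts[label] for label in BINS]


# Hypothetical rewards for illustration only.
example_rewards = [1000, 1000, 730, 260, 101, 100, 42]
print(dict(zip(BINS, tally(example_rewards))))
```

Checking the boundaries first from the top keeps each run in exactly one column, so a row always sums to the number of runs.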

- The next table shows roughly how many simulations each algorithm can perform per second, on average, during the first 10 steps in the CartPole environment.

algorithm | avg sims/s
          | Memorization | No memorization
RMTL      | 37000        | 39000
RSTL      | 18000        | 32000
HOSTL     | 7300         | 7700
HOMTL     | 5000         | 5000
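To relate these throughput numbers to the 100 ms per-step limit, the following sketch converts simulations per second into a rough number of simulations per step and computes the relative throughput lost with memorization. It only rearranges the figures from the table; it assumes the rate measured during the first 10 steps holds for the whole step budget, and the function names are hypothetical.

```python
STEP_BUDGET_S = 0.1  # 100 ms per step, as in the experiments

# (with memorization, without memorization), copied from the table above
SIMS_PER_SECOND = {
    "RMTL": (37000, 39000),
    "RSTL": (18000, 32000),
    "HOSTL": (7300, 7700),
    "HOMTL": (5000, 5000),
}


def sims_per_step(sims_per_second, budget_s=STEP_BUDGET_S):
    """Rough number of simulations that fit into one step's time budget."""
    return round(sims_per_second * budget_s)


def memorization_overhead(with_mem, without_mem):
    """Fraction of throughput lost when memorization is switched on."""
    return 1.0 - with_mem / without_mem


for name, (mem, no_mem) in SIMS_PER_SECOND.items():
    print(
        f"{name}: about {sims_per_step(mem)} (memorization) vs "
        f"{sims_per_step(no_mem)} (no memorization) sims per step, "
        f"throughput loss {memorization_overhead(mem, no_mem):.1%}"
    )
```

This only compares raw throughput; it says nothing about whether the simulations done with memorization are more useful per simulation.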


Thesis Overview























