I've run some experiments with the four multi-step algorithms in the CartPole environment:
- I used fairly "easy" settings for the environment: no transition, reward, or observation noise, a small but sufficient action space (the force applied to the cart), etc.
- On the other hand, I limited the agent to only 100 ms per step.
- The reward of a run equals the number of steps for which the agent keeps the pole balanced.
- Per algorithm, 100 runs were performed.
- Note that I have not yet profiled or optimized the agents, nor have I tuned the parameters in detail. Furthermore, I could only do 100 runs per algorithm because of the time constraints of today's meeting. The results should therefore be considered preliminary.
| algorithm | reward <= 100 | reward > 100 | reward > 250 | reward > 500 | reward >= 1000 |
|---|---|---|---|---|---|
| RMTL | 0 | 0 | 0 | 1 | 99 |
| RSTL | 2 | 2 | 3 | 3 | 90 |
| HOSTL | 2 | 2 | 3 | 8 | 85 |
| HOMTL | 0 | 0 | 0 | 0 | 100 |
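A tally like the one above can be produced from the raw per-run rewards with a few lines of Python. This is only a minimal sketch: it assumes the columns are disjoint bins (each row sums to the 100 runs), so that "> 100" covers 101 to 250 steps, "> 250" covers 251 to 500, and "> 500" covers 501 to 999. The function name `tally_rewards` and the sample rewards are hypothetical and not taken from the experiment code.

```python
from bisect import bisect_left

# Inclusive upper edges of the first four bins (assumed to match the table columns).
BIN_EDGES = [100, 250, 500, 999]
BIN_LABELS = ["<= 100", "> 100", "> 250", "> 500", ">= 1000"]


def tally_rewards(rewards):
    """Count how many runs fall into each reward bin (rewards are integer step counts)."""
    counts = [0] * len(BIN_LABELS)
    for reward in rewards:
        # bisect_left returns the index of the first edge >= reward,
        # i.e. the bin whose inclusive upper edge covers this reward.
        counts[bisect_left(BIN_EDGES, reward)] += 1
    return dict(zip(BIN_LABELS, counts))


# Hypothetical rewards of four runs, for illustration only.
print(tally_rewards([1000, 1000, 730, 42]))
```

Applied to the 100 rewards of one algorithm, this would yield one row of the table.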
- The next table shows roughly how many simulations each algorithm can perform per second, on average, during the first 10 steps in the CartPole environment.
| algorithm | avg sims/s (memorization) | avg sims/s (no memorization) |
|---|---|---|
| RMTL | 37000 | 39000 |
| RSTL | 18000 | 32000 |
| HOSTL | 7300 | 7700 |
| HOMTL | 5000 | 5000 |
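Combined with the limit of 100 ms per step, these rates give a rough idea of how many simulations each algorithm can afford per decision. The following is a minimal sketch of that arithmetic, assuming the measured rate stays constant over an episode (the table only covers the first 10 steps) and that the whole 100 ms is spent simulating. The helper `sims_per_step` is hypothetical.

```python
STEP_BUDGET_S = 0.1  # 100 ms per step, as in the experiment settings

# Approximate average simulations per second from the table:
# (with memorization, without memorization).
SIMS_PER_SECOND = {
    "RMTL": (37000, 39000),
    "RSTL": (18000, 32000),
    "HOSTL": (7300, 7700),
    "HOMTL": (5000, 5000),
}


def sims_per_step(sims_per_second, budget_s=STEP_BUDGET_S):
    """Estimate the simulations available within one step's time budget."""
    return round(sims_per_second * budget_s)


for name, (with_mem, without_mem) in SIMS_PER_SECOND.items():
    print(f"{name}: ~{sims_per_step(with_mem)} (memorization), "
          f"~{sims_per_step(without_mem)} (no memorization) sims per step")
```

Under these assumptions, RMTL gets roughly 3700 to 3900 simulations per step, while HOMTL gets about 500.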