Colin Schepers' Blog: Feedback 02-05-2012

Multi-step Results

I ran some experiments with the four multi-step algorithms in the CartPole environment:

- I used fairly "easy" settings for the environment: no transition, reward, or observation noise, a small but sufficient action space (the force applied to the cart), etc.
- On the other hand, I limited the agent to only 100 ms per step.
- The reward of a run equals the number of steps for which the agent keeps the pole balanced.
- Per algorithm, 100 runs were performed.
- Note that I have not yet profiled or optimized the agents, nor have I tuned the parameters in detail. Furthermore, I could only do 100 runs because of the time constraints of today's meeting. The results should therefore be considered preliminary.
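The protocol in the list above can be sketched roughly as follows. This is only an illustration, not the code used for the experiments: `plan_with_budget`, `run_episode`, `simulate_once`, `env_step`, and `choose_action` are hypothetical names, the `max_steps` cap is an assumption that is not stated in the post, and whether the failing step counts towards the reward is also an assumption.

```python
import time


def plan_with_budget(simulate_once, budget_s=0.1):
    """Run simulations until the per-step time budget (100 ms here) is spent.

    simulate_once is a hypothetical callable that performs one simulation.
    Returns the number of simulations that fit into the budget.
    """
    deadline = time.monotonic() + budget_s
    n_sims = 0
    while time.monotonic() < deadline:
        simulate_once()
        n_sims += 1
    return n_sims


def run_episode(env_step, choose_action, max_steps=1000):
    """Return the reward of one run: the number of steps the pole stays up.

    env_step(action) is a hypothetical callable that returns True while the
    pole is still balanced. max_steps is an assumed cap so the loop ends.
    The failing step is not counted (an assumption).
    """
    steps = 0
    while steps < max_steps:
        if not env_step(choose_action()):
            break
        steps += 1
    return steps
```

Calling `run_episode` 100 times per algorithm and collecting the returned values would give the per-run rewards that the results table summarizes.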

Algorithm (runs by reward)	<= 100	> 100	> 250	> 500	>= 1000
RMTL	0	0	0	1	99
RSTL	2	2	3	3	90
HOSTL	2	2	3	8	85
HOMTL	0	0	0	0	100
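Each row of the table sums to the 100 runs, which suggests that the columns are disjoint bins (for example, "> 100" covers rewards from 101 to 250). Under that assumption, a minimal sketch of how per-run rewards could be turned into such a row looks like this; `bin_label` and `histogram` are names made up for the illustration.

```python
from collections import Counter

# Column labels in the order used by the results table.
BIN_LABELS = ["<= 100", "> 100", "> 250", "> 500", ">= 1000"]

# Counts copied from the results table, one row per algorithm.
TABLE = {
    "RMTL": [0, 0, 0, 1, 99],
    "RSTL": [2, 2, 3, 3, 90],
    "HOSTL": [2, 2, 3, 8, 85],
    "HOMTL": [0, 0, 0, 0, 100],
}


def bin_label(reward):
    """Map one reward to its column, assuming the columns are disjoint bins."""
    if reward >= 1000:
        return ">= 1000"
    if reward > 500:
        return "> 500"
    if reward > 250:
        return "> 250"
    if reward > 100:
        return "> 100"
    return "<= 100"


def histogram(rewards):
    """Count how many rewards fall into each column, in table order."""
    counts = Counter(bin_label(r) for r in rewards)
    return [counts[label] for label in BIN_LABELS]


# Sanity check on the table itself: every algorithm accounts for 100 runs.
assert all(sum(row) == 100 for row in TABLE.values())
```

Read this way, the last column is the share of runs that reached at least 1000 steps: 99% for RMTL, 90% for RSTL, 85% for HOSTL, and 100% for HOMTL.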

- The next table shows roughly how many simulations per second each algorithm performs, on average, during the first 10 steps in the CartPole environment.

Algorithm (avg. sims/s)	Memorization	No memorization
RMTL	37000	39000
RSTL	18000	32000
HOSTL	7300	7700
HOMTL	5000	5000
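To relate these rates to the 100 ms per-step limit, the arithmetic can be sketched as below. The numbers are copied from the table; the helper names are made up for the illustration, and the per-step figures are only rough estimates because the rates were measured during the first 10 steps.

```python
# (with memorization, without memorization), in simulations per second.
SIMS_PER_SECOND = {
    "RMTL": (37000, 39000),
    "RSTL": (18000, 32000),
    "HOSTL": (7300, 7700),
    "HOMTL": (5000, 5000),
}

STEP_BUDGET_MS = 100  # per-step time limit used in the experiments


def sims_per_step(sims_per_second, budget_ms=STEP_BUDGET_MS):
    """Rough number of simulations that fit into one step's time budget."""
    return sims_per_second * budget_ms // 1000


def memorization_slowdown(with_mem, without_mem):
    """Fraction of the simulation rate lost when memorization is switched on."""
    return 1.0 - with_mem / without_mem


for name, (with_mem, without_mem) in SIMS_PER_SECOND.items():
    print(
        f"{name}: about {sims_per_step(with_mem)} sims per step, "
        f"{memorization_slowdown(with_mem, without_mem):.1%} slower with memorization"
    )
```

With these figures, RSTL loses about 44% of its simulation rate when memorization is enabled, RMTL and HOSTL lose about 5% each, and HOMTL shows no measurable difference.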


Wednesday, May 2, 2012
