Sinus environment: time-based
Six Hump Camel Back environment: time-based
CartPole environment: time-based
I've run the CartPole experiment again with more challenging properties:
- Goal: balance the pole for 1000 steps
- Payoff:
  - +1 for each step the pole stays balanced
  - -1 if the pole's angle exceeds 12 degrees from upright (terminal state)
  - Maximum achievable payoff is 1000, by the goal's definition
- Gaussian noise with standard deviation 0.3 added to both rewards and actions
- See [1] for a more detailed description of the environment properties
- Agent's evaluation time per step: 50 ms
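The noise injection described above can be sketched as a thin environment wrapper. This is a minimal illustration, not the code used in the experiments: the Gym-style `reset()`/`step(action)` interface and the `NoisyEnvWrapper` name are assumptions.

```python
import random

class NoisyEnvWrapper:
    """Adds zero-mean Gaussian noise (sigma = 0.3 by default) to both the
    action passed to the environment and the reward it returns, mirroring
    the perturbed CartPole setup described above."""

    def __init__(self, env, sigma=0.3, rng=None):
        self.env = env          # assumed to expose reset() and step(action)
        self.sigma = sigma      # standard deviation of the Gaussian noise
        self.rng = rng or random.Random(0)

    def reset(self):
        return self.env.reset()

    def step(self, action):
        # Perturb the continuous action before passing it on.
        noisy_action = action + self.rng.gauss(0.0, self.sigma)
        obs, reward, done = self.env.step(noisy_action)
        # Perturb the reward seen by the agent.
        noisy_reward = reward + self.rng.gauss(0.0, self.sigma)
        return obs, noisy_reward, done
```

With sigma = 0.3 on a +1 per-step reward, the noise is large relative to the signal, which is what makes this variant harder than the clean environment.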
| Algorithm          | Success rate | Average payoff |
|--------------------|--------------|----------------|
| IRTI + TLS         | 94%          | 971.747        |
| IRTI + HOLOP       | 85%          | 922.047        |
| HOO + TLS          | 0%           | 77.254         |
| HOO + HOLOP        | 3%           | 273.525        |
| Transposition Tree | 65%          | 808.480        |
| MC                 | 9%           | 389.590        |
[1] H. van Hasselt and M. A. Wiering, "Reinforcement learning in continuous action spaces," in Proc. IEEE International Symposium on Approximate Dynamic Programming and Reinforcement Learning (ADPRL 2007), pp. 272–279, 2007.