
Friday, May 18, 2012

Feedback 18-05-2012

Results

Sinus environment: time-based


Six Hump Camel Back environment: time-based


CartPole environment: time-based

I've run the CartPole experiment again with more challenging properties:
  1. Goal: balance the pole for 1000 steps
  2. Payoff:
    1. For each step that the pole is balanced: +1
    2. If the pole's angle is more than 12 degrees from the upright position: -1 (and terminal state)
    3. Maximum expected payoff of 1000, which follows from the 1000-step goal
  3. Added Gaussian noise with a standard deviation of 0.3 to both rewards and actions
  4. See [1] for a more detailed description of the environment properties
  5. Agent's evaluation time per step: 50 ms
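The payoff rules above can be sketched roughly as follows. This is only an illustration, not the code used in the experiment: the names `step_reward`, `add_noise` and `episode_payoff` are hypothetical, the pole angles are passed in as a plain list instead of coming from a simulator, and whether the reported payoffs include the reward noise is an assumption left open here (noise is applied only when a random generator is supplied).

```python
import random

MAX_STEPS = 1000        # goal: balance the pole for 1000 steps
ANGLE_LIMIT_DEG = 12.0  # failure threshold, measured from upright
NOISE_STD = 0.3         # std. dev. of the Gaussian noise on rewards and actions


def step_reward(angle_deg):
    """Return (noise-free reward, terminal flag) for a single step."""
    if abs(angle_deg) > ANGLE_LIMIT_DEG:
        return -1.0, True   # pole fell: penalty and terminal state
    return 1.0, False       # pole still balanced


def add_noise(value, rng, std=NOISE_STD):
    """Perturb a reward or an action with zero-mean Gaussian noise."""
    return value + rng.gauss(0.0, std)


def episode_payoff(angles_deg, rng=None):
    """Sum the step rewards, stopping at failure or after MAX_STEPS.

    angles_deg: hypothetical sequence of pole angles, one per step.
    rng: optional random.Random; if given, rewards are noisy.
    """
    total = 0.0
    for step, angle in enumerate(angles_deg):
        if step >= MAX_STEPS:
            break
        reward, terminal = step_reward(angle)
        total += add_noise(reward, rng) if rng is not None else reward
        if terminal:
            break
    return total
```

Without noise, an episode that stays upright for all 1000 steps yields a payoff of exactly 1000, which matches the maximum expected payoff stated in the list. With `rng=random.Random(0)` the same episode gives a value scattered around 1000.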


Algorithm            Success rate   Average payoff
IRTI + TLS           94%            971.747
IRTI + HOLOP         85%            922.047
HOO + TLS            0%             77.254
HOO + HOLOP          3%             273.525
Transposition Tree   65%            808.480
MC                   9%             389.590
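For reference, the two reported quantities could be computed from per-episode results along these lines. This is a sketch under assumptions: the post does not define "success", so it is taken here to mean that the pole stayed balanced for the full 1000 steps, and the function name `summarize` and the `(steps, payoff)` input format are hypothetical.

```python
def summarize(episodes, goal_steps=1000):
    """Return (success rate, average payoff) over a list of episodes.

    episodes: list of (steps_balanced, payoff) tuples, one per run.
    Assumption: a run counts as a success if it lasted goal_steps steps.
    """
    if not episodes:
        raise ValueError("at least one episode is required")
    successes = sum(1 for steps, _ in episodes if steps >= goal_steps)
    success_rate = successes / len(episodes)
    average_payoff = sum(payoff for _, payoff in episodes) / len(episodes)
    return success_rate, average_payoff
```

For example, three full-length runs with payoff 1000 and one run that failed after 40 steps with payoff 38 give a success rate of 0.75 and an average payoff of 759.5.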


[1] H. van Hasselt and M. A. Wiering, “Reinforcement learning in continuous action spaces,” in Proceedings of the 2007 IEEE International Symposium on Approximate Dynamic Programming and Reinforcement Learning (ADPRL 2007), pp. 272–279, 2007.