Friday, April 27, 2012

Feedback 27-04-2012


Experiment Results

I also ran experiments in the Six Hump Camel Back environment. As before, all experiments use 1000 samples and are averaged over 1000 runs. For every experiment, the 95% confidence intervals are shown as a colored region surrounding the lines.

Incremental Regression Tree Induction (IRTI)
  • MCTS_C = (parentRangeSize / globalRewardVolume) * Config.MCTS_K (see the sketch after this list)
  • MCTS_K=1.0
  • IRTI_SPLIT_NR_TESTS=100
  • IRTI_SPLIT_MIN_NR_SAMPLES=75
  • IRTI_SIGNIFICANCE_LEVEL=0.001
  • IRTI_MEMORIZATION = true
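For clarity, here is a minimal sketch of how a node-local exploration constant of this form could be computed and plugged into a UCB1-style selection rule. The names (explorationConstant, ucb1, parentRangeSize, childMeanReward, and so on) are assumptions for illustration, not the actual experiment code.

```java
/**
 * Minimal sketch (assumed names, not the real implementation) of the
 * node-local exploration constant from the parameter list above, used
 * in a standard UCB1 selection rule.
 */
public final class UcbSketch {

    /** C scaled by the parent's observed reward range relative to the global reward range. */
    static double explorationConstant(double parentRangeSize,
                                      double globalRewardVolume,
                                      double mctsK) {
        return (parentRangeSize / globalRewardVolume) * mctsK;
    }

    /** UCB1 score of a child node, given its statistics and the scaled constant. */
    static double ucb1(double childMeanReward, int childVisits,
                       int parentVisits, double c) {
        return childMeanReward
                + c * Math.sqrt(Math.log(parentVisits) / childVisits);
    }

    public static void main(String[] args) {
        double c = explorationConstant(0.8, 1.0, 1.0);   // MCTS_K = 1.0 as above
        System.out.println(ucb1(0.4, 25, 100, c));
    }
}
```

Intuitively, scaling C this way presumably weakens exploration in subtrees where the observed rewards vary little relative to the global reward range.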


Hierarchical Optimistic Optimization (HOO)
  • MCTS_C = (parentActionSpaceVolume / globalActionSpaceVolume) * globalRewardVolume * MCTS_K
  • MCTS_K=0.5
  • HOO_V_1 = (sqrt(nrActionDimensions) / 2) ^ HOO_ALPHA
  • HOO_RHO = 2 ^ (- HOO_ALPHA / nrActionDimensions) (derivation sketched after this list)
  • HOO_ALPHA=0.99
  • HOO_MEMORIZATION = true
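The ν1 and ρ values above are derived from the dimensionality of the action space and HOO_ALPHA. Below is a minimal sketch of that derivation together with the standard HOO optimistic node value in which these parameters appear (as in the original HOO paper by Bubeck et al.); the class and variable names (HooParamsSketch, meanReward, totalPlays, depth) are assumptions for illustration and may not match the experiment code.

```java
/**
 * Minimal sketch (assumed names) of the HOO parameter derivation from the
 * list above, plus the standard HOO upper bound they are used in.
 */
public final class HooParamsSketch {

    /** v1 = (sqrt(d) / 2) ^ alpha, as in the parameter list above. */
    static double v1(int nrActionDimensions, double alpha) {
        return Math.pow(Math.sqrt(nrActionDimensions) / 2.0, alpha);
    }

    /** rho = 2 ^ (-alpha / d), as in the parameter list above. */
    static double rho(int nrActionDimensions, double alpha) {
        return Math.pow(2.0, -alpha / nrActionDimensions);
    }

    /**
     * Standard HOO optimistic value of a node at depth h after n total plays:
     * U = meanReward + sqrt(2 ln n / visits) + v1 * rho^h.
     */
    static double uValue(double meanReward, int visits, int totalPlays,
                         int depth, double v1, double rho) {
        return meanReward
                + Math.sqrt(2.0 * Math.log(totalPlays) / visits)
                + v1 * Math.pow(rho, depth);
    }

    public static void main(String[] args) {
        double alpha = 0.99;
        int d = 2;                         // Six Hump Camel Back has a 2-dimensional action space
        System.out.println(v1(d, alpha));  // ~ (sqrt(2)/2)^0.99
        System.out.println(rho(d, alpha)); // ~ 2^(-0.495)
    }
}
```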

IRTI + HOO + MC + Random + UCT (pre-discretization with 2 splits per depth)


Planning
  1. Run experiments with the multi-step agents and debug / optimize the code where necessary.

1 comment:

  1. Dear Colin,

    These are very nice results, and they should be a good basis for your multi-step experiments. Here are some comments:

    1. I am somewhat surprised by the irregular behavior of IRTI in the regret plot, but then again it may follow directly from the splitting criterion, which may consider the small differences too insignificant to warrant a split. Can you add some parameter details (for all algorithms)? I guess you are using perfect recall for re-using samples?

    2. Also, do you use UCT now to split each action dimension into a fixed number of bins, or do you create a tree with a constant branching factor?

    3. Do you have any explanation why HOO has these visible boundaries and IRTI doesn't?

    4. Can you add error bars for confidence intervals to the greedy reward and regret plots?

    I'm looking forward to the next post :)
