Friday, April 27, 2012

Feedback 27-04-2012


Experiment Results

I also ran experiments in the Six Hump Camel Back environment. As before, all experiments use 1000 samples and are averaged over 1000 runs. For every experiment, the 95% confidence intervals are shown as a colored region surrounding the lines.

Incremental Regression Tree Induction (IRTI)
  • MCTS_C = (parentRangeSize / globalRewardVolume) * Config.MCTS_K (see the sketch after this list)
  • MCTS_K=1.0
  • IRTI_SPLIT_NR_TESTS=100
  • IRTI_SPLIT_MIN_NR_SAMPLES=75
  • IRTI_SIGNIFICANCE_LEVEL=0.001
  • IRTI_MEMORIZATION = true
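For clarity, here is a minimal sketch of how a node-local exploration constant of this form could be computed and plugged into a UCB1-style selection rule. The names (explorationConstant, ucb1, parentRangeSize, childMeanReward, and so on) are assumptions for illustration, not the actual experiment code.

```java
/**
 * Minimal sketch (assumed names, not the real implementation) of the
 * node-local exploration constant from the parameter list above, used
 * in a standard UCB1 selection rule.
 */
public final class UcbSketch {

    /** C scaled by the parent's observed reward range relative to the global reward range. */
    static double explorationConstant(double parentRangeSize,
                                      double globalRewardVolume,
                                      double mctsK) {
        return (parentRangeSize / globalRewardVolume) * mctsK;
    }

    /** UCB1 score of a child node, given its statistics and the scaled constant. */
    static double ucb1(double childMeanReward, int childVisits,
                       int parentVisits, double c) {
        return childMeanReward
                + c * Math.sqrt(Math.log(parentVisits) / childVisits);
    }

    public static void main(String[] args) {
        double c = explorationConstant(0.8, 1.0, 1.0);   // MCTS_K = 1.0 as above
        System.out.println(ucb1(0.4, 25, 100, c));
    }
}
```

Intuitively, scaling C this way presumably weakens exploration in subtrees where the observed rewards vary little relative to the global reward range.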


Hierarchical Optimistic Optimization (HOO)
  • MCTS_C = (parentActionSpaceVolume / globalActionSpaceVolume) * globalRewardVolume * MCTS_K
  • MCTS_K=0.5
  • HOO_V_1 = (sqrt(nrActionDimensions) / 2) ^ HOO_ALPHA
  • HOO_RHO = 2 ^ (- HOO_ALPHA / nrActionDimensions) (derivation sketched after this list)
  • HOO_ALPHA=0.99
  • HOO_MEMORIZATION = true
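The ν1 and ρ values above are derived from the dimensionality of the action space and HOO_ALPHA. Below is a minimal sketch of that derivation together with the standard HOO optimistic node value in which these parameters appear (as in the original HOO paper by Bubeck et al.); the class and variable names (HooParamsSketch, meanReward, totalPlays, depth) are assumptions for illustration and may not match the experiment code.

```java
/**
 * Minimal sketch (assumed names) of the HOO parameter derivation from the
 * list above, plus the standard HOO upper bound they are used in.
 */
public final class HooParamsSketch {

    /** v1 = (sqrt(d) / 2) ^ alpha, as in the parameter list above. */
    static double v1(int nrActionDimensions, double alpha) {
        return Math.pow(Math.sqrt(nrActionDimensions) / 2.0, alpha);
    }

    /** rho = 2 ^ (-alpha / d), as in the parameter list above. */
    static double rho(int nrActionDimensions, double alpha) {
        return Math.pow(2.0, -alpha / nrActionDimensions);
    }

    /**
     * Standard HOO optimistic value of a node at depth h after n total plays:
     * U = meanReward + sqrt(2 ln n / visits) + v1 * rho^h.
     */
    static double uValue(double meanReward, int visits, int totalPlays,
                         int depth, double v1, double rho) {
        return meanReward
                + Math.sqrt(2.0 * Math.log(totalPlays) / visits)
                + v1 * Math.pow(rho, depth);
    }

    public static void main(String[] args) {
        double alpha = 0.99;
        int d = 2;                         // Six Hump Camel Back has a 2-dimensional action space
        System.out.println(v1(d, alpha));  // ~ (sqrt(2)/2)^0.99
        System.out.println(rho(d, alpha)); // ~ 2^(-0.495)
    }
}
```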

IRTI + HOO + MC + Random + UCT (pre-discretization with 2 splits per depth)


Planning
  1. Run experiments with the multi-step agents and debug / optimize the code where necessary.

1 comment:

  1. Dear Colin,

    These are very nice results, and they should be a good basis for your multi-step experiments. Here are some comments:

    1. I am somewhat surprised by the irregular behavior of IRTI in the regret plot, but then again it may follow directly from the splitting criterion, which may consider the small differences too insignificant to warrant a split. Can you add some parameter details (for all algorithms)? I guess you are using perfect recall for re-using samples?

    2. Also, do you use UCT now to split each action dimension into a fixed number of bins, or do you create a tree with a constant branching factor?

    3. Do you have any explanation why HOO has these visible boundaries and IRTI doesn't?

    4. Can you add error bars for confidence intervals to the greedy reward and regret plots?

    I'm looking forward to the next post :)
