- Fixing, debugging and improving IRTI and HOO.
- Furthermore, a lot of parameter tuning for the above two algorithms.
- Michael and I had a discussion about the exploration factor C (see the sketch after this list):
  - C should be related to the reward range:
    - constant: C = globalRangeSize * K
    - adaptive: C = parentRangeSize * K
    - Lukas proposed: adaptive C = (childRangeSize / globalRangeSize) * K
  - For HOO, the last term (diam) should also be multiplied by C, or the first term (mean reward) divided by C:
    - R + C * exploration + C * diam
    - R / C + exploration + diam
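A minimal sketch of the variants above, assuming K is a hand-tuned constant and the range sizes are tracked per node; all function and variable names here are hypothetical, not the actual implementation:

```python
def constant_c(global_range_size, K):
    # constant C, scaled by the global reward range
    return global_range_size * K

def adaptive_c(parent_range_size, K):
    # adaptive C, scaled by the reward range observed in the parent node
    return parent_range_size * K

def lukas_adaptive_c(child_range_size, global_range_size, K):
    # Lukas' proposal: the child's range normalised by the global range
    return (child_range_size / global_range_size) * K

# Two ways of folding C into the HOO B-value, as discussed above:
def b_value_scaled_terms(mean_reward, exploration, diam, C):
    # R + C * exploration + C * diam
    return mean_reward + C * exploration + C * diam

def b_value_scaled_reward(mean_reward, exploration, diam, C):
    # R / C + exploration + diam
    return mean_reward / C + exploration + diam
```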
Planning
- Change the HOO formula (see action points)
- Generate reward distributions (learning curve, greedy curve, error) for several algorithms (a sketch of the Vanilla MC baseline follows this list):
  - IRTI
  - HOO
  - UCT (pre-discretization)
  - Vanilla MC (random sampling; greedy returns the best sample seen)
  - Random
- I should ask Lukas how his IRTI algorithm works, to find out why mine does not converge to sampling at the global maximum.
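A minimal sketch, under assumed names and a toy 1-D objective, of the Vanilla MC baseline and the learning/greedy curves mentioned above (none of this is the actual experiment code):

```python
import random

def objective(x):
    # Placeholder 1-D reward function on [0, 1]; the real benchmark differs.
    return -(x - 0.7) ** 2

def vanilla_mc(budget, seed=0):
    rng = random.Random(seed)
    learning_curve = []  # reward of the sample drawn at each step
    greedy_curve = []    # reward of the best sample seen so far
    best_x, best_r = None, float("-inf")
    for _ in range(budget):
        x = rng.random()      # uniform random sampling
        r = objective(x)
        if r > best_r:
            best_x, best_r = x, r
        learning_curve.append(r)
        greedy_curve.append(best_r)
    # The error curve would be the gap between the known optimum and greedy_curve.
    return best_x, learning_curve, greedy_curve

best_x, learning, greedy = vanilla_mc(budget=1000)
print(best_x, greedy[-1])
```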