Saturday, April 14, 2012

Meeting 12-04-2012

Work
  1. Fixing, debugging, and improving IRTI and HOO.
  2. Furthermore, a lot of parameter tuning for the above two algorithms.
Action Points
  1. Michael and I had a discussion about the exploration factor C (see the sketch after this list)
    1. C should be related to the reward range
      1. constant C = globalRangeSize * K
      2. adaptive C = parentRangeSize * K
      3. Lukas proposed: adaptive C = (childRangeSize / globalRangeSize) * K
    2. For HOO, you should also either multiply the last term (diam) by C or divide the first term (mean reward) by C.
      1. R + C * exploration + C * diam 
      2. R / C + exploration + diam
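
A minimal Python sketch of both points, assuming the node statistics (mean reward, exploration term, diameter) are already computed; K, globalRangeSize, parentRangeSize, and childRangeSize follow the notes above, while the function names and variant labels are my own illustration:

def exploration_constant(variant, K, global_range, parent_range=None, child_range=None):
    # Three candidate ways to tie C to the reward range (action point 1).
    if variant == "constant":          # C = globalRangeSize * K
        return global_range * K
    if variant == "adaptive-parent":   # C = parentRangeSize * K
        return parent_range * K
    if variant == "adaptive-child":    # Lukas: C = (childRangeSize / globalRangeSize) * K
        return child_range / global_range * K
    raise ValueError("unknown variant: %s" % variant)

def b_value(mean_reward, exploration, diam, C, scale_diam=True):
    # HOO-style score with C applied consistently (action point 2).
    if scale_diam:
        return mean_reward + C * exploration + C * diam   # R + C * exploration + C * diam
    return mean_reward / C + exploration + diam           # R / C + exploration + diam

With a constant C the two forms differ only by an overall factor of C, so they rank nodes identically; with an adaptive, per-node C the two orderings can disagree, which is worth checking when changing the formula.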
Planning
  • Change the HOO formula (see action points)
  • Generate reward distributions (learning curve, greedy curve, error) of several algorithms
    • IRTI
    • HOO
    • UCT (with pre-discretization)
    • Vanilla MC (random sampling; greedy returns the best sample seen; sketched below)
    • Random
  • I should ask Lukas how his IRTI algorithm works, to find out why mine does not converge to sampling at the global maximum.
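
A sketch of the vanilla MC baseline and the curves to record, assuming a one-dimensional objective on [lo, hi] with a known optimum for the error curve; all names here are illustrative:

import random

def vanilla_mc(f, budget, lo=0.0, hi=1.0, seed=None):
    # Vanilla Monte Carlo: sample uniformly at random; the greedy
    # recommendation at each step is the best sample seen so far.
    rng = random.Random(seed)
    best_x, best_r = None, float("-inf")
    learning_curve, greedy_curve = [], []
    for _ in range(budget):
        x = rng.uniform(lo, hi)
        r = f(x)
        if r > best_r:
            best_x, best_r = x, r
        learning_curve.append(r)       # reward of the sample just drawn
        greedy_curve.append(best_r)    # reward of the current greedy recommendation
    return best_x, learning_curve, greedy_curve

# Usage: with a known global optimum, the error curve follows from the greedy curve.
f = lambda x: -(x - 0.3) ** 2                 # illustrative objective, optimum f(0.3) = 0
x_star, learning, greedy = vanilla_mc(f, budget=1000, seed=42)
error_curve = [0.0 - g for g in greedy]       # distance to the known optimum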
