Saturday, March 17, 2012

Meeting 16-03-2012

Work
  • Implemented the option of time-based learning (besides simulation-based learning)
  • Implemented discounted rewards for the backtracking step
  • Added a parameter for the maximum expansion depth of the meta tree
  • Did some optimization for IRTI / RMTL and experimented a bit
  • Implemented the Double Integrator environment mentioned in several papers
  • Started implementing HOO
    • Seems to work for one-step one-dimensional problems (i.e. one-step Donut World)
    • Fails in Six Hump Camel Back
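
As an illustration of the discounted rewards used in the backtracking step, here is a minimal sketch. It assumes that backtracking walks a sampled path from its last step to its first and accumulates the return with a discount factor; the function name `discounted_returns` and the parameter `gamma` are hypothetical and not taken from the actual implementation.

```python
def discounted_returns(rewards, gamma):
    """Discounted return from every step of a path onward.

    rewards: rewards collected along one sampled path (first step first).
    gamma:   discount factor in [0, 1] (assumed parameter name).
    """
    returns = [0.0] * len(rewards)
    running = 0.0
    # Walk the path back to front, as a backtracking step would.
    for t in range(len(rewards) - 1, -1, -1):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


if __name__ == "__main__":
    print(discounted_returns([1.0, 1.0, 1.0], 0.5))
```

With `gamma = 1.0` this reduces to the plain undiscounted sum of rewards.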
Action Points
  • We looked at the results from the previous post, which looked good
    • I should look into the "artifacts" still present in the samples "901-1000" (you still see some lines of samples near the (local and global) optima)
    • I still have to add error bars
  • Kurt gave me some tips for the presentation and report
  • As it was not really clear to me from the literature I asked about the choice of splitting in HOO
    • Which dimension? Random
    • At which point in the dimension's range to split? Random
  • For HOLOP, however, dimensions corresponding to early stages in the sequence should be chosen more often, as they tend to have a bigger contribution to the return (see HOLOP paper)
  • The last term in the upper bound formula (sometimes referred to as v1*p^h and sometimes as diam(Ph,i)) was also not clear to me. I'm still struggling a bit with this, but on the other hand it should not affect the working of HOO too much.
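
The points about splitting and the upper bound can be made concrete with a rough sketch. The upper bound below follows the form given in the HOO literature (empirical mean, plus an exploration term, plus v1*p^h, where I read p as the contraction factor rho). The weighting of dimensions by `gamma ** t` is only one possible way to favour early stages of the sequence; it is an assumption for illustration and not necessarily the rule used in the HOLOP paper. All function and parameter names are hypothetical.

```python
import math
import random


def u_value(mean, n, count, depth, v1, rho):
    """Upper bound of a node: mean + sqrt(2 ln n / count) + v1 * rho^depth.

    n is the total number of samples so far, count the number of samples
    that passed through this node. Unvisited nodes get an infinite bound.
    """
    if count == 0:
        return math.inf
    return mean + math.sqrt(2.0 * math.log(n) / count) + v1 * rho ** depth


def split_weights(num_dims, gamma):
    """Probability of splitting each dimension (assumed scheme).

    gamma = 1.0 gives the uniform random choice of plain HOO;
    gamma < 1.0 favours dimensions for early stages in the sequence.
    """
    raw = [gamma ** t for t in range(num_dims)]
    total = sum(raw)
    return [w / total for w in raw]


def choose_split(low, high, weights, rng):
    """Pick a random dimension and a random split point in its range."""
    dim = rng.choices(range(len(weights)), weights=weights, k=1)[0]
    point = rng.uniform(low[dim], high[dim])
    return dim, point


if __name__ == "__main__":
    rng = random.Random(0)
    weights = split_weights(3, 0.9)
    print(choose_split([0.0, 0.0, 0.0], [1.0, 1.0, 1.0], weights, rng))
    print(u_value(mean=0.5, n=10, count=3, depth=2, v1=1.0, rho=0.5))
```

Since the last term shrinks geometrically with depth, a poor choice of v1 and rho mostly changes how quickly deep nodes stop being explored, which fits the impression that it should not affect the basic working of HOO too much.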
Planning
  • Finish presentation for thesis meeting March 21
  • Finalise / Debug HOO
  • Thesis report writing / restructuring
Next meeting
  • Joint meeting: Wednesday, March 28, 2012, 10:00-12:00
