Wednesday, March 28, 2012

Meeting 28-03-2012


Action Points
  • We all talked about our progress, our results so far and our current work
  • I implemented IRTI, TLS, HOO and HOLOP, although the latter two are still under development
  • While observing the agents' behavior in RL Viz, I noticed a couple of things that I should look into:
    • IRTI very rarely gets stuck in a non-optimal region (Six Hump Camel Back; approximately 1 out of 100 times)
    • The TLS agent sometimes makes a bad move in the Double Integrator
    • The Sinus function seems to be a difficult problem for IRTI; sampling is mostly done at the local maxima on the left
    • HOO performs better than IRTI on the Sinus function, but very badly in the Six Hump Camel Back environment
    • HOO is much slower than IRTI
    • HOO does not explore correctly
    • HOLOP performs badly, most likely because HOO does not yet work correctly
  • I should look into these problems. The planning below lists some possible solutions for the problems above
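For reference when debugging HOO's exploration, here is a minimal Python sketch of the standard HOO algorithm of Bubeck et al. on a one-dimensional action interval. It is not the implementation discussed in these notes. Each sampled node gets U = mean + sqrt(2 ln n / T) + nu1 * rho**depth, and B = min(U, max(B of the two children)), with unsampled nodes at B = +inf. `nu1` and `rho` are HOO's smoothness parameters, and the defaults of 1.0 and 0.5 are arbitrary choices. `exploration_scale` is a hypothetical knob that multiplies the exploration term, in the spirit of the "scale the exploration term" item under Planning. Node intervals are computed from (depth, index) instead of being stored, and all U and B values are refreshed after every sample, so they are up to date before the next selection. The noiseless test function is also an assumption.

```python
import math
import random


class Node:
    """HOO tree node. Its action interval is derived from (depth, index) instead of being stored."""

    def __init__(self, depth, index):
        self.depth = depth
        self.index = index        # 0-based position among the 2**depth nodes at this depth
        self.count = 0            # T_{h,i}: number of samples drawn in this node's region
        self.mean = 0.0           # empirical mean reward of those samples
        self.b_value = math.inf   # B-value; stays +inf while the node is unsampled
        self.children = None      # (left, right) once the node has been sampled


class HOO:
    def __init__(self, low, high, nu1=1.0, rho=0.5, exploration_scale=1.0):
        self.low, self.high = low, high
        self.nu1, self.rho = nu1, rho
        # Hypothetical knob for scaling the exploration term; 1.0 gives standard HOO.
        self.exploration_scale = exploration_scale
        self.root = Node(0, 0)
        self.n = 0                # total number of samples so far

    def interval(self, node):
        """Action range of a node, computed on demand (no per-node array copies)."""
        width = (self.high - self.low) / 2 ** node.depth
        start = self.low + node.index * width
        return start, start + width

    def u_value(self, node):
        """U = mean + scaled sqrt(2 ln n / T) + nu1 * rho**depth (only for sampled nodes)."""
        exploration = self.exploration_scale * math.sqrt(2 * math.log(self.n) / node.count)
        return node.mean + exploration + self.nu1 * self.rho ** node.depth

    def _update_b(self, node):
        """Recompute B = min(U, max(B_left, B_right)) bottom-up over the whole tree."""
        if node.count == 0:
            node.b_value = math.inf
            return
        left, right = node.children
        self._update_b(left)
        self._update_b(right)
        node.b_value = min(self.u_value(node), max(left.b_value, right.b_value))

    def _select_path(self):
        """Follow the child with the larger B-value (ties broken randomly) to an unsampled node."""
        path = [self.root]
        while path[-1].children is not None:
            left, right = path[-1].children
            if left.b_value == right.b_value:
                path.append(random.choice((left, right)))
            else:
                path.append(left if left.b_value > right.b_value else right)
        return path

    def step(self, reward_fn):
        """One HOO iteration: select, expand, sample, update statistics, refresh U and B."""
        path = self._select_path()
        leaf = path[-1]
        leaf.children = (Node(leaf.depth + 1, 2 * leaf.index),
                         Node(leaf.depth + 1, 2 * leaf.index + 1))
        action = random.uniform(*self.interval(leaf))
        reward = reward_fn(action)
        self.n += 1
        for node in path:
            node.count += 1
            node.mean += (reward - node.mean) / node.count
        # U depends on the global n, so every U and B is refreshed before the next selection.
        self._update_b(self.root)
        return action, reward


if __name__ == "__main__":
    random.seed(0)
    hoo = HOO(0.0, 1.0)
    samples = [hoo.step(lambda a: -(a - 0.3) ** 2) for _ in range(500)]
    best_action, best_reward = max(samples, key=lambda s: s[1])
    print(f"best sampled action: {best_action:.3f} (reward {best_reward:.4f})")
```

The full bottom-up refresh costs time linear in the tree size per iteration. It is the simplest way to keep every U and B consistent with the current n, not the fastest one.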
Planning
  • Try to split sooner in the Sinus environment and observe the results
  • Make nodes beforehand
  • Stop storing the action ranges within the nodes (to avoid copying arrays at each split)
  • Scale the exploration term in the calculation of U in HOO
  • Update U and B values before the selection step
  • Change the discounting of rewards (the power of the discount factor should match the depth)
  • Investigate saving images in RL Viz
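A minimal Python sketch of depth-matched discounting, assuming the first reward of a planned sequence sits at depth 0 and is undiscounted, so the reward received at depth d is weighted by gamma**d. The function names are hypothetical. The backward form is the one a tree-based planner would typically use when backing values up from a leaf towards the root; both forms give the same result.

```python
def discounted_return(rewards, gamma):
    """Forward form: sum of gamma**d * r_d, where d is the 0-based depth of reward r_d."""
    return sum(gamma ** depth * reward for depth, reward in enumerate(rewards))


def discounted_return_backward(rewards, gamma):
    """Backward form G_d = r_d + gamma * G_{d+1}, as used when backing values up a tree."""
    total = 0.0
    for reward in reversed(rewards):
        total = reward + gamma * total
    return total


if __name__ == "__main__":
    rewards = [1.0, 1.0, 1.0]
    # Depths 0, 1, 2 get weights 1, 0.5, 0.25
    print(discounted_return(rewards, 0.5))           # 1.75
    print(discounted_return_backward(rewards, 0.5))  # 1.75
```

A typical off-by-one is weighting the depth-0 reward by gamma already (power d + 1), which scales the whole return by an extra factor gamma.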
Next meeting
  • Individual meeting: around April 11
  • Joint meeting: around April 25
