Monday, March 5, 2012

Feedback 05-03-2012

Work
  • Set up structure (chapters / sections) of the report 
  • Wrote a few blocks of text for the report
  • Thought about names for the 4 combinations (Regression Tree / HOO × Meta Tree / Sequence Tree)
    • Regression-based Meta Tree Learning (RMTL)
      • similar to Tree Learning Search (TLS)
    • Hierarchical Optimistic Sequence-based Tree Learning (HOSTL)
      • similar to Hierarchical Open-Loop Optimistic Planning (HOLOP)
    • Hierarchical Optimistic Meta Tree Learning (HOMTL)
    • Regression and Sequence-based Tree Learning (RSTL)
  • While thinking about the meta tree interface for the next meeting, I already had a go at the implementation, resulting in a (seemingly) working RMTL / TLS agent
  • Below are some results for a multi-step problem using RMTL / TLS
    • Environment: Donut World
    • average of 100 episodes
    • 3 step limit
    • 10,000 simulations per step
    • maximum reward per step = 1 (when exactly in the middle of the donut region)
    • gradual reward decrease to 0 towards the (inner or outer) edge of the donut
    • minimum reward = 0 (off the donut)
Step    Average reward    Minimum reward    Maximum reward
1       0.977023          0.888396          0.999574
2       0.983432          0.768632          0.999977
3       0.993064          0.62132           1

Average cumulative reward    Minimum cumulative reward    Maximum cumulative reward
2.95352                      2.555052                     2.998965
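For reference, the Donut World reward described above can be sketched as a simple radial function. The actual environment's radii and fall-off are not given here, so the inner/outer radii below are hypothetical and the decrease towards the edges is assumed to be linear:

```python
import math

def donut_reward(x, y, r_inner=0.5, r_outer=1.5):
    """Sketch of the Donut World reward (radii are hypothetical).

    Returns 1.0 exactly on the centerline of the donut region,
    decreasing linearly to 0 at the inner and outer edges,
    and 0 anywhere off the donut.
    """
    r = math.hypot(x, y)          # distance from the donut's center
    if r <= r_inner or r >= r_outer:
        return 0.0                # off the donut
    mid = (r_inner + r_outer) / 2.0
    half_width = (r_outer - r_inner) / 2.0
    return 1.0 - abs(r - mid) / half_width
```

With these assumed radii, a point on the centerline (e.g. (1.0, 0.0)) gets reward 1, and the reward fades out symmetrically towards both edges.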

Planning
  • Preparation for meeting Wednesday March 7th
  • Report writing
  • Debug / investigate TLS / RMTL
