- Implemented the option of time-based learning (besides simulation-based learning)
- Implemented discounted rewards for the backtracking step
- Added a parameter for the maximum expansion depth of the meta tree
- Did some optimization for IRTI / RMTL and experimented a bit
- Implemented the Double Integrator environment mentioned in several papers
- Started implementing HOO
- Seems to work for one-step one-dimensional problems (i.e. one-step Donut World)
- Fails in Six Hump Camel Back
- We looked at the results from the previous post, which looked good
- I should look into the "artifacts" still present in the samples "901-1000" (you still see some lines of samples near the (local and global) optima)
- I still have to add error bars
- Kurt gave me some tips for the presentation and report
- As it was not really clear to me from the literature, I asked how the split is chosen in HOO
- Which dimension? Random
- At which point in the dimension's range to split? Random
- For HOLOP, however, dimensions corresponding to early stages in the sequence should be chosen more often, as they tend to have a bigger contribution to the return (see HOLOP paper)
- The last term in the upper-bound formula (sometimes referred to as v1*p^h and sometimes as diam(Ph,i)) was also not clear to me. I'm still struggling a bit with this, but on the other hand it should not affect the working of HOO too much.
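A minimal Python sketch of the splitting choices and the upper bound discussed in the notes above. It is only an illustration under assumptions, not the thesis implementation: the function names, the default values for `v1` and `rho`, and the `gamma**t` weighting of dimensions are all hypothetical. The bound follows the usual HOO form (empirical mean, plus an exploration term, plus the v1*p^h term). The weighting is just one possible way to favour early steps in the spirit of HOLOP.

```python
import math
import random


def u_value(mean, count, n, depth, v1=1.0, rho=0.5):
    """Optimistic bound of one tree node.

    mean  : empirical mean reward of the node
    count : number of samples that passed through the node
    n     : total number of samples so far
    depth : depth h of the node; the last term is v1 * rho**h
    """
    if count == 0:
        return math.inf  # unvisited nodes are maximally optimistic
    exploration = math.sqrt(2.0 * math.log(n) / count)
    return mean + exploration + v1 * rho ** depth


def choose_split_dimension(n_steps, dims_per_step, gamma=None, rng=random):
    """Pick the index of the dimension to split.

    gamma=None : uniformly random choice (plain HOO, as in the notes).
    gamma given: dimensions of step t get weight gamma**t, so early
                 steps are chosen more often (assumed weighting).
    """
    n_dims = n_steps * dims_per_step
    if gamma is None:
        return rng.randrange(n_dims)
    weights = [gamma ** (d // dims_per_step) for d in range(n_dims)]
    return rng.choices(range(n_dims), weights=weights, k=1)[0]


def split_cell(lower, upper, dim, rng=random):
    """Split the box [lower, upper] along `dim` at a uniformly random point."""
    cut = rng.uniform(lower[dim], upper[dim])
    left_upper = list(upper)
    left_upper[dim] = cut
    right_lower = list(lower)
    right_lower[dim] = cut
    return (list(lower), left_upper), (right_lower, list(upper))
```

With `gamma=None` the sketch reproduces the "random dimension, random split point" answer. Passing a discount factor below 1 shifts the choice toward the first actions of the sequence.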
Planning
- Finish presentation for thesis meeting March 21
- Finalise / Debug HOO
- Thesis report writing / restructuring
- Joint meeting: Wednesday, March 28, 2012, 10:00-12:00