Colin Schepers' Blog: Meeting 16-03-2012

Saturday, March 17, 2012

Meeting 16-03-2012

Work

Implemented the option of time-based learning (besides simulation-based learning)
Implemented discounted rewards for the backtracking step
Added a parameter for the maximum expansion depth of the meta tree
Did some optimization for IRTI / RMTL and experimented a bit
Implemented the Double Integrator environment mentioned in several papers
Started implementing HOO

Seems to work for one-step, one-dimensional problems (i.e. the one-step Donut World)
Fails on the Six Hump Camel Back function
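For the two budget modes (time-based vs. simulation-based) and the discounted backtracking step in the list above, here is a minimal sketch. It assumes a generic Monte Carlo tree search in which each simulation yields the path of visited nodes and the rewards collected along it. The names `discounted_returns`, `backup` and `make_budget`, and the node dictionaries with `visits`/`value_sum` keys, are hypothetical illustrations, not the planner's actual code.

```python
import time


def discounted_returns(rewards, gamma):
    """Return G_t = r_t + gamma * G_{t+1} for every step t of one simulated episode."""
    returns = [0.0] * len(rewards)
    g = 0.0
    for t in reversed(range(len(rewards))):
        g = rewards[t] + gamma * g
        returns[t] = g
    return returns


def backup(path, rewards, gamma):
    """Backtracking step: node path[t] (hypothetical dict) receives the
    discounted return from step t onward instead of the undiscounted sum."""
    for node, g in zip(path, discounted_returns(rewards, gamma)):
        node["visits"] += 1
        node["value_sum"] += g


def make_budget(max_simulations=None, max_seconds=None):
    """Return a predicate that is True while the planning budget lasts.

    Simulation-based: each call consumes one simulation, max_simulations in total.
    Time-based: True until max_seconds of wall-clock time have elapsed.
    """
    if (max_simulations is None) == (max_seconds is None):
        raise ValueError("give exactly one of max_simulations or max_seconds")
    start = time.perf_counter()
    used = 0

    def has_budget():
        nonlocal used
        if max_simulations is not None:
            used += 1
            return used <= max_simulations
        return time.perf_counter() - start < max_seconds

    return has_budget
```

A planner would then loop with `while has_budget(): ...`, running one simulation and calling `backup` per iteration; only the stopping rule differs between the two modes.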
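The Double Integrator is commonly described as a unit mass moving on a line, with position p and velocity v as the state, an acceleration a as the action, and a quadratic cost on position and control effort. A minimal sketch, assuming Euler integration with time step `dt`, actions clipped to `[-max_accel, max_accel]` and reward `-(p^2 + a^2)`; the concrete constants (0.05, 1.5, start at p = 1) are illustrative assumptions and should be checked against the specific paper being reproduced.

```python
from dataclasses import dataclass


@dataclass
class DoubleIntegrator:
    """Unit mass on a line; the action is an acceleration.
    Default constants are illustrative assumptions, not values from a specific paper."""
    dt: float = 0.05        # integration time step
    max_accel: float = 1.5  # actions are clipped to [-max_accel, max_accel]
    p: float = 1.0          # position
    v: float = 0.0          # velocity

    def step(self, a):
        """Apply acceleration a for one time step; return ((p, v), reward)."""
        a = max(-self.max_accel, min(self.max_accel, a))
        # Quadratic cost on the current position and on the control effort.
        reward = -(self.p ** 2 + a ** 2)
        # Euler integration: position moves with the old velocity, then velocity changes.
        self.p += self.v * self.dt
        self.v += a * self.dt
        return (self.p, self.v), reward
```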
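When debugging the Six Hump Camel Back failure, it helps to have the function as it is usually defined in the optimization literature, typically on x in [-3, 3] and y in [-2, 2]. It has six local minima, two of which are global with value about -1.0316, near (0.0898, -0.7126) and (-0.0898, 0.7126). Since HOO maximizes, the sketch negates it; `reward` is a hypothetical wrapper, and the domain bounds are the common convention rather than anything fixed by this post.

```python
def six_hump_camel(x, y):
    """Six-Hump Camel Back function (to be minimised)."""
    return (4 - 2.1 * x**2 + x**4 / 3) * x**2 + x * y + (-4 + 4 * y**2) * y**2


def reward(x, y):
    """Hypothetical HOO payoff: HOO maximises, so negate the function."""
    return -six_hump_camel(x, y)
```

The function is symmetric, f(x, y) = f(-x, -y), so the samples of a working HOO run may concentrate around either or both global optima.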

Action Points

We went over the results from the previous post, which looked good

I should look into the "artifacts" still present in samples 901-1000 (some lines of samples are still visible near the local and global optima)
I still have to add error bars

Kurt gave me some tips for the presentation and report
As it was not really clear to me from the literature, I asked about the choice of split in HOO:

Which dimension? Random
At which point in the dimension's range to split? Random

For HOLOP, however, dimensions corresponding to early stages in the sequence should be chosen more often, as they tend to contribute more to the return (see the HOLOP paper)
The last term in the upper-bound formula (sometimes written as ν1·ρ^h and sometimes as diam(P_{h,i})) was also not clear to me. I'm still struggling a bit with this, but on the other hand it should not affect how HOO works too much.
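For the error bars still to be added, here is one simple option. It assumes the plotted curve is an average over several independent runs and reports, per sample index, the mean and the half-width of an approximate 95% confidence interval (1.96 times the standard error of the mean). The helper names `mean_and_error` and `error_bars` are hypothetical; with only a few runs, a Student t quantile would be more accurate than 1.96.

```python
import math
import statistics


def mean_and_error(values, z=1.96):
    """Mean of `values` and the half-width z * SEM of a normal-approximation CI."""
    mean = statistics.mean(values)
    if len(values) < 2:
        return mean, 0.0  # no spread can be estimated from a single run
    sem = statistics.stdev(values) / math.sqrt(len(values))
    return mean, z * sem


def error_bars(runs):
    """runs[r][k] is the value of sample k in run r; returns (means, errors) per sample index."""
    pairs = [mean_and_error(list(column)) for column in zip(*runs)]
    return [m for m, _ in pairs], [e for _, e in pairs]
```

The two returned lists can be passed as `y` and `yerr` to matplotlib's `errorbar`.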
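The answers on splitting above (random dimension, random split point) can be sketched for an axis-aligned box in a continuous action space: pick the dimension uniformly at random, then draw the cut point uniformly inside that dimension's current range. This is a minimal sketch; representing a cell as `(lower, upper)` lists and the name `random_split` are assumptions made for illustration.

```python
import random


def random_split(lower, upper, rng=random):
    """Split the box [lower, upper] into two child boxes.

    Dimension: chosen uniformly at random.
    Split point: drawn uniformly within that dimension's current range.
    """
    d = rng.randrange(len(lower))
    cut = rng.uniform(lower[d], upper[d])
    left = (list(lower), list(upper))
    right = (list(lower), list(upper))
    left[1][d] = cut   # left child keeps [lower[d], cut]
    right[0][d] = cut  # right child keeps [cut, upper[d]]
    return left, right, d
```

Passing a seeded `random.Random` instance as `rng` keeps experiments reproducible.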
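For HOLOP, the action space is a whole action sequence, so a box has `horizon * action_dim` coordinates. One hypothetical way to split early stages more often is to weight coordinate i, which belongs to step `i // action_dim`, by `gamma ** step`. The geometric weighting is an assumption chosen to mirror discounting and is not necessarily the rule used in the HOLOP paper.

```python
import random


def choose_split_dimension(horizon, action_dim, gamma, rng=random):
    """Pick a coordinate of a flattened action sequence to split.

    Coordinate i belongs to planning step i // action_dim and gets weight
    gamma ** step, so earlier steps are split more often (assumed weighting).
    """
    n = horizon * action_dim
    weights = [gamma ** (i // action_dim) for i in range(n)]
    return rng.choices(range(n), weights=weights, k=1)[0]
```

Setting `gamma = 1` recovers the uniform random choice of dimension used for plain HOO.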
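On the last term: in the original HOO description (Bubeck et al., "X-Armed Bandits"), the two notations are linked by an assumption. The cells are required to satisfy diam(P_{h,i}) <= ν1·ρ^h for constants ν1 > 0 and 0 < ρ < 1, so ν1·ρ^h is a depth-dependent upper bound on how much the mean payoff can vary inside a cell at depth h. The node bound is U_{h,i}(n) = μ̂_{h,i}(n) + sqrt(2 ln n / T_{h,i}(n)) + ν1·ρ^h, and B_{h,i} = min(U_{h,i}, max of the children's B-values), with B = +inf for nodes not yet sampled. The following is a minimal sketch of those two quantities; the function names and arguments are hypothetical.

```python
import math


def u_value(mean_payoff, visits, total_plays, depth, nu1, rho):
    """HOO upper confidence bound U_{h,i}(n) of a node at depth h."""
    if visits == 0:
        return math.inf  # unsampled nodes are maximally optimistic
    exploration = math.sqrt(2.0 * math.log(total_plays) / visits)
    smoothness = nu1 * rho ** depth  # bound on the payoff variation inside the cell
    return mean_payoff + exploration + smoothness


def b_value(u, child_b_values):
    """B_{h,i} = min(U_{h,i}, max of the children's B-values).

    Children not yet in the tree count as +inf, so a node without
    expanded children simply has B = U.
    """
    if not child_b_values:
        return u
    return min(u, max(child_b_values))
```

HOO then descends from the root by always following the child with the larger B-value.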

Planning

Finish the presentation for the thesis meeting on March 21
Finalise / Debug HOO
Thesis report writing / restructuring

Next meeting

Joint meeting: Wednesday, March 28, 2012, 10:00-12:00
