Colin Schepers' Blog: Meeting 28-03-2012

Action Points

We all talked about our progress, results so far and current work
I implemented IRTI, TLS, HOO and HOLOP, although the latter two are still under development
By observing the agents' behavior in RL Viz I noticed a couple of things which I should look into

IRTI very rarely gets stuck in a non-optimal region (Six Hump Camel Back; approximately 1 out of 100 times)
The TLS agent sometimes makes a bad move in the Double Integrator
The Sinus function seems to be a difficult problem for IRTI; sampling is mostly done at the local maximum on the left
HOO performs better than IRTI on the Sinus function and very badly in the Six Hump Camel Back environment
HOO is much slower than IRTI
HOO does not explore correctly
HOLOP performs badly, most likely because HOO does not work correctly yet
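
For reference, the Six Hump Camel Back benchmark mentioned above can be sketched as follows. This is the standard textbook definition of the function, which has two global minima and several local ones. How the environment used here turns it into a reward (for example by negating it) is not stated in these notes, so the function name and the plain minimization form below are assumptions.

```python
def six_hump_camel_back(x, y):
    """Standard Six Hump Camel Back function (minimization form).

    Usually evaluated on x in [-3, 3] and y in [-2, 2].
    The two global minima have a value of about -1.0316.
    """
    x2 = x * x
    y2 = y * y
    return (4.0 - 2.1 * x2 + (x2 * x2) / 3.0) * x2 + x * y + (-4.0 + 4.0 * y2) * y2


# The function is symmetric under (x, y) -> (-x, -y), which is why
# the optima come in pairs.
value_at_optimum = six_hump_camel_back(0.0898, -0.7126)
```

An optimizer that ends up in one of the shallower local basins rather than at one of the two global minima would look like the "stuck in a non-optimal region" behavior described above.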

I should look into these problems. The planning below lists some possible solutions for the above problems.

Planning

Try to split sooner for the Sinus environment and observe results
Make nodes beforehand
Stop storing the action ranges within the nodes (to avoid copying arrays at each split)
Scale the exploration term in the calculation of U in HOO
Update U and B values before the selection step
Change the discounting of the reward (the power of the discount factor should match the depth)
Investigate saving of images in RL Viz
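
The HOO and discounting items above can be sketched roughly as follows. The U and B formulas follow the usual HOO definitions (U is the node mean plus an exploration term plus nu1 * rho^depth, and B is the minimum of U and the best child B). The scaling constant `c`, all function and parameter names, and the convention that the first reward has depth 0 are assumptions made for illustration, not taken from my implementation.

```python
import math


def discounted_return(rewards, gamma):
    """Sum of gamma**depth * reward, so the power matches the depth.

    Assumes the first reward in the sequence is at depth 0.
    """
    return sum(gamma ** depth * r for depth, r in enumerate(rewards))


def u_value(mean, count, n, depth, nu1, rho, c=1.0):
    """Upper bound U of a HOO node.

    mean  : average reward observed in this node
    count : number of times this node was visited
    n     : total number of samples drawn so far
    c     : hypothetical scale for the exploration term (c = 1 is unscaled)
    """
    if count == 0:
        return math.inf  # unvisited nodes are maximally optimistic
    exploration = math.sqrt(2.0 * math.log(n) / count)
    return mean + c * exploration + nu1 * rho ** depth


def b_value(u, child_b_values):
    """B-value: the tighter of the node's own U and the best child B.

    For a leaf, pass math.inf for each missing child, so that B equals U.
    """
    return min(u, max(child_b_values))
```

Since U depends on the total sample count n, the U values of all nodes change after every sample. Recomputing U and then B from the leaves up to the root before each selection step keeps the selection consistent with the latest statistics, at the cost of a full pass over the tree.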

Next meeting
