Action Points
- We all talked about our progress, results so far, and current work
- I implemented IRTI, TLS, HOO and HOLOP, although the latter two are still under development
- By observing the agent's behavior in RL Viz I noticed a couple of things which I should look into
- IRTI very rarely gets stuck in a non-optimal region (Six Hump Camel Back; approximately 1 out of 100 times)
- The TLS agent sometimes makes a bad move in the Double Integrator
- The Sinus function seems to be a difficult problem for IRTI; sampling is mostly done at the local maxima on the left
- HOO performs better than IRTI on the Sinus function and very badly in the Six Hump Camel Back environment
- HOO is much slower than IRTI
- HOO does not explore correctly
- HOLOP performs poorly, most likely due to the incorrect workings of HOO
- I should look into these problems. The planning below lists some possible solutions for the problems above
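
For reference when checking where IRTI gets stuck, the standard Six Hump Camel Back test function can be sketched as below. This is the textbook form of the function, not necessarily the exact reward used in the RL Viz environment; the function name and the negation into a reward are assumptions made for illustration. The function has six local minima, two of which are global, which is why an optimizer can settle in a non-optimal region.

```python
def six_hump_camel_back(x, y):
    """Textbook Six Hump Camel Back function (a minimization problem).

    The two global minima are at roughly (0.0898, -0.7126) and
    (-0.0898, 0.7126), with a value of about -1.0316.
    """
    return (4.0 - 2.1 * x**2 + x**4 / 3.0) * x**2 + x * y + (-4.0 + 4.0 * y**2) * y**2


def reward(x, y):
    # Assumption: the environment hands out the negated function value,
    # so that maximizing reward corresponds to minimizing the function.
    return -six_hump_camel_back(x, y)
```

Comparing the reward at the point where the agent ends up with the reward at one of the two global optima gives a quick check on whether a run got stuck.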
Planning
- Try to split sooner for the Sinus environment and observe results
- Make nodes beforehand
- Stop storing the action ranges within the nodes (to avoid copying arrays at each split)
- Scale the exploration term in the calculation of U in HOO
- Update U and B values before the selection step
- Change the discounting of the reward (the power of the discount factor should match the depth)
- Investigate saving of images in RL Viz
- Individual meeting: around April 11
- Joint meeting: around April 25
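
The HOO-related planning items (scaling the exploration term in U, updating U and B before selection, and matching the discount power to the depth) can be sketched as follows. This is a minimal sketch based on the usual HOO definitions, U = mean + sqrt(2 ln n / T) + nu1 * rho^h and B = min(U, max of the child B-values), not the actual implementation. All function and parameter names are hypothetical, and `exploration_scale` is the assumed extra constant that scales the exploration term.

```python
import math


def hoo_u_value(mean_reward, visits, total_rounds, depth,
                nu1=1.0, rho=0.5, exploration_scale=1.0):
    """U-value of a node at the given depth after total_rounds samples."""
    if visits == 0:
        # Unvisited nodes are maximally optimistic.
        return math.inf
    # exploration_scale is the assumed knob for scaling the exploration term.
    exploration = exploration_scale * math.sqrt(2.0 * math.log(total_rounds) / visits)
    # nu1 * rho**depth bounds the variation of the function inside the node's region.
    return mean_reward + exploration + nu1 * rho ** depth


def hoo_b_value(u_value, child_b_values):
    """B-value: the tighter of the node's own bound and its best child's bound."""
    return min(u_value, max(child_b_values))


def discounted_return(rewards, gamma):
    """Sum of rewards where the reward at depth d is weighted by gamma**d."""
    return sum(gamma ** depth * r for depth, r in enumerate(rewards))
```

Because U depends on the total number of rounds n, every node's U and B value changes after each sample, which is why they need to be refreshed before the selection step rather than only along the path that was just visited.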