- Lukas pointed out that my discounting in TLS was incorrect. The exponent of gamma should always match the depth, even when updating nodes lower in the tree. I fixed this.
- Instantiating nodes beforehand for HOO did not make a difference in the performance.
- Removing the storage of the action ranges within the nodes (to avoid copying arrays on each split) makes only a small difference in simulations per second. For compatibility and readability I decided to stick to the initial implementation of having the action ranges within the nodes.
- Updating the U and B values before the selection step increased the number of simulations. This means a full tree traversal is done to update the U and B values of all nodes, updating the children before the parent (since the calculation of B uses the B-values of the children). To avoid a stack overflow when using a recursive post-order traversal, I decided to (also) store (references to) the nodes in a list to which nodes are added in order of creation. Iterating over this list with a descendingIterator ensures that children are always updated before their parent.
- I tried scaling the exploration term in the calculation of U in HOO, but have not (yet) succeeded in making the HOO algorithm perform well in environments with rewards outside the range of 0 to 1. The sinus environment, however, has rewards in the range of about 0 to 0.6, and HOO performs well there as long as the splitting occurs in or near the middle (see next point).
- I implemented a choice in splitting behaviour for HOO (the decision of where to split, given the action range of one of the action dimensions):
  - Normally distributed around the middle of the range (with an option to change the shape of the "bell")
  - Uniformly distributed
  - Exactly in the middle
- Splitting sooner for the regression tree agent in the sinus environment did not change the reward; it keeps sampling in the leftmost peak most of the time.
- I found a problem regarding the adaptive C value. This value can be 0, "removing" the exploration term whenever a node has count 1 in the selection step (because then min equals max, and therefore the size of the range equals 0). For regression trees this never occurs, since a node has gathered more than 1 sample before splitting, but for HOO this can be a problem.
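To make the discounting point concrete, here is a minimal sketch of the two conventions, assuming one reward per depth along a single simulated path. The class and method names are hypothetical and not taken from the actual TLS code. The first method uses the exponent that matches the absolute depth; the second restarts the exponent at the node being updated.

```java
/** Hypothetical helper contrasting two discounting conventions for tree backups. */
final class DiscountedBackup {

    /** Discounts each reward by gamma^(absolute depth), even for a node below the root. */
    static double absoluteDepthReturn(double[] rewards, double gamma, int nodeDepth) {
        double sum = 0.0;
        for (int d = nodeDepth; d < rewards.length; d++) {
            sum += Math.pow(gamma, d) * rewards[d];
        }
        return sum;
    }

    /** Discounts relative to the node itself, as if the node's depth were "now". */
    static double relativeDepthReturn(double[] rewards, double gamma, int nodeDepth) {
        double sum = 0.0;
        for (int d = nodeDepth; d < rewards.length; d++) {
            sum += Math.pow(gamma, d - nodeDepth) * rewards[d];
        }
        return sum;
    }
}
```

For the same path, the two results differ by a factor of gamma raised to the node's depth, so mixing the conventions within one tree gives inconsistent node values.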
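The non-recursive update of the U and B values could look roughly like the sketch below. It assumes a simplified HOO-style node with the standard HOO bounds; the names `BNode`, `BValueUpdater`, `nu1` and `rho` are illustrative and not the names used in the actual implementation. Because a parent is always created before its children, walking the creation-order list backwards visits children first.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Iterator;

/** Hypothetical, simplified HOO-style node; field names are illustrative. */
final class BNode {
    final int depth;
    int count;          // number of samples that passed through this node
    double mean;        // running mean of the rewards seen at this node
    double u = Double.POSITIVE_INFINITY;
    double b = Double.POSITIVE_INFINITY;
    BNode left, right;  // null while the node has not been split

    BNode(int depth) { this.depth = depth; }
}

final class BValueUpdater {
    // Nodes in order of creation: a parent is always added before its children.
    private final Deque<BNode> nodes = new ArrayDeque<>();
    private final double nu1;
    private final double rho;

    BValueUpdater(double nu1, double rho) {
        this.nu1 = nu1;
        this.rho = rho;
    }

    BNode createNode(int depth) {
        BNode node = new BNode(depth);
        nodes.addLast(node);
        return node;
    }

    /** Recomputes U and B for every node, children before parents, without recursion. */
    void updateAll(int totalSamples) {
        Iterator<BNode> it = nodes.descendingIterator();
        while (it.hasNext()) {
            BNode n = it.next();
            if (n.count == 0) {
                // Unsampled nodes keep an infinite bound.
                n.u = Double.POSITIVE_INFINITY;
                n.b = Double.POSITIVE_INFINITY;
                continue;
            }
            n.u = n.mean
                + Math.sqrt(2.0 * Math.log(totalSamples) / n.count)
                + nu1 * Math.pow(rho, n.depth);
            // Missing children are treated as unexplored, i.e. B = +infinity.
            double bLeft = (n.left == null) ? Double.POSITIVE_INFINITY : n.left.b;
            double bRight = (n.right == null) ? Double.POSITIVE_INFINITY : n.right.b;
            n.b = Math.min(n.u, Math.max(bLeft, bRight));
        }
    }
}
```

The extra list costs one reference per node, but the loop depth no longer depends on the depth of the tree, which is what avoids the stack overflow.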
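The three splitting options for HOO can be sketched as a single function. This is an illustration under assumptions: the `sigmaFraction` parameter (standard deviation as a fraction of the range width) stands in for the "shape of the bell" option, and clamping out-of-range Gaussian draws to the range is one possible choice, not necessarily what the implementation does.

```java
import java.util.Random;

/** Hypothetical helper that picks a split point inside an action range [min, max]. */
final class SplitPoints {

    enum Strategy { GAUSSIAN, UNIFORM, MIDDLE }

    static double choose(Strategy strategy, double min, double max,
                         double sigmaFraction, Random rng) {
        double mid = 0.5 * (min + max);
        switch (strategy) {
            case MIDDLE:
                return mid;
            case UNIFORM:
                return min + rng.nextDouble() * (max - min);
            case GAUSSIAN:
                // Bell centred on the middle; sigmaFraction controls its width (assumption).
                double x = mid + rng.nextGaussian() * sigmaFraction * (max - min);
                // Clamp so the split never leaves the range (assumption).
                return Math.max(min, Math.min(max, x));
            default:
                throw new IllegalArgumentException("Unknown strategy: " + strategy);
        }
    }
}
```

A small `sigmaFraction` makes the Gaussian option behave almost like the exact-middle split, while a large one pushes many draws onto the clamped range boundaries.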
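The adaptive C problem is easy to reproduce in isolation. The sketch below assumes that the adaptive C is the size of the reward range observed at a node and that the exploration term has a UCT-like form, C * sqrt(ln N / n); both the form and all names are assumptions for illustration. The fallback method is only one hypothetical guard, not something that has been tried or evaluated here.

```java
/** Hypothetical tracker for the reward range observed at a single node. */
final class RewardRange {
    private double min = Double.POSITIVE_INFINITY;
    private double max = Double.NEGATIVE_INFINITY;
    private int count = 0;

    void add(double reward) {
        min = Math.min(min, reward);
        max = Math.max(max, reward);
        count++;
    }

    /** Size of the observed reward range; 0 when only one sample has been seen. */
    double adaptiveC() {
        return (count == 0) ? 0.0 : max - min;
    }

    /** Hypothetical guard: use a constant when the observed range is degenerate. */
    double adaptiveCOrFallback(double fallbackC) {
        double c = adaptiveC();
        return (c > 0.0) ? c : fallbackC;
    }

    /** UCT-like exploration term (assumed form): C * sqrt(ln N / n). */
    static double explorationTerm(double c, int totalCount, int nodeCount) {
        return c * Math.sqrt(Math.log(totalCount) / nodeCount);
    }
}
```

With a single sample the range is 0, so the exploration term vanishes regardless of how rarely the node has been visited.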
Planning
- Try to split sooner for the Sinus environment and observe results
- Scaling the exploration term in the calculation of U in HOO
- Investigate saving of images in RL Viz
- Generate some pictures / results
- Individual meeting: Thursday, April 12, 2012, 11:00 - 12:00
- Joint meeting: Wednesday, May 2, 13:00 - 15:00
Hey,
- can you use enumerations rather than bullets so I can refer to your points with numbers?
- your first point is a matter of definition and coherence. If you are reusing trees you may want to pretend at any depth that it is 'now', and only discount when backing up.
- the adaptive C (which is then NOT a constant) in combination with HOO is a new idea, so make sure to compare it to some constant C value.
- Are there any parameter values you can use for the reproduction of the sinus/RegressionTree results of the TLS article?
Good luck, Michael
Hi Michael,
1. It's now enumerated :)
2. Ok, I'm not reusing trees.
3. Ok, I'll compare to a constant C and observe results.
4. I will have another "dive" into the Scala code and keep on trying...
Thanks for the feedback!
Regards,
Colin