- Lukas pointed out that my discounting in TLS was incorrect. The exponent of gamma should always match the depth, even when updating nodes lower in the tree. I fixed this.
- Instantiating nodes beforehand for HOO made no difference in performance.
- Removing the storage of the action ranges within the nodes (to avoid copying arrays at each split) makes only a small difference in simulations per second. For compatibility and readability I decided to stick to the initial implementation, which keeps the action ranges within the nodes.
- Updating the U and B values before the selection step increased the number of simulations. This means a full tree traversal is done to update the U and B values of all nodes, updating the children before the parent (since the calculation of B uses the B-values of the children). To avoid a stack overflow from a recursive post-order traversal, I decided to (also) store references to the nodes in a list to which nodes are added in order of creation. Iterating over this list with a descendingIterator ensures that children are always updated before their parent.
- I tried scaling the exploration term in the calculation of U in HOO, but have not (yet) succeeded in getting HOO to perform well in environments whose rewards lie outside the range 0 to 1. The rewards of the Sinus environment, however, lie in the range of about 0 to 0.6, and HOO performs well there as long as the splitting occurs in or near the middle (see next point).
- I implemented a choice of splitting behaviour for HOO (the decision of where to split, given the action range of one of the action dimensions):
  - Normally distributed around the middle of the range (with an option to change the shape of the "bell")
  - Uniformly distributed
  - Exactly in the middle
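
The three splitting options above could look roughly like the following Java sketch. It is not the original implementation: the names `SplitPointSketch`, `Strategy`, `splitPoint`, and `sigmaFraction` are hypothetical, and modelling the "shape of the bell" as a standard deviation expressed as a fraction of the range width is an assumption. Normal draws that fall outside the range are redrawn, which is also an assumption.

```java
import java.util.Random;

/** Sketch only: all names here are hypothetical, not taken from the original code. */
final class SplitPointSketch {

    enum Strategy { NORMAL, UNIFORM, MIDDLE }

    /**
     * Picks a split point inside the action range [min, max] of one dimension.
     * sigmaFraction (assumption) is the standard deviation of the bell,
     * expressed as a fraction of the range width; only used by NORMAL.
     */
    static double splitPoint(Strategy strategy, double min, double max,
                             double sigmaFraction, Random rng) {
        double width = max - min;
        double middle = min + 0.5 * width;
        switch (strategy) {
            case UNIFORM: {
                // nextDouble() is in [0, 1), so the result lies in [min, max).
                return min + rng.nextDouble() * width;
            }
            case NORMAL: {
                if (width <= 0.0 || sigmaFraction <= 0.0) {
                    return middle; // degenerate range or bell: fall back to the middle
                }
                double x;
                do {
                    // Redraw until the sample lies strictly inside the range.
                    x = middle + rng.nextGaussian() * sigmaFraction * width;
                } while (x <= min || x >= max);
                return x;
            }
            case MIDDLE:
            default:
                return middle;
        }
    }

    public static void main(String[] args) {
        Random rng = new Random();
        for (Strategy s : Strategy.values()) {
            System.out.println(s + ": " + splitPoint(s, 0.0, 1.0, 0.1, rng));
        }
    }
}
```

A smaller `sigmaFraction` concentrates the splits near the middle of the range; a larger one makes the normal option behave more like the uniform one.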
- Splitting sooner for the regression tree agent in the Sinus environment did not change the reward; it keeps sampling in the leftmost peak most of the time.
- I found a problem with the adaptive C value. This value can be 0, which "removes" the exploration term whenever a node has count 1 in the selection step (because then min equals max, so the size of the range is 0). For regression trees this never occurs, since a node has gathered more than one sample before splitting, but for HOO it can be a problem.
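
The non-recursive update of the U and B values described above (children before parents, using a creation-ordered list and a descendingIterator) can be sketched in Java as follows. This is a minimal sketch, not the original code: the class, field, and method names are hypothetical, an `ArrayDeque` stands in for whatever list type was actually used, and the U and B formulas are the standard HOO definitions with parameters `nu1` and `rho`, which may differ from the scaled variants tried here.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Iterator;

/** Sketch only: class, field, and method names are hypothetical. */
final class HooUpdateSketch {

    static final class Node {
        final int depth;
        int count;     // number of samples that passed through this node
        double mean;   // mean reward of those samples
        double u = Double.POSITIVE_INFINITY;
        double b = Double.POSITIVE_INFINITY;
        Node left;     // null while the node is a leaf
        Node right;

        Node(int depth) { this.depth = depth; }
    }

    // Nodes in order of creation: a parent is always registered before its children.
    private final Deque<Node> nodesInCreationOrder = new ArrayDeque<>();
    final Node root = register(new Node(0));

    private Node register(Node node) {
        nodesInCreationOrder.addLast(node);
        return node;
    }

    /** Gives a leaf two children and records them after their parent. */
    void split(Node parent) {
        parent.left = register(new Node(parent.depth + 1));
        parent.right = register(new Node(parent.depth + 1));
    }

    /**
     * Recomputes U and B for every node without recursion. Walking the
     * creation-ordered deque backwards visits children before their parent,
     * so each child's B is already up to date when the parent needs it.
     * n is the total number of samples so far (assumed >= 1).
     */
    void updateAll(int n, double nu1, double rho) {
        Iterator<Node> it = nodesInCreationOrder.descendingIterator();
        while (it.hasNext()) {
            Node node = it.next();
            node.u = node.count == 0
                    ? Double.POSITIVE_INFINITY
                    : node.mean + Math.sqrt(2.0 * Math.log(n) / node.count)
                            + nu1 * Math.pow(rho, node.depth);
            double bLeft = node.left == null ? Double.POSITIVE_INFINITY : node.left.b;
            double bRight = node.right == null ? Double.POSITIVE_INFINITY : node.right.b;
            node.b = Math.min(node.u, Math.max(bLeft, bRight));
        }
    }

    public static void main(String[] args) {
        HooUpdateSketch tree = new HooUpdateSketch();
        tree.split(tree.root);
        tree.root.count = 3;       tree.root.mean = 0.5;
        tree.root.left.count = 2;  tree.root.left.mean = 0.6;
        tree.root.right.count = 1; tree.root.right.mean = 0.3;
        tree.updateAll(3, 1.0, 0.5);
        System.out.println("B(root) = " + tree.root.b);
    }
}
```

Because a child is only ever created after its parent, reverse creation order is a valid children-first order even though it is not a strict post-order traversal.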
Planning
- Try to split sooner for the Sinus environment and observe results
- Scaling the exploration term in the calculation of U in HOO
- Investigate saving of images in RL Viz
- Generate some pictures / results
- Individual meeting: Thursday, April 12, 2012, 11:00 - 12:00
- Joint meeting: Wednesday, May 2, 13:00 - 15:00