Colin Schepers' Blog: Meeting 24-02-2012

Action Points

I explained that I just fixed a bug regarding the adaptive UCT constant. During selection each child should take the same UCT constant (dependant on the range of the parent node), whereas at first I used the ranges of the children themselves meaning the children would use different UCT constants. In some cases a "bad" child would be preferred over a "good" child (with high average / low number of trails) when this bad child has a low minimum (and so a large range size).
I showed the behaviour and results in RL Glue and .cvs data; this looked good
Two problems are still open and are to be investigated (Kurt; thanks in advance)

From the Scala code it can be concluded that a T-test is used for the Six Hump Camel Back experiment, while the paper itself states an F-test is used?
The F-values used in the Scala code are of the entries 1 and n degrees of freedom of the F-Distribution table, while I though it would be more logical to use the entry n and n (because the degrees of freedom is equal for both sides)?

HOLOP and TLS and the combinations have been explained to me. On the one hand you have a "tree of trees" taking either a regression tree (as in TLS) or HOO and on the other hand you have a "tree of sequences" which can also either be in the form of a regression tree or HOO (as in HOLOP). A sequence can in this case be seen as the cartesian product / concatenation of the actions taken during an episode.
Besides these four algorithms, Michael proposed UCT with pre-discretization. I said that I already implemented a simple agent that pre-discretizes the action space (in a specified number of "parts") and uses a basic UCT algorithm to learn. I did this to test the MCTS / UCT methods / components which are also used in the regression trees.
I announced that I have a job and will be working two days a week from now on. This means my productivity will slightly decrease.

Planning

Implement (in Matlab) reading and visualizing the output from experiments
Write everything relevant from this blog into the Latex draft already
Think about the names of all 4 combinations of algorithms
Implement TLS taking the regression tree component in a multi-step environment. Think about the representation of the sequence of the actions fed to the TLS component (and later to the HOLOP component)

Next meeting

Colin Schepers' Blog