Action Points
- I explained that I just fixed a bug regarding the adaptive UCT constant. During selection, each child should use the same UCT constant (dependent on the range of the parent node), whereas at first I used the ranges of the children themselves, meaning the children would use different UCT constants. In some cases a "bad" child would be preferred over a "good" child (with a high average and a low number of trials) when the bad child had a low minimum (and so a large range size).
- I showed the behaviour and results in RL Glue and in the .csv data; this looked good
- Two problems are still open and are to be investigated (Kurt; thanks in advance)
- From the Scala code it can be concluded that a t-test is used for the Six Hump Camel Back experiment, while the paper itself states that an F-test is used?
- The F-values used in the Scala code are taken from the entries for 1 and n degrees of freedom in the F-distribution table, while I thought it would be more logical to use the entry for n and n (because the degrees of freedom are equal for both sides)?
- HOLOP, TLS and their combinations were explained to me. On the one hand you have a "tree of trees", which uses either a regression tree (as in TLS) or HOO, and on the other hand you have a "tree of sequences", which can also be either a regression tree or HOO (as in HOLOP). A sequence can in this case be seen as the Cartesian product / concatenation of the actions taken during an episode.
- Besides these four algorithms, Michael proposed UCT with pre-discretization. I said that I had already implemented a simple agent that pre-discretizes the action space (into a specified number of "parts") and uses a basic UCT algorithm to learn. I did this to test the MCTS / UCT methods and components that are also used in the regression trees.
- I announced that I have a job and will be working two days a week from now on. This means my productivity will slightly decrease.
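
A minimal sketch of the adaptive UCT constant fix from the first action point, written in Python for illustration only (it is not the actual agent code). The `Node` class, its field names, the `base_c` factor and the `use_parent_range` flag are all assumptions; so is the exact form of the constant, which is taken here to be `base_c` times the range size (maximum minus minimum observed return).

```python
import math
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """Hypothetical tree node; all names here are illustrative assumptions."""
    visits: int = 0
    total_return: float = 0.0
    min_return: float = math.inf
    max_return: float = -math.inf
    children: List["Node"] = field(default_factory=list)

    def update(self, ret: float) -> None:
        """Record one observed return in this node's statistics."""
        self.visits += 1
        self.total_return += ret
        self.min_return = min(self.min_return, ret)
        self.max_return = max(self.max_return, ret)

    @property
    def mean(self) -> float:
        return self.total_return / self.visits if self.visits else 0.0

    @property
    def range_size(self) -> float:
        # Range of observed returns; zero until something has been observed.
        return self.max_return - self.min_return if self.visits else 0.0


def select_child(parent: Node, base_c: float = 1.0,
                 use_parent_range: bool = True) -> Node:
    """Pick the child with the highest UCT value.

    use_parent_range=True:  every child shares one constant derived from
                            the parent's range (the fixed behaviour).
    use_parent_range=False: each child uses its own range (the old bug).
    """
    log_n = math.log(max(parent.visits, 1))

    def uct(child: Node) -> float:
        if child.visits == 0:
            return math.inf  # always try unvisited children first
        scale = parent.range_size if use_parent_range else child.range_size
        return child.mean + base_c * scale * math.sqrt(log_n / child.visits)

    return max(parent.children, key=uct)


# Made-up returns: a "good" child (high average, few trials, small range)
# and a "bad" child (low average, low minimum, large range).
good, bad = Node(), Node()
parent = Node(children=[good, bad])
for child, returns in ((good, [0.8, 0.9]),
                       (bad, [0.0, 1.0, 0.4, 0.4, 0.4, 0.4])):
    for r in returns:
        child.update(r)
        parent.update(r)

chosen_fixed = select_child(parent)                         # parent's range
chosen_old = select_child(parent, use_parent_range=False)   # children's ranges
```

With these made-up numbers, `chosen_fixed` is the good child, while `chosen_old` is the bad child: its large own range inflates its exploration term, which is the effect described above.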
Planning
- Implement (in MATLAB) reading and visualizing the output from experiments
- Write everything relevant from this blog into the LaTeX draft already
- Think about the names of all 4 combinations of algorithms
- Implement TLS taking the regression tree component in a multi-step environment. Think about the representation of the sequence of the actions fed to the TLS component (and later to the HOLOP component)
- Wednesday, March 7, 2012, 11:00
- Andreas and Lukas are also invited to this meeting