Friday, February 24, 2012

Meeting 24-02-2012


Action Points
  • I explained that I had just fixed a bug in the adaptive UCT constant. During selection, all children of a node should use the same UCT constant (dependent on the range of the parent node), whereas my first version used the ranges of the children themselves, so each child got a different UCT constant. As a result, a "bad" child could be preferred over a "good" child (one with a high average and a low number of trials) whenever the bad child had a low minimum and therefore a large range.
  • I showed the behaviour and results in RL-Glue and in the .csv output; these looked good.
  • Two problems remain open and will be investigated by Kurt (thanks in advance):
    • The Scala code suggests that a t-test is used for the Six-Hump Camel Back experiment, whereas the paper itself states that an F-test is used. Which one is correct?
    • The F-values in the Scala code are taken from the entries with 1 and n degrees of freedom of the F-distribution table, whereas I thought it would be more logical to use the entry with n and n degrees of freedom (because the degrees of freedom are equal on both sides). Which is right?
  • HOLOP, TLS and their combinations were explained to me. On the one hand there is a "tree of trees", whose components are either regression trees (as in TLS) or HOO; on the other hand there is a "tree of sequences", which can likewise take the form of a regression tree or HOO (as in HOLOP). A sequence can in this case be seen as the Cartesian product / concatenation of the actions taken during an episode.
  • Besides these four algorithms, Michael proposed UCT with pre-discretization. I said that I had already implemented a simple agent that pre-discretizes the action space (into a specified number of "parts") and uses a basic UCT algorithm to learn. I did this to test the MCTS / UCT methods and components that are also used in the regression trees.
  • I announced that I have a job and will be working two days a week from now on. This means my productivity will slightly decrease.
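To make the adaptive-constant fix from the first action point concrete, here is a minimal sketch. It assumes the constant is simply the range (maximum minus minimum) of the returns observed at a node and that the bonus has the common form c * sqrt(ln N / n); the Node class, the function names and the example returns are hypothetical, not the actual agent code.

```python
import math
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """Hypothetical MCTS node that tracks statistics of the returns passing through it."""
    visits: int = 0
    total: float = 0.0
    min_return: float = math.inf
    max_return: float = -math.inf
    children: List["Node"] = field(default_factory=list)

    def update(self, ret: float) -> None:
        self.visits += 1
        self.total += ret
        self.min_return = min(self.min_return, ret)
        self.max_return = max(self.max_return, ret)

    def mean(self) -> float:
        return self.total / self.visits

    def value_range(self) -> float:
        # Spread of the returns observed so far; 0 before any visit.
        return self.max_return - self.min_return if self.visits else 0.0


def ucb(child: Node, parent_visits: int, c: float) -> float:
    """UCB score: average return plus an exploration bonus scaled by c."""
    if child.visits == 0:
        return math.inf
    return child.mean() + c * math.sqrt(math.log(parent_visits) / child.visits)


def select_child(parent: Node) -> Node:
    # Fixed version: one constant, taken from the PARENT's range, shared by all children.
    c = parent.value_range()
    return max(parent.children, key=lambda ch: ucb(ch, parent.visits, c))


def select_child_buggy(parent: Node) -> Node:
    # Original bug: each child scales its own bonus by its OWN range.
    return max(parent.children, key=lambda ch: ucb(ch, parent.visits, ch.value_range()))


def build(child_returns: List[List[float]]) -> Node:
    """Build a parent whose children have seen the given returns."""
    parent = Node()
    for returns in child_returns:
        child = Node()
        for r in returns:
            child.update(r)
            parent.update(r)
        parent.children.append(child)
    return parent


good = [0.8, 1.0] * 5        # mean 0.9, range 0.2
bad = [-5.0] + [1.0] * 9     # mean 0.4, range 6.0 (one very low return)
parent = build([good, bad])
print(select_child(parent) is parent.children[0])        # good child chosen
print(select_child_buggy(parent) is parent.children[1])  # bad child chosen
```

With equal visit counts, the shared constant gives both children the same bonus, so the higher average wins. With per-child constants, the single low return inflates the bad child's range and therefore its bonus, which reproduces the behaviour described above.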
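The two open statistics questions above (t-test versus F-test, and the (1, n) versus (n, n) table entries) may be connected by a standard identity. This does not settle what the Scala code or the paper intends; it only illustrates a possible reading. For two groups, the square of the pooled two-sample t statistic with n degrees of freedom equals the one-way ANOVA F statistic with (1, n) degrees of freedom, so a t-test and an F-test read from the (1, n) entries of the table are the same test of equal means. An F statistic with (n, n) degrees of freedom, by contrast, typically arises as a ratio of two sample variances, which tests equality of variances. The function names and sample data below are hypothetical.

```python
from statistics import mean, variance
from typing import Sequence, Tuple


def pooled_t(a: Sequence[float], b: Sequence[float]) -> Tuple[float, int]:
    """Two-sample Student t statistic with pooled variance; df = n_a + n_b - 2."""
    ma, mb = mean(a), mean(b)
    ss_within = sum((x - ma) ** 2 for x in a) + sum((x - mb) ** 2 for x in b)
    df = len(a) + len(b) - 2
    pooled_var = ss_within / df
    t = (ma - mb) / (pooled_var * (1 / len(a) + 1 / len(b))) ** 0.5
    return t, df


def anova_f(a: Sequence[float], b: Sequence[float]) -> Tuple[float, Tuple[int, int]]:
    """One-way ANOVA F statistic for two groups; df = (1, n_a + n_b - 2)."""
    ma, mb = mean(a), mean(b)
    grand = mean(list(a) + list(b))
    ss_between = len(a) * (ma - grand) ** 2 + len(b) * (mb - grand) ** 2
    ss_within = sum((x - ma) ** 2 for x in a) + sum((x - mb) ** 2 for x in b)
    df_within = len(a) + len(b) - 2
    f = (ss_between / 1) / (ss_within / df_within)
    return f, (1, df_within)


def variance_ratio_f(a: Sequence[float], b: Sequence[float]) -> Tuple[float, Tuple[int, int]]:
    """F statistic for equality of two variances; df = (n_a - 1, n_b - 1)."""
    return variance(a) / variance(b), (len(a) - 1, len(b) - 1)


a = [1.0, 2.0, 3.0, 4.0]
b = [2.0, 4.0, 5.0, 7.0]
t, df = pooled_t(a, b)
f, dfs = anova_f(a, b)
print(t * t, f, dfs)  # t squared and F coincide; F uses (1, df) degrees of freedom
```

Equivalently, the critical value F(1, n) at level alpha equals the square of the two-sided t critical value with n degrees of freedom, which is why both lookups would give the same decision.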
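The pre-discretization agent mentioned in the action points can be sketched in its simplest one-step form, where UCT reduces to UCB1 over a fixed set of arms. This assumes a one-dimensional action interval, arms placed at the centres of equal parts, and a deterministic toy reward; the names discretize and ucb1, the reward function and the constant c are hypothetical illustrations, not the implemented agent.

```python
import math
from typing import Callable, List


def discretize(low: float, high: float, parts: int) -> List[float]:
    """Split [low, high] into `parts` equal intervals and return their centres."""
    width = (high - low) / parts
    return [low + (i + 0.5) * width for i in range(parts)]


def ucb1(reward: Callable[[float], float], arms: List[float],
         iterations: int, c: float) -> List[int]:
    """One-level UCT (i.e. UCB1) over the discrete arms; returns visit counts."""
    counts = [0] * len(arms)
    totals = [0.0] * len(arms)
    for _ in range(iterations):
        untried = [i for i, n in enumerate(counts) if n == 0]
        if untried:
            i = untried[0]  # play every arm once first
        else:
            log_total = math.log(sum(counts))
            i = max(range(len(arms)),
                    key=lambda k: totals[k] / counts[k]
                    + c * math.sqrt(log_total / counts[k]))
        counts[i] += 1
        totals[i] += reward(arms[i])
    return counts


arms = discretize(0.0, 1.0, 5)  # centres of 5 equal parts: about 0.1, 0.3, ..., 0.9
counts = ucb1(lambda a: -(a - 0.3) ** 2, arms, iterations=200, c=0.1)
best = arms[counts.index(max(counts))]
print(best, counts)
```

The number of parts trades resolution against the number of arms that must be explored; a continuous method such as HOO or a regression tree avoids fixing that resolution in advance.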
Planning
  • Implement reading and visualizing the experiment output (in Matlab)
  • Already write everything relevant from this blog into the LaTeX draft
  • Think of names for all four algorithm combinations
  • Implement TLS with the regression-tree component in a multi-step environment, and think about how to represent the sequence of actions fed to the TLS component (and later to the HOLOP component)
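One candidate representation for the sequence of actions, assuming each step's action is a real-valued vector in a fixed box and the episode length (horizon) is fixed, is to concatenate the per-step vectors into a single point of the Cartesian product space, so that a "tree of sequences" could split this joint box directly. The helper names below are hypothetical.

```python
from typing import List, Sequence, Tuple

Box = List[Tuple[float, float]]


def joint_bounds(step_bounds: Box, horizon: int) -> Box:
    """Bounds of the joint action space: the per-step box repeated `horizon` times."""
    return [b for _ in range(horizon) for b in step_bounds]


def concat_actions(actions: Sequence[Sequence[float]]) -> List[float]:
    """Flatten per-step action vectors [a_0, ..., a_{H-1}] into one point."""
    return [x for a in actions for x in a]


def split_sequence(point: Sequence[float], action_dim: int) -> List[List[float]]:
    """Inverse of concat_actions: recover the per-step action vectors."""
    if action_dim <= 0 or len(point) % action_dim != 0:
        raise ValueError("point length must be a multiple of action_dim")
    return [list(point[i:i + action_dim]) for i in range(0, len(point), action_dim)]


def in_box(point: Sequence[float], box: Box) -> bool:
    """Check that a joint point lies inside the joint bounds."""
    return len(point) == len(box) and all(lo <= x <= hi for x, (lo, hi) in zip(point, box))


step_bounds = [(-1.0, 1.0), (0.0, 2.0)]          # 2-D action per step
plan = [[0.1, 1.0], [0.5, 0.2], [-0.3, 1.9]]      # horizon of 3 steps
point = concat_actions(plan)                      # one 6-D point
print(in_box(point, joint_bounds(step_bounds, len(plan))), split_sequence(point, 2) == plan)
```

A drawback of this form is that the dimensionality grows linearly with the horizon; a "tree of trees" would instead keep a separate action tree per step, which this sketch does not show.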
Next meeting
  • Wednesday, March 7, 2012, 11:00
  • Andreas and Lukas are also invited to this meeting
