Colin Schepers' Blog: Feedback 15-02-2012

Work

Both the environment and the agent visualizations in RL Glue highlight the best sample and the best regression tree leaf (drawn as a 1D or 2D area spanning the leaf's ranges) in green.
Every leaf in the regression tree remembers the best sample it has seen; that sample is picked whenever the greedy (best) action is chosen.
Implemented memorization of samples, adjustable through three parameters.
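A minimal Python sketch of the leaf memorization described above. It assumes that the greedy choice means "the leaf with the highest mean reward", and that this leaf then returns its remembered best sample rather than, say, the centre of its range. `Leaf` and `greedy_action` are hypothetical names, not taken from the actual code, and the three memorization parameters are not modelled.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Action = Tuple[float, ...]


@dataclass
class Leaf:
    """Hypothetical regression tree leaf covering the box [lower, upper]."""
    lower: Action
    upper: Action
    count: int = 0
    total: float = 0.0
    best_action: Optional[Action] = None
    best_reward: float = float("-inf")

    def mean(self) -> float:
        # Mean reward of the samples that fell into this leaf (0.0 if empty).
        return self.total / self.count if self.count else 0.0

    def add_sample(self, action: Action, reward: float) -> None:
        # Update the statistics used by the regression tree ...
        self.count += 1
        self.total += reward
        # ... and memorize the best sample this leaf has seen so far.
        if reward > self.best_reward:
            self.best_reward = reward
            self.best_action = tuple(action)


def greedy_action(leaves: List[Leaf]) -> Optional[Action]:
    """Assumed greedy rule: take the leaf with the highest mean reward
    and return its remembered best sample."""
    best_leaf = max(leaves, key=lambda leaf: leaf.mean())
    return best_leaf.best_action
```

Returning the memorized sample means the greedy action never gets worse than the best point already evaluated inside the chosen leaf.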

I have looked into the Scala code, and the algorithm is now able to converge to the global optimum within approximately 200 samples (as seen in the visualization):

A problem I noticed is that for the Six Hump Camel Back function, splits were often introduced at the very edge of the state space, leaving a child with only one sample. I therefore introduced a minimum number of samples that each child must contain before a split is accepted.
I don't know which significance test was used for the experiments in the paper, but in the Scala code the F-test is commented out and a t-test is used instead. I therefore also implemented a t-test so that I could compare the two.
Furthermore, I noticed that the F-distribution table they were using contains incorrect values; I have no idea how they arrived at those values. I experimented with both value sets and can now choose between them: their tables (for significance levels 0.1 and 0.001) and Apache's F-test method, which generates the values that are "correct" according to the literature, for any significance level.
The most important change was the adaptive UCT constant, which makes a huge difference.
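The split constraint and the t-test can be sketched together in Python for a 1D input. This is an illustration under assumptions, not the code from the paper: it uses Welch's (unequal-variance) t statistic, takes the critical value `t_crit` as a given number (e.g. looked up in a t-table), and considers split points halfway between consecutive samples. `welch_t`, `best_split` and `min_child` are hypothetical names.

```python
import math
from typing import List, Optional, Sequence, Tuple


def welch_t(left: Sequence[float], right: Sequence[float]) -> float:
    """Welch's t statistic for the difference in mean reward of two groups.
    Each group needs at least two samples for its sample variance."""
    n1, n2 = len(left), len(right)
    m1, m2 = sum(left) / n1, sum(right) / n2
    v1 = sum((x - m1) ** 2 for x in left) / (n1 - 1)
    v2 = sum((x - m2) ** 2 for x in right) / (n2 - 1)
    se = math.sqrt(v1 / n1 + v2 / n2)
    if se == 0.0:
        return 0.0 if m1 == m2 else math.copysign(math.inf, m1 - m2)
    return (m1 - m2) / se


def best_split(samples: List[Tuple[float, float]], min_child: int,
               t_crit: float) -> Optional[float]:
    """Return the split point with the largest |t| above t_crit, considering
    only splits that leave at least min_child samples on each side.
    samples is a list of (x, reward) pairs; returns None if no split passes."""
    min_child = max(min_child, 2)  # the sample variance needs >= 2 points
    ordered = sorted(samples)
    best_point, best_t = None, t_crit
    for i in range(min_child, len(ordered) - min_child + 1):
        if ordered[i - 1][0] == ordered[i][0]:
            continue  # identical x values cannot be separated
        left = [r for _, r in ordered[:i]]
        right = [r for _, r in ordered[i:]]
        t = abs(welch_t(left, right))
        if t > best_t:
            best_t = t
            best_point = (ordered[i - 1][0] + ordered[i][0]) / 2
    return best_point
```

Because candidate split indices start at `min_child`, a split that would isolate a single edge sample is never even evaluated, no matter how large its t value would be.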
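Assuming the F-test compares the two children of a candidate split as a one-way ANOVA with two groups, it is equivalent to the equal-variance (pooled) t-test: the F statistic equals the square of the pooled t statistic, with degrees of freedom 1 and n - 2. The same holds for the critical values, F(1, d) at level α equals the square of the two-sided t(d) critical value at α (e.g. 3.84 ≈ 1.96² for α = 0.05 and large d), which gives one way to sanity-check an F table. A small Python sketch of the identity; `split_f_statistic` and `pooled_t` are hypothetical names.

```python
import math
from typing import Sequence


def _mean(xs: Sequence[float]) -> float:
    return sum(xs) / len(xs)


def split_f_statistic(left: Sequence[float], right: Sequence[float]) -> float:
    """One-way ANOVA F statistic for a binary split (two groups):
    between-group mean square / within-group mean square, df = (1, n - 2)."""
    n = len(left) + len(right)
    grand = _mean(list(left) + list(right))
    ml, mr = _mean(left), _mean(right)
    ss_between = len(left) * (ml - grand) ** 2 + len(right) * (mr - grand) ** 2
    ss_within = (sum((x - ml) ** 2 for x in left)
                 + sum((x - mr) ** 2 for x in right))
    return (ss_between / 1) / (ss_within / (n - 2))


def pooled_t(left: Sequence[float], right: Sequence[float]) -> float:
    """Student's t statistic with pooled variance (equal-variance t-test)."""
    n1, n2 = len(left), len(right)
    ml, mr = _mean(left), _mean(right)
    sp2 = (sum((x - ml) ** 2 for x in left)
           + sum((x - mr) ** 2 for x in right)) / (n1 + n2 - 2)
    return (ml - mr) / math.sqrt(sp2 * (1 / n1 + 1 / n2))
```

For example, with left = [1, 2, 3] and right = [4, 6, 8] both `split_f_statistic` and `pooled_t(...) ** 2` give 9.6. The t-test in the Scala code may be a different variant (e.g. Welch's), in which case the identity only holds approximately.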
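The exact adaptation rule is not described here, so the following Python sketch shows only one common way to make the UCT constant adaptive, as an assumption: scale the exploration term by the range of rewards observed so far, so that exploration is weighed against the actual reward scale of the function. `ucb_score`, `adaptive_c`, `select_child` and `scale` are hypothetical names.

```python
import math
from typing import List, Tuple


def ucb_score(child_mean: float, child_visits: int, parent_visits: int,
              c: float) -> float:
    """Standard UCT score: mean + c * sqrt(ln(N) / n)."""
    if child_visits == 0:
        return math.inf  # unvisited children are tried first
    return child_mean + c * math.sqrt(math.log(parent_visits) / child_visits)


def adaptive_c(rewards_seen: List[float], scale: float = 1.0) -> float:
    """Assumed adaptive constant: scale * (max reward - min reward) seen so
    far; falls back to scale while there is no spread to measure."""
    if not rewards_seen:
        return scale
    spread = max(rewards_seen) - min(rewards_seen)
    return scale * spread if spread > 0 else scale


def select_child(children: List[Tuple[float, int]], rewards_seen: List[float],
                 scale: float = 1.0) -> int:
    """children holds (mean reward, visits) pairs; returns the index to explore."""
    c = adaptive_c(rewards_seen, scale)
    parent_visits = sum(visits for _, visits in children)
    scores = [ucb_score(m, v, parent_visits, c) for m, v in children]
    return max(range(len(children)), key=scores.__getitem__)
```

With the same visit counts, `select_child([(10.0, 50), (9.0, 2)], rewards, 0.1)` exploits the first child when the observed rewards span 1 unit but explores the second when they span 100 units. A fixed constant cannot make that distinction without being retuned for every test function.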

Planning

Holiday!
Make it possible to specify and read an arbitrary properties file (currently only a default file in the root folder is read).
Implement reading and visualizing the experiment output (i.e. the generated .csv files) in Matlab; see the Scala code received from Kurt for reference.
Run an experiment to compare with the results from the paper.
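For the properties-file item, a minimal Python sketch of the intended behaviour: use the path given on the command line if there is one, otherwise fall back to the default file. The default name `default.properties` and the functions `parse_properties` and `load_properties` are hypothetical, and the parser handles only simple `key=value` or `key: value` lines, not the escapes and line continuations of the full .properties format.

```python
from pathlib import Path
from typing import Dict, List

DEFAULT_PROPERTIES = Path("default.properties")  # hypothetical default name


def parse_properties(text: str) -> Dict[str, str]:
    """Parse simple 'key=value' / 'key: value' lines; '#' and '!' start comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line[0] in "#!":
            continue
        # Split on the first '=' or ':' separator (a key without one maps to "").
        cut = min((i for i in (line.find("="), line.find(":")) if i >= 0),
                  default=len(line))
        props[line[:cut].strip()] = line[cut + 1:].strip()
    return props


def load_properties(argv: List[str]) -> Dict[str, str]:
    """Read the file named by the first argument, else the default file."""
    path = Path(argv[1]) if len(argv) > 1 else DEFAULT_PROPERTIES
    return parse_properties(path.read_text(encoding="utf-8"))
```

Calling `load_properties(sys.argv)` at startup keeps the current behaviour when no argument is given, so existing runs would not need to change.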
