- Both the environment and agent visualizations of RL Glue now highlight the best sample and the best regression tree leaf in green (the leaf is drawn as a 1D or 2D area spanning its ranges)
- All leaves in the regression tree remember the best sample seen so far; this sample is picked whenever the greedy action is chosen
- Implemented memorization of the samples, adjustable by three parameters:
  - Maximum number of samples to store at any time in a leaf
  - Maximum number of samples to pass to the best child in case of a split
  - Maximum number of samples to pass to the worst child in case of a split
- I have looked into the Scala code, and the algorithm is now able to converge to the global optimum within approximately 200 samples (as seen in the visualization)
- A problem I noticed is that for the Six Hump Camel Back function a split was often introduced at the very edge of the state space, leaving a child with only one sample. I therefore introduced a minimum number of samples that each child must contain for a split to be allowed.
- I don't know which significance test was used for the experiments in the paper, but in the Scala code the F-test is commented out and a t-test is used instead. I therefore also implemented a t-test so that I can compare the two.
- Furthermore, I noticed that the F-distribution table they were using contains incorrect values, and I have no clue how they arrived at them. I experimented with both value sets and kept the option to use either: their tables (for 0.1 and 0.001) or Apache's F-test method, which generates the "correct" values according to the literature for any significance level.
- The most important change was the adaptive UCT constant. This makes a huge difference.
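
The three sample-memory parameters listed above can be illustrated with a minimal Python sketch. This is not the actual implementation: the class name `LeafMemory`, its fields, the 1D actions, and the policy of keeping the highest-reward samples are all assumptions made for illustration. "Best child" is assumed to mean the child with the higher mean reward.

```python
from dataclasses import dataclass, field


@dataclass
class LeafMemory:
    """Hypothetical leaf storage; samples are (action, reward) pairs."""
    max_stored: int          # max samples kept in a leaf at any time
    max_to_best_child: int   # max samples handed to the best child on a split
    max_to_worst_child: int  # max samples handed to the worst child on a split
    samples: list = field(default_factory=list)

    def add(self, action, reward):
        self.samples.append((action, reward))
        # Assumption: when the leaf is full, drop the lowest-reward sample.
        if len(self.samples) > self.max_stored:
            self.samples.remove(min(self.samples, key=lambda s: s[1]))

    def best_sample(self):
        # The sample a greedy action selection would pick.
        return max(self.samples, key=lambda s: s[1])

    def split(self, threshold):
        """Return (samples for best child, samples for worst child)."""
        left = [s for s in self.samples if s[0] < threshold]
        right = [s for s in self.samples if s[0] >= threshold]

        def mean_reward(xs):
            return sum(r for _, r in xs) / len(xs) if xs else float("-inf")

        def keep_top(xs, n):
            return sorted(xs, key=lambda s: s[1], reverse=True)[:n]

        if mean_reward(left) >= mean_reward(right):
            best, worst = left, right
        else:
            best, worst = right, left
        return (keep_top(best, self.max_to_best_child),
                keep_top(worst, self.max_to_worst_child))


memory = LeafMemory(max_stored=4, max_to_best_child=2, max_to_worst_child=1)
for action, reward in [(0.1, 1.0), (0.2, 2.0), (0.8, 5.0), (0.9, 6.0), (0.7, 0.5)]:
    memory.add(action, reward)
to_best, to_worst = memory.split(threshold=0.5)
```

Passing fewer samples to the worst child than to the best child keeps the memory focused on the promising region, which is presumably the reason for having two separate limits.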
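
The two split checks described above (a minimum number of samples per child, followed by a t-test) could look roughly like the sketch below. The function names, the 1D state, the choice of Welch's variant of the t-test, and the placeholder critical value are assumptions; the test variant used in the Scala code is not stated here, and in practice the critical value would come from the t-distribution for the chosen significance level.

```python
import math
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t statistic for two lists of rewards (assumed test variant)."""
    spread = variance(a) / len(a) + variance(b) / len(b)
    diff = mean(a) - mean(b)
    if spread == 0:
        # Degenerate case: no variance in either group.
        return 0.0 if diff == 0 else math.copysign(math.inf, diff)
    return diff / math.sqrt(spread)


def accept_split(samples, threshold, min_child_samples=2, t_critical=2.0):
    """Decide whether splitting (state, reward) samples at threshold is allowed.

    t_critical is a placeholder value, not taken from the original code.
    """
    left = [r for x, r in samples if x < threshold]
    right = [r for x, r in samples if x >= threshold]
    # Reject splits that leave a child with too few samples, e.g. a split
    # at the very edge of the state space. At least 2 samples per child
    # are needed to estimate a variance at all.
    if min(len(left), len(right)) < max(2, min_child_samples):
        return False
    return abs(welch_t(left, right)) > t_critical


samples = [(0.1, 1.0), (0.2, 1.2), (0.3, 0.9), (0.7, 3.0), (0.8, 3.1), (0.9, 2.9)]
middle_ok = accept_split(samples, threshold=0.5)   # clear difference in rewards
edge_ok = accept_split(samples, threshold=0.15)    # left child has one sample
```

Checking the child sizes before running the test avoids computing a statistic on a single sample, which is exactly the edge-of-state-space case noted for the Six Hump Camel Back function.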
Planning
- Holiday!
- Add support for specifying which properties file to read (as of now, only a default file in the root folder is read)
- Implement (in Matlab) reading and visualizing the output of experiments (i.e. the generated .csv files; see the Scala code received from Kurt for reference)
- Run an experiment to compare with the results from the paper.