- Fortunately, I had implemented the splitting criterion correctly. There were two reasons for the incorrect splitting behaviour.
- The parameters were not set correctly; I changed them to match the parameters used for the experiments in [1]:
- F-test significance level = 0.001 (I had set this too high (0.05), and therefore splits were introduced too early)
- The minimum number of samples to collect before considering a split = 15
- MCTS exploration constant = 0.5 * (reward range size of parent node)
- Samples/statistics were completely discarded after a split to save memory. In my case, storing the samples would most likely be much more beneficial.
- We talked about the experimental implementations (see last week's posts). Everything looks fine, although I should add the options described in the planning below.
- Results of an experiment I ran a couple of days ago showed that the reward sometimes decreased over time when taking greedy actions. This is a flaw in the take-greedy-action method; storing the best sample in each leaf should solve this problem.
- I will be on holiday from Wednesday February 15 to Tuesday February 21
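The split check with the parameters above (significance 0.001, at least 15 samples) can be sketched as a one-way ANOVA F-test on the variance reduction of a candidate split. This is only my sketch of one common form of the test, not necessarily the exact statistic used in [1], and `f_split_significant` is a hypothetical name:

```python
from statistics import mean

def f_split_significant(left, right, f_crit, min_samples=15):
    """Decide whether splitting the leaf's samples into `left`/`right`
    reduces variance significantly (one-way ANOVA F-test, df = (1, n - 2)).
    `f_crit` is the critical F value for the chosen significance level,
    e.g. roughly 17.14 for alpha = 0.001 with df = (1, 14)."""
    n = len(left) + len(right)
    if n < min_samples:
        return False  # not enough samples collected yet
    grand = mean(left + right)
    # Between-group sum of squares: how far the two partition means
    # are from the overall mean.
    ss_between = (len(left) * (mean(left) - grand) ** 2
                  + len(right) * (mean(right) - grand) ** 2)
    # Within-group sum of squares: remaining variance after the split.
    m_l, m_r = mean(left), mean(right)
    ss_within = (sum((x - m_l) ** 2 for x in left)
                 + sum((x - m_r) ** 2 for x in right))
    if ss_within == 0:
        return ss_between > 0
    f_stat = ss_between / (ss_within / (n - 2))
    return f_stat > f_crit
```

With two groups the F statistic has df = (1, n - 2), so the critical value equals the squared two-tailed t critical value; at α = 0.001 and n = 16 that is about 4.14² ≈ 17.14.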
- Store the best sample seen at each leaf of the regression tree which can be used when a greedy best action is requested
- For the RL Glue visualizations, it might be nice to highlight the leaf/action with the highest average reward in a different color
- Implement the memorization of samples (i.e. all attribute values and the regression value of each sample) in each leaf. In case of a split, these samples can be re-used by the two children (by re-inserting the samples at the parent)
- Think about and add options to configure the above-mentioned memorization, e.g.
- turn on memorization
- turn off memorization
- only memorize a certain number of samples
- only re-insert samples going to the best child
- etc.
- Implement the possibility to specify and read a custom properties file (instead of only the default file in the root folder, as is currently the case)
- Implement (in Matlab) reading and visualizing the output of experiments (i.e. the generated .csv files; see the Scala code received from Kurt for reference)
- Run an experiment to compare with the results from the paper.
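The first three planning items above (best-sample storage, sample memorization, and re-insertion after a split) could look roughly like the sketch below. Class and method names (`Leaf`, `insert`, `split`, `greedy_action`) are hypothetical, not taken from the actual implementation:

```python
class Leaf:
    """Regression-tree leaf that memorizes its samples and the best sample
    seen, so a greedy-action query never regresses after a split."""

    def __init__(self, memorize=True):
        self.memorize = memorize
        self.samples = []  # (action_attributes, reward) pairs
        self.best = None   # best (action_attributes, reward) ever inserted

    def insert(self, action, reward):
        if self.memorize:
            self.samples.append((action, reward))
        # Track the best sample even when full memorization is off.
        if self.best is None or reward > self.best[1]:
            self.best = (action, reward)

    def split(self, attr_index, threshold):
        """Replace this leaf by two children and re-insert the memorized
        samples, instead of discarding the collected statistics."""
        left, right = Leaf(self.memorize), Leaf(self.memorize)
        for action, reward in self.samples:
            child = left if action[attr_index] <= threshold else right
            child.insert(action, reward)
        return left, right

    def greedy_action(self):
        # Return the action of the best sample seen so far, if any.
        return self.best[0] if self.best is not None else None
```

The memorization options from the planning (off, capped sample count, re-inserting only into the best child) would then be variations on `insert` and `split`.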
- Friday, February 24, 2012, 11:00
[1] G. Van den Broeck and K. Driessens, “Automatic discretization of actions and states in Monte-Carlo tree search,” in Proceedings of the ECML/PKDD
2011 Workshop on Machine Learning and Data Mining in and around Games (T. Croonenborghs, K. Driessens, and O. Missura, eds.), pp. 1–12, Sep 2011.