- Fortunately, I had implemented the splitting criterion correctly. There were two reasons for the incorrect splitting behaviour.
- The parameters were not set correctly; I changed them to match the parameters used for the experiments in [1]:
- F-test significance level = 0.001 (I had set this too high (0.05), and therefore splits were introduced too early)
- The minimum number of samples to collect before considering a split = 15
- MCTS exploration constant = 0.5 * (reward range size of parent node)
- Samples/statistics were completely discarded after a split to save memory. In my case, storing the samples would most likely be much more beneficial.
- We talked about the experimental implementations (see last week's posts). Everything looks fine, although I should add the options described in the planning below.
- Results of an experiment I ran a couple of days ago showed that the reward sometimes decreased over time when taking greedy actions. This is a flaw in the take-greedy-action method; storing the best sample in each leaf should solve this problem.
- I will be on holiday from Wednesday February 15 to Tuesday February 21
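The split check with the parameters above (significance 0.001, at least 15 samples) can be sketched as a one-way ANOVA F-test on the variance reduction of a candidate split. This is only my sketch of one common form of the test, not necessarily the exact statistic used in [1], and `f_split_significant` is a hypothetical name:

```python
from statistics import mean

def f_split_significant(left, right, f_crit, min_samples=15):
    """Decide whether splitting the leaf's samples into `left`/`right`
    reduces variance significantly (one-way ANOVA F-test, df = (1, n - 2)).
    `f_crit` is the critical F value for the chosen significance level,
    e.g. roughly 17.14 for alpha = 0.001 with df = (1, 14)."""
    n = len(left) + len(right)
    if n < min_samples:
        return False  # not enough samples collected yet
    grand = mean(left + right)
    # Between-group sum of squares: how far the two partition means
    # are from the overall mean.
    ss_between = (len(left) * (mean(left) - grand) ** 2
                  + len(right) * (mean(right) - grand) ** 2)
    # Within-group sum of squares: remaining variance after the split.
    m_l, m_r = mean(left), mean(right)
    ss_within = (sum((x - m_l) ** 2 for x in left)
                 + sum((x - m_r) ** 2 for x in right))
    if ss_within == 0:
        return ss_between > 0
    f_stat = ss_between / (ss_within / (n - 2))
    return f_stat > f_crit
```

With two groups the F statistic has df = (1, n - 2), so the critical value equals the squared two-tailed t critical value; at α = 0.001 and n = 16 that is about 4.14² ≈ 17.14.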
- Store the best sample seen at each leaf of the regression tree which can be used when a greedy best action is requested
- For the RL Glue visualizations, it might be nice to highlight the leaf/action with the highest average reward in a different color
- Implement the memorization of samples (i.e. all attribute values and the regression value of each sample) in each leaf. In case of a split, these samples can be re-used by the two children (by re-inserting the samples at the parent)
- Think about and add options to configure the above-mentioned memorization, e.g.
- turn on memorization
- turn off memorization
- only memorize a certain number of samples
- only re-insert samples going to the best child
- etc.
- Implement the possibility to specify and read a custom properties file (instead of only the default file in the root folder, as is currently the case)
- Implement (in Matlab) reading and visualizing the output of experiments (i.e. the generated .csv files; see the Scala code received from Kurt for reference)
- Run an experiment to compare with the results from the paper.
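The first three planning items above (best-sample storage, sample memorization, and re-insertion after a split) could look roughly like the sketch below. Class and method names (`Leaf`, `insert`, `split`, `greedy_action`) are hypothetical, not taken from the actual implementation:

```python
class Leaf:
    """Regression-tree leaf that memorizes its samples and the best sample
    seen, so a greedy-action query never regresses after a split."""

    def __init__(self, memorize=True):
        self.memorize = memorize
        self.samples = []  # (action_attributes, reward) pairs
        self.best = None   # best (action_attributes, reward) ever inserted

    def insert(self, action, reward):
        if self.memorize:
            self.samples.append((action, reward))
        # Track the best sample even when full memorization is off.
        if self.best is None or reward > self.best[1]:
            self.best = (action, reward)

    def split(self, attr_index, threshold):
        """Replace this leaf by two children and re-insert the memorized
        samples, instead of discarding the collected statistics."""
        left, right = Leaf(self.memorize), Leaf(self.memorize)
        for action, reward in self.samples:
            child = left if action[attr_index] <= threshold else right
            child.insert(action, reward)
        return left, right

    def greedy_action(self):
        # Return the action of the best sample seen so far, if any.
        return self.best[0] if self.best is not None else None
```

The memorization options from the planning (off, capped sample count, re-inserting only into the best child) would then be variations on `insert` and `split`.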
- Friday, February 24, 2012, 11:00
[1] G. Van den Broeck and K. Driessens, “Automatic discretization of actions and states in Monte-Carlo tree search,” in Proceedings of the ECML/PKDD
2011 Workshop on Machine Learning and Data Mining in and around Games (T. Croonenborghs, K. Driessens, and O. Missura, eds.), pp. 1–12, Sep 2011.