Monday, February 13, 2012

Meeting 13-02-2012

Action Points
  • Fortunately, I had implemented the splitting criterion correctly. There were two reasons for the incorrect splitting behaviour.
    • The parameters were not set correctly, so I changed them to match the parameters used for the experiments in [1]:
      • F-test significance level = 0.001 (I had set this too high (0.05), so splits were introduced too early)
      • The minimum number of samples to collect before considering a split = 15
      • MCTS exploration constant = 0.5 * (reward range size of parent node)
    • Samples / statistics were completely discarded after a split to save memory. In my case, storing the samples would most likely be much more beneficial.
  • We talked about the experimental implementations (see posts last week). Everything looks fine, although I should add the options described in the planning below.
  • Results of an experiment I tried a couple of days ago showed that, over time, there was sometimes a decrease in reward when taking greedy actions. This is a flaw in the take-greedy-action method; storing the best sample in each leaf should solve this problem.
  • I will be on holiday from Wednesday February 15 to Tuesday February 21
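As a minimal sketch of the corrected split decision above, assuming an F-test on variance reduction (the exact statistic and code in [1] may differ; all names here are illustrative):

```python
MIN_SAMPLES = 15      # minimum samples to collect before considering a split
SIGNIFICANCE = 0.001  # F-test significance level (determines f_critical)

def variance(xs):
    """Population variance of a non-empty list of rewards."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

def should_split(left_rewards, right_rewards, f_critical):
    """Decide whether a leaf should be split into two candidate children.

    `f_critical` is the critical value of the F distribution at the chosen
    significance level and degrees of freedom, assumed to be looked up
    externally (e.g. from a statistics table or library).
    """
    n = len(left_rewards) + len(right_rewards)
    if n < MIN_SAMPLES:
        return False  # not enough evidence collected yet
    var_parent = variance(left_rewards + right_rewards)
    # Weighted average of the variances inside the two candidate children.
    var_children = (len(left_rewards) * variance(left_rewards)
                    + len(right_rewards) * variance(right_rewards)) / n
    if var_children == 0:
        return var_parent > 0  # a perfect split of non-constant rewards
    # Split only if the variance reduction is statistically significant.
    return var_parent / var_children > f_critical
```

With a stricter significance level (a larger critical value), a split needs a much stronger variance reduction before it is accepted, which is exactly why lowering the level from 0.05 to 0.001 prevents premature splits.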
Planning
  • Store the best sample seen at each leaf of the regression tree, to be used when a greedy best action is requested
  • For the RL-Glue visualizations, it might be nice to color the leaf/action with the highest average reward differently
  • Implement the memorization of samples (i.e. all attribute values and the regression value of each sample) in each leaf. In case of a split, these samples can be reused by the two children (by re-inserting the samples at the parent)
  • Think about and add options to configure the above-mentioned memorization, i.e.
    • turn on memorization
    • turn off memorization
    • only memorize a certain number of samples
    • only re-insert samples going to the best child
    • etc.  
  • Implement the possibility to specify and read a particular properties file (instead of only the default file in the root folder, as is the case now)
  • Implement (in Matlab) reading and visualizing the output of experiments (i.e. the generated .csv files; see the Scala code received from Kurt for reference)
  • Run an experiment to compare with the results from the paper.
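The best-sample and memorization items above could be sketched as follows; the class and method names are hypothetical, not taken from the actual codebase, and the options mirror the list above (memorization on/off, a sample cap, re-insertion on split):

```python
class Leaf:
    """Hypothetical regression-tree leaf with sample memorization."""

    def __init__(self, memorize=True, max_samples=None):
        self.samples = []          # memorized (attributes, reward) pairs
        self.best = None           # best (attributes, reward) ever seen
        self.memorize = memorize
        self.max_samples = max_samples

    def insert(self, attributes, reward):
        # Track the best sample even if memorization is off or capped,
        # so greedy performance cannot degrade over time.
        if self.best is None or reward > self.best[1]:
            self.best = (attributes, reward)
        if self.memorize:
            self.samples.append((attributes, reward))
            if self.max_samples is not None and len(self.samples) > self.max_samples:
                self.samples.pop(0)  # keep only the most recent samples

    def greedy_action(self):
        """Return the attributes of the best sample seen in this leaf."""
        return None if self.best is None else self.best[0]

    def split(self, test):
        """On a split, re-insert the memorized samples into the two
        children instead of discarding the statistics."""
        left = Leaf(self.memorize, self.max_samples)
        right = Leaf(self.memorize, self.max_samples)
        for attrs, reward in self.samples:
            (left if test(attrs) else right).insert(attrs, reward)
        return left, right
```

Because `best` is updated outside the memorization branch, the greedy action stays correct even when old samples are evicted under a sample cap, which addresses the reward decrease observed in the experiment above.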
Next Meeting
  • Friday, February 24, 2012, 11:00
[1] G. Van den Broeck and K. Driessens, “Automatic discretization of actions and states in Monte-Carlo tree search,” in Proceedings of the ECML/PKDD 2011 Workshop on Machine Learning and Data Mining in and around Games (T. Croonenborghs, K. Driessens, and O. Missura, eds.), pp. 1–12, Sep. 2011.
