Andreas, Lukas and I presented the results of the experiments (see previous posts for my results). I got some useful feedback and comments, which are summarized in the planning below.
Planning
- Investigate why the IRTI line is horizontal at the end of the Six Hump Camel Back experiments.
- Implement a visualization to show at which time a split occurs (at which sample number).
- Check if it is correct that the RMTL agent has a significant decrease in number of simulations per second when memorization is enabled, as seen in the table of the last post.
- Decide which parameters are relevant (to mention in the report) and remove the redundant ones.
- Optimize the multi-step agents.
- Think of a graph representation for the multi-step experiments and rerun them under more difficult settings, i.e. with less time or with the settings from the paper (noisy rewards, noisy actions, etc.).
- Implement perfect recall at the meta tree level.
- Implement a Transposition Tree.
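
For reference while looking into the first item: a minimal sketch of the standard Six Hump Camel Back function in its usual minimization form. The function name and the sign convention are assumptions here; the experiments may use a negated or rescaled version as the reward.

```python
def six_hump_camel_back(x, y):
    """Standard Six Hump Camel Back test function (minimization form).

    Usually evaluated on x in [-3, 3] and y in [-2, 2]. It has two global
    minima of roughly -1.0316, near (0.0898, -0.7126) and (-0.0898, 0.7126).
    """
    term_x = (4.0 - 2.1 * x**2 + x**4 / 3.0) * x**2
    term_xy = x * y
    term_y = (-4.0 + 4.0 * y**2) * y**2
    return term_x + term_xy + term_y


def reward(x, y):
    # Hypothetical reward for a maximizing agent: the negated function value.
    return -six_hump_camel_back(x, y)
```

If the learning curve is plotted as a reward, a line that flattens out close to 1.0316 would simply mean the agent has settled on one of the two global optima.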