Friday, May 25, 2012

Meeting 25-05-2012

Action Points
  1. We started by discussing the planning proposed at the beginning of the year. Mine was fairly accurate, except that I have not yet implemented the "transposition tree", and the writing started a bit later. 
  2. We should put the noise on the reward instead of on the actions, because adding noise to the actions can also change the optimal reward function, i.e. even when playing the best actions one would not be able to achieve the best reward. To draw the optimal line when the actions are noisy, you would have to calculate products / intervals (to keep it general). 
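The effect in point 2 can be illustrated with a small Monte Carlo sketch. The quadratic reward, the noise level, and the function names are illustrative assumptions, not from our experiments: with zero-mean noise on the reward, the optimal action still attains the optimal expected reward, while the same noise on the action does not.

```python
import random

def reward(a):
    # Toy one-step reward, maximal (0.0) at the optimal action a = 0.5.
    return -(a - 0.5) ** 2

def expected_noisy_action(a, sigma=0.2, n=20000, seed=0):
    """Expected reward when Gaussian noise perturbs the ACTION."""
    rng = random.Random(seed)
    return sum(reward(a + rng.gauss(0.0, sigma)) for _ in range(n)) / n

def expected_noisy_reward(a, sigma=0.2, n=20000, seed=0):
    """Expected reward when zero-mean Gaussian noise is added to the REWARD."""
    rng = random.Random(seed)
    return sum(reward(a) + rng.gauss(0.0, sigma) for _ in range(n)) / n

best = 0.5  # optimum of the noise-free reward
print(expected_noisy_action(best))  # ~ -sigma**2 = -0.04: best action no longer reaches 0
print(expected_noisy_reward(best))  # ~ 0.0: the optimum is unchanged in expectation
```

So with action noise, even the best action falls short of the best reward by roughly the noise variance, which is exactly why the optimal line would have to be recomputed.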
  3. For the multi-step results I should also make an MTL-UCT agent.
  4. I asked what to do with the state information of my "transposition tree". Michael proposed an idea and I brainstormed with Andreas afterwards:
    1. Assume the state/observation space to be Markov
    2. First level is an "observation tree", discretizing to observation space.
    3. Each leaf representing a region of this space links to an action tree on the second level, discretizing the action space and keeping information about which actions are best for the observation range from the level above.
    4. This observation-action tree can be re-used for each state the agent is in, due to the Markov property assumed in point 1.
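The steps above could be sketched roughly as follows. All class names, the fixed-depth binary discretization, and the one-dimensional unit-interval spaces are assumptions for illustration only, not the actual design:

```python
import random

class ActionNode:
    """Level 2: binary tree over an action interval [lo, hi); every node
    keeps running reward statistics for its sub-interval."""
    def __init__(self, lo, hi, depth):
        self.lo, self.hi = lo, hi
        self.n, self.total = 0, 0.0
        mid = (lo + hi) / 2
        self.children = None if depth == 0 else (
            ActionNode(lo, mid, depth - 1), ActionNode(mid, hi, depth - 1))

    def update(self, action, reward):
        self.n += 1
        self.total += reward
        if self.children:
            side = 0 if action < (self.lo + self.hi) / 2 else 1
            self.children[side].update(action, reward)

    def best_interval(self):
        """Greedily follow the child with the highest mean reward."""
        node = self
        while node.children:
            visited = [c for c in node.children if c.n > 0]
            if not visited:
                break
            node = max(visited, key=lambda c: c.total / c.n)
        return node.lo, node.hi

class ObsNode:
    """Level 1: fixed discretization of the observation space; each leaf
    owns the level-2 action tree for its observation region."""
    def __init__(self, lo, hi, depth, action_depth):
        self.lo, self.hi = lo, hi
        if depth == 0:
            self.children = None
            self.action_tree = ActionNode(0.0, 1.0, action_depth)
        else:
            mid = (lo + hi) / 2
            self.children = (ObsNode(lo, mid, depth - 1, action_depth),
                             ObsNode(mid, hi, depth - 1, action_depth))

    def leaf(self, obs):
        node = self
        while node.children:
            side = 0 if obs < (node.lo + node.hi) / 2 else 1
            node = node.children[side]
        return node

# Because the observation space is assumed Markov (point 1), samples from
# every state the agent visits update this single shared tree.
tree = ObsNode(0.0, 1.0, depth=3, action_depth=3)
rng = random.Random(1)
for _ in range(2000):
    obs, action = rng.random(), rng.random()
    tree.leaf(obs).action_tree.update(action, -(action - obs) ** 2)  # toy reward

lo, hi = tree.leaf(0.8).action_tree.best_interval()
print((lo, hi))  # action interval judged best for observations near 0.8
```

In this sketch the discretization is fixed up front; in an adaptive version the leaves would split on demand as samples accumulate.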
Next Actions
  1. Change the last experiment so that the noise is on the rewards.
  2. Add the MTL-UCT agent to the experiment.
  3. Look into the global/perfect recall again
  4. Implement the "transposition tree"
  5. Writing

1 comment:

  1. Dear Colin,

    nice summary, your next actions and transposition tree planning look good. Just as a thought about the 'perfect transposition tree': you could consider time as one discrete state variable and allow splitting there as well (i.e., if significant differences are observed). In this way, you would start by assuming a Markovian environment, and only complexify your planning if necessary.
    Success, Michael
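Michael's suggestion of splitting on time only when significant differences are observed could be sketched as a simple significance check. The helper name, the z-threshold, and the sample data are illustrative assumptions:

```python
from math import sqrt
from statistics import mean, variance

def should_split_on_time(rewards_early, rewards_late, z=2.0):
    """Split a node on the time variable only if the mean rewards of the
    two time halves differ by more than z standard errors."""
    m1, m2 = mean(rewards_early), mean(rewards_late)
    se = sqrt(variance(rewards_early) / len(rewards_early)
              + variance(rewards_late) / len(rewards_late))
    return abs(m1 - m2) > z * se

early = [0.10, 0.20, 0.10, 0.20, 0.15]  # toy rewards observed early in episodes
late = [0.90, 1.00, 0.95, 0.85, 0.90]   # toy rewards observed later
print(should_split_on_time(early, late))   # True: time matters here, so split
print(should_split_on_time(early, early))  # False: keep the Markov assumption
```

As long as the check fails, the planner keeps the plain Markovian model and only pays for the extra time dimension where the data demand it.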