Monday, February 6, 2012

Feedback 06-02

Work

  • Implemented the distinction between "in head" and "real" actions. I did this by changing the environments to support being hard-copied, so that an agent can run simulations on the copy when RL Glue asks it for an action to perform (a small illustrative sketch follows this list). Maybe not the nicest way to do it, but the alternative of running two environment instances in RL Glue had two problems:
    • RL Glue's manual states that it is only possible to run one agent instance and one environment instance, and even if it were possible (since I use the Java source code of RL Glue), it would require a large amount of changes to the code.
    • Environments can only be reset to the starting state, not set to a specific state, which is a problem after one or more steps in the real world: the "in head" simulations would not start at the current state of the real world.
  • Implemented regression tree induction (a rough sketch of the split decision follows this list):
    • Possible to switch between the two methods, FIMT and TG (binary tree splits vs. "first n examples")
    • Hoeffding bounds
    • Standard Deviation Reduction
    • Tau: tie-breaking mechanism
  • Tested the regression tree with promising results: a one-step lookahead shows the automatic discretization of the action space for the state the agent is currently in.
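
To illustrate the copy-based approach, here is a minimal, hypothetical sketch; the class and method names are made up for illustration and are not part of the RL Glue API. The idea is that the agent asks the environment for a hard copy, rolls imagined actions forward on the copy, and only sends the finally chosen action to the real environment.

    // Hypothetical sketch of the copy-based "in head" simulation idea.
    // Class and method names are illustrative, not part of RL Glue.
    public class CopyableGridEnvironment {
        private double x, y; // current environment state

        public CopyableGridEnvironment(double x, double y) {
            this.x = x;
            this.y = y;
        }

        /** Hard copy: an independent environment starting at the current state. */
        public CopyableGridEnvironment copy() {
            return new CopyableGridEnvironment(x, y);
        }

        /** Apply an action and return the reward. */
        public double step(double dx, double dy) {
            x += dx;
            y += dy;
            return -(x * x + y * y); // toy reward: negative squared distance to the origin
        }
    }

With this, `CopyableGridEnvironment imagined = real.copy();` lets the agent call `imagined.step(...)` as often as it likes for "in head" lookahead without affecting `real`, and only the action that is finally chosen is applied with `real.step(...)`.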
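
And here is a rough sketch of the split decision as I currently read the FIMT papers: compute the standard deviation reduction (SDR) of each candidate split, compare the best and second-best candidate via a Hoeffding bound epsilon, and split when the ratio between them is small enough or when epsilon falls below the tie-breaking threshold tau. The names and formulas below are my own reading, not a verified reference implementation.

    // Rough sketch of an FIMT-style split decision (illustrative names).
    public class SplitDecision {

        /** Standard deviation of an array of target values. */
        static double sd(double[] values) {
            double mean = 0.0;
            for (double v : values) mean += v;
            mean /= values.length;
            double var = 0.0;
            for (double v : values) var += (v - mean) * (v - mean);
            return Math.sqrt(var / values.length);
        }

        /** Standard Deviation Reduction of splitting the examples s into left and right. */
        static double sdr(double[] s, double[] left, double[] right) {
            double n = s.length;
            return sd(s) - (left.length / n) * sd(left) - (right.length / n) * sd(right);
        }

        /** Hoeffding bound for n examples, value range R and confidence 1 - delta. */
        static double hoeffdingBound(double range, double delta, long n) {
            return Math.sqrt(range * range * Math.log(1.0 / delta) / (2.0 * n));
        }

        /**
         * Split on the best candidate if it is clearly better than the second-best
         * one (Hoeffding bound), or if the bound has become so tight that the
         * remaining candidates are effectively tied (tau tie breaking).
         */
        static boolean shouldSplit(double bestSdr, double secondBestSdr,
                                   long n, double delta, double tau) {
            double epsilon = hoeffdingBound(1.0, delta, n); // the SDR ratio lies in [0, 1]
            double ratio = secondBestSdr / bestSdr;
            return ratio < 1.0 - epsilon || epsilon < tau;
        }
    }

Whether the best and second-best candidate must come from different attributes or may come from the same one (relevant when there is only a single action dimension) is exactly the open question listed under Problems below.
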
Problems
  • I do not exactly understand which attribute to split on and when to split:
    • FIMT ranks attributes by their best split and splits on the best attribute once there is enough evidence that it really is the best. What if there is only one attribute (one action dimension)? Should I then compare the best and second-best split of that single attribute?
    • One paper states that TG uses a standard F-test to decide whether a split is significant, while the other states that an F-test is used to decide which split is best. In the latter case, what is SDR then used for in TG?
Planning
  • Implement test results output (see tasks of last meeting)
  • Look further into incremental regression tree induction
