Monday, February 6, 2012

Feedback 06-02

Work

  • Implemented the distinction between "in head" and "real" actions. I did this by changing the environments to support being hard-copied, so that an agent can run simulations on the copy when RL Glue asks it for an action to perform (a small illustrative sketch follows this list). Maybe not the nicest way to do it, but the alternative of running two environment instances in RL Glue had two problems:
    • RL Glue's manual states that it is only possible to run one agent instance and one environment instance, and even if it were possible (since I use the Java source code of RL Glue), it would require a large amount of changes to the code.
    • Environments can only be reset to the starting state, not set to a specific state, which is a problem after one or more steps in the real world: the "in head" simulations would not start at the current state of the real world.
  • Implemented regression tree induction (a rough sketch of the split decision follows this list):
    • Possible to switch between the two methods, FIMT and TG (binary tree splits vs. "first n examples")
    • Hoeffding bounds
    • Standard Deviation Reduction
    • Tau: tie-breaking mechanism
  • Tested the regression tree with promising results: a one-step lookahead shows the automatic discretization of the action space for the state the agent is currently in.
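
To illustrate the copy-based approach, here is a minimal, hypothetical sketch; the class and method names are made up for illustration and are not part of the RL Glue API. The idea is that the agent asks the environment for a hard copy, rolls imagined actions forward on the copy, and only sends the finally chosen action to the real environment.

    // Hypothetical sketch of the copy-based "in head" simulation idea.
    // Class and method names are illustrative, not part of RL Glue.
    public class CopyableGridEnvironment {
        private double x, y; // current environment state

        public CopyableGridEnvironment(double x, double y) {
            this.x = x;
            this.y = y;
        }

        /** Hard copy: an independent environment starting at the current state. */
        public CopyableGridEnvironment copy() {
            return new CopyableGridEnvironment(x, y);
        }

        /** Apply an action and return the reward. */
        public double step(double dx, double dy) {
            x += dx;
            y += dy;
            return -(x * x + y * y); // toy reward: negative squared distance to the origin
        }
    }

With this, `CopyableGridEnvironment imagined = real.copy();` lets the agent call `imagined.step(...)` as often as it likes for "in head" lookahead without affecting `real`, and only the action that is finally chosen is applied with `real.step(...)`.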
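
And here is a rough sketch of the split decision as I currently read the FIMT papers: compute the standard deviation reduction (SDR) of each candidate split, compare the best and second-best candidate via a Hoeffding bound epsilon, and split when the ratio between them is small enough or when epsilon falls below the tie-breaking threshold tau. The names and formulas below are my own reading, not a verified reference implementation.

    // Rough sketch of an FIMT-style split decision (illustrative names).
    public class SplitDecision {

        /** Standard deviation of an array of target values. */
        static double sd(double[] values) {
            double mean = 0.0;
            for (double v : values) mean += v;
            mean /= values.length;
            double var = 0.0;
            for (double v : values) var += (v - mean) * (v - mean);
            return Math.sqrt(var / values.length);
        }

        /** Standard Deviation Reduction of splitting the examples s into left and right. */
        static double sdr(double[] s, double[] left, double[] right) {
            double n = s.length;
            return sd(s) - (left.length / n) * sd(left) - (right.length / n) * sd(right);
        }

        /** Hoeffding bound for n examples, value range R and confidence 1 - delta. */
        static double hoeffdingBound(double range, double delta, long n) {
            return Math.sqrt(range * range * Math.log(1.0 / delta) / (2.0 * n));
        }

        /**
         * Split on the best candidate if it is clearly better than the second-best
         * one (Hoeffding bound), or if the bound has become so tight that the
         * remaining candidates are effectively tied (tau tie breaking).
         */
        static boolean shouldSplit(double bestSdr, double secondBestSdr,
                                   long n, double delta, double tau) {
            double epsilon = hoeffdingBound(1.0, delta, n); // the SDR ratio lies in [0, 1]
            double ratio = secondBestSdr / bestSdr;
            return ratio < 1.0 - epsilon || epsilon < tau;
        }
    }

Whether the best and second-best candidate must come from different attributes or may come from the same one (relevant when there is only a single action dimension) is exactly the open question listed under Problems below.
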
Problems
  • I do not exactly understand which attribute to split on and when to split:
    • FIMT ranks attributes by their best split and splits on the best attribute once there is enough evidence that it really is the best. What if there is only one attribute (one action dimension)? Should I then compare the best and second-best split of that single attribute?
    • One paper states that TG uses a standard F-test to decide whether a split is significant, while the other states that an F-test is used to decide which split is best. In the latter case, what is SDR then used for in TG?
Planning
  • Implement test results output (see tasks of last meeting)
  • Look further into incremental regression tree induction
