Colin Schepers' Blog: Meeting 03-02

Action Points

As of now, I've implemented the agent in RL Glue without knowledge of the environment's model, so it samples by playing "real" actions. I should change this so that the agent knows the model and can sample "in its head". After a specified number of samples, the agent returns a "real" action to RL Glue. I could either run two instances of RL Glue (one "in head" and one for the real world) or give the agent access to the environment classes.
In general, states (or observations) are not taken into consideration for TLS. To handle different starting states, one could add one level to the tree, so that the root now decides which initial state the agent is in. The agent then starts sampling from this node.
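
To make the "in head" idea concrete, here is a minimal sketch in Python. It assumes the second option above (the agent holds its own copy of the environment model). ToyModel, InHeadAgent and choose_real_action are hypothetical names, not part of RL Glue, and the action selection is plain Monte Carlo averaging over three fixed actions rather than TLS itself.

```python
import random


class ToyModel:
    """Hypothetical stand-in for the environment classes; not an RL Glue API."""

    actions = (0.0, 0.5, 1.0)

    def reward(self, action, rng):
        # Noisy reward that peaks at action 0.5 (arbitrary choice for this demo).
        return 1.0 - abs(action - 0.5) + rng.gauss(0.0, 0.1)


class InHeadAgent:
    """Samples its own model copy, then commits to one "real" action."""

    def __init__(self, model, samples_per_decision, seed=0):
        self.model = model
        self.samples_per_decision = samples_per_decision
        self.rng = random.Random(seed)

    def choose_real_action(self):
        totals = {a: 0.0 for a in self.model.actions}
        counts = {a: 0 for a in self.model.actions}
        # "In head" phase: only the model is sampled, never the real environment.
        for _ in range(self.samples_per_decision):
            action = self.rng.choice(self.model.actions)
            totals[action] += self.model.reward(action, self.rng)
            counts[action] += 1

        def mean(action):
            # Actions that were never sampled rank last.
            if counts[action] == 0:
                return float("-inf")
            return totals[action] / counts[action]

        # This is the action that would be handed back to RL Glue.
        return max(self.model.actions, key=mean)
```

The point of the split is that only choose_real_action touches the real experiment loop; everything inside it runs against the model and costs no real steps.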
An example of a VFDT for the context of TLS

A test has the form exampleValueY <= splitPoint.
Each test is updated incrementally as new examples arrive; 7 values have to be stored per test.

FIMT uses a binary tree representing all the examples.
Another possibility, proposed by Michael / Kurt and similar to TG's, is to use the values of the first n examples as possible splitting points. After that, each new example only updates the statistics for these splitting points.
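
The "first n examples" method could be sketched roughly as follows. All names (CandidateSplit, FirstNSplitter) are hypothetical, and the statistics kept per test here (count, sum and sum of squares on each side of the test) are an assumption of this sketch, not necessarily the exact values used by TG. The first n examples are buffered, their distinct values become the candidate split points, and the buffered examples are replayed so that the statistics cover them too.

```python
class CandidateSplit:
    """Running statistics for one test of the form value <= split_point."""

    def __init__(self, split_point):
        self.split_point = split_point
        # [count, sum of targets, sum of squared targets] per side of the test.
        self.left = [0, 0.0, 0.0]
        self.right = [0, 0.0, 0.0]

    def update(self, value, target):
        side = self.left if value <= self.split_point else self.right
        side[0] += 1
        side[1] += target
        side[2] += target * target


class FirstNSplitter:
    """Uses the values of the first n examples as candidate split points."""

    def __init__(self, n):
        self.n = n
        self.buffer = []        # examples seen before the candidates are fixed
        self.candidates = None  # list of CandidateSplit once n examples arrived

    def add_example(self, value, target):
        if self.candidates is None:
            self.buffer.append((value, target))
            if len(self.buffer) == self.n:
                points = sorted({v for v, _ in self.buffer})
                self.candidates = [CandidateSplit(p) for p in points]
                # Replay the buffered examples into the new candidates.
                for v, t in self.buffer:
                    self._update_all(v, t)
                self.buffer = []
            return
        # After the first n examples, only the statistics are updated.
        self._update_all(value, target)

    def _update_all(self, value, target):
        for candidate in self.candidates:
            candidate.update(value, target)
```

Compared to the binary tree of FIMT, memory use is fixed after n examples, at the cost of never considering split points that show up later.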

The R in the Hoeffding bound formula represents the size of the range of the regression value, i.e. the reward range.
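
For reference, the Hoeffding bound in its usual VFDT form is epsilon = sqrt(R^2 * ln(1/delta) / (2n)), where n is the number of examples seen and delta is the allowed probability of error. A small sketch of how this could be computed follows; the function and parameter names are my own, and how epsilon is then compared against the split statistics is left out.

```python
import math


def hoeffding_bound(value_range, delta, n):
    """Return epsilon for range R = value_range, confidence 1 - delta, n examples."""
    if n <= 0:
        raise ValueError("n must be positive")
    if not 0.0 < delta < 1.0:
        raise ValueError("delta must lie strictly between 0 and 1")
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))
```

Since epsilon scales linearly with R, a wide reward range means many more samples are needed before a split can be justified, so it pays to keep the reward range as tight as the environment allows.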

Tasks

Change the RL Glue structure so that the agent can distinguish between "in head" and "real" simulations.
Implement so that for each experiment, a new output folder is created containing (at least) the results and a file containing the values of all the constants used for that experiment. Possibly add the ability to load this file.
Continue implementing regression trees, and make it possible to switch between the "binary tree" method of FIMT and the "first n examples" method (of TG).
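
The per-experiment output folder from the second task could look roughly like this. It is a sketch under my own assumptions: the constants are stored as JSON in a file called constants.json, and folders are named by timestamp plus a counter. Neither the file name nor the layout has been decided yet.

```python
import json
import time
from pathlib import Path


def create_experiment_folder(base_dir, constants):
    """Create a fresh folder under base_dir and store the constants in it."""
    base = Path(base_dir)
    base.mkdir(parents=True, exist_ok=True)
    stamp = time.strftime("%Y%m%d-%H%M%S")
    index = 0
    while True:
        # The counter keeps two runs started in the same second apart.
        folder = base / f"{stamp}-{index:03d}"
        try:
            folder.mkdir()
            break
        except FileExistsError:
            index += 1
    text = json.dumps(constants, indent=2, sort_keys=True)
    (folder / "constants.json").write_text(text)
    return folder


def load_constants(folder):
    """Read back the constants of an earlier experiment."""
    return json.loads((Path(folder) / "constants.json").read_text())
```

Results would then be written into the returned folder next to constants.json, and load_constants makes it possible to rerun an experiment with exactly the same settings.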

Next Meeting
