Friday, February 3, 2012

Meeting 03-02


Action Points
  • So far, I've implemented the agent in RL Glue without knowledge of the environment model, so it samples by playing "real" actions. I should change this so that the agent knows the model and can sample "in its head". After a specified number of samples, the agent returns a "real" action to RL Glue. I could either run two instances of RL Glue (one in the agent's head and one for the real world) or give the agent access to the environment classes.
  • In general, states (or observations) are not taken into consideration for TLS. To handle different starting states, one could add one level to the tree, so that the root decides which initial state the agent is in. The agent then starts sampling from this node.
  • An example of a VFDT for the context of TLS
    • Attribute values (X) = action values (one per action dimension)
    • Regression value (value to predict; y) = reward 
  • Choosing split points for a continuous attribute
    • A test has the form exampleValue <= splitPoint, where exampleValue is the example's value for the attribute being tested
    • The statistics are updated incrementally as new examples arrive. Seven values have to be stored per test:
      • split point value
      • number of examples that pass / fail the test 
      • sum of y values that pass / fail  the test 
      • sum of squared y values that pass / fail the test 
    • FIMT uses a binary tree representing all the examples
    • Another possibility, proposed by Michael / Kurt and similar to TG's approach, is to use the values of the first n examples as candidate split points. After that, each new example only updates the statistics for these split points.
  • The R in the Hoeffding bound formula is the size of the range of the regression value, i.e. the reward range.
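
To make the split-point bookkeeping and the role of R more concrete, here is a minimal Python sketch. It is an illustration under assumptions, not the actual implementation: the names `SplitStats`, `FirstNCandidates` and `hoeffding_bound` are hypothetical, the first n examples are buffered and replayed so that every candidate sees them (one possible reading of the "first n examples" idea), and the bound is the standard Hoeffding bound ε = sqrt(R² ln(1/δ) / (2n)).

```python
import math
from dataclasses import dataclass


@dataclass
class SplitStats:
    """The seven values stored for one candidate test `x <= split_point`."""
    split_point: float
    n_pass: int = 0        # number of examples that pass the test
    n_fail: int = 0        # number of examples that fail the test
    sum_pass: float = 0.0  # sum of y values that pass
    sum_fail: float = 0.0  # sum of y values that fail
    sq_pass: float = 0.0   # sum of squared y values that pass
    sq_fail: float = 0.0   # sum of squared y values that fail

    def update(self, x: float, y: float) -> None:
        """Incrementally add one example (attribute value x, reward y)."""
        if x <= self.split_point:
            self.n_pass += 1
            self.sum_pass += y
            self.sq_pass += y * y
        else:
            self.n_fail += 1
            self.sum_fail += y
            self.sq_fail += y * y


def variance(n: int, total: float, total_sq: float) -> float:
    """Variance of y on one side of a test, recovered from the stored sums."""
    if n == 0:
        return 0.0
    mean = total / n
    return max(total_sq / n - mean * mean, 0.0)


class FirstNCandidates:
    """Assumed variant: the attribute values of the first n examples become
    the candidate split points; later examples only update statistics."""

    def __init__(self, n: int):
        self.n = n
        self.buffer = []      # the first n (x, y) examples
        self.candidates = []  # one SplitStats per candidate split point

    def add(self, x: float, y: float) -> None:
        if len(self.buffer) < self.n:
            self.buffer.append((x, y))
            if len(self.buffer) == self.n:
                # Fix the candidates, then replay the buffered examples.
                self.candidates = [SplitStats(bx) for bx, _ in self.buffer]
                for bx, by in self.buffer:
                    for cand in self.candidates:
                        cand.update(bx, by)
            return
        for cand in self.candidates:
            cand.update(x, y)


def hoeffding_bound(value_range: float, delta: float, n: int) -> float:
    """Hoeffding bound epsilon for range R, confidence 1 - delta, n examples."""
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))
```

With rewards in [0, 1], `value_range` would be 1.0. The stored counts and sums are enough to recover the mean and variance of the reward on each side of a test, which is what a variance-based split criterion would need.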
Tasks
  • Change the RL Glue structure so that the agent can distinguish between "in head" and "real" simulations.
  • For each experiment, create a new output folder containing (at least) the results and a file with the values of all the constants used for that experiment. Possibly add the ability to load this file again.
  • Continue implementing regression trees, and make it possible to switch between the "binary tree" method of FIMT and the "first n examples" method (of TG).
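
The per-experiment output folder could look roughly like the following Python sketch. Everything here is an assumption made for illustration: the helper names `create_experiment_dir` and `load_constants`, the timestamped folder name, and the choice of a JSON file called `constants.json` are all hypothetical.

```python
import json
import time
from pathlib import Path


def create_experiment_dir(base_dir, constants):
    """Create a fresh folder for one experiment and store its constants."""
    stamp = time.strftime("%Y%m%d-%H%M%S")
    attempt = 0
    while True:
        # Add a numeric suffix if a folder with this timestamp already exists.
        suffix = "" if attempt == 0 else f"-{attempt}"
        out_dir = Path(base_dir) / f"experiment-{stamp}{suffix}"
        try:
            out_dir.mkdir(parents=True)
            break
        except FileExistsError:
            attempt += 1
    with open(out_dir / "constants.json", "w") as f:
        json.dump(constants, f, indent=2, sort_keys=True)
    return out_dir


def load_constants(experiment_dir):
    """Read back the constants saved for an earlier experiment."""
    with open(Path(experiment_dir) / "constants.json") as f:
        return json.load(f)
```

Result files would then be written into the returned folder, and an earlier run could be repeated by passing the output of `load_constants` back into the experiment setup.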
Next Meeting
  • Monday, February 13, 2012, 13:00
