- Finished the "Donut World" environment. Some small comments to discuss next meeting;
- Agent can only turn within a limited range left or right (default 135.0 degrees); otherwise it could walk back and forth instead of forward over the donut
- If the agent's next step would take it outside the bounds, it is placed where it would touch the bounds. Another option could be: Game Over?
- Several other options are configurable as parameters;
- step limit
- agent's step size
- agent's maximum turn angle (in degrees)
- agent's starting x, y, and angle
- donut's x, y, radius and thickness
- a single-value reward or a gradual reward range when on the donut
- amount of noise (observation and/or transition noise)
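A minimal sketch of one donut-world transition and its reward, following the notes above. All names and defaults here (`donut_step`, `donut_reward`, `bound`, etc.) are hypothetical placeholders, not the actual implementation; in particular, clamping x and y independently is only the simplest reading of "touch the bounds".

```python
import math

def donut_step(x, y, alpha, turn, step_size=1.0, max_turn=135.0, bound=10.0):
    """One transition: turn (clipped to max_turn degrees), then walk one step."""
    # Limit how far the agent may turn left or right per step.
    turn = max(-max_turn, min(max_turn, turn))
    alpha = (alpha + turn) % 360.0
    # Walk one step in the new heading.
    nx = x + step_size * math.cos(math.radians(alpha))
    ny = y + step_size * math.sin(math.radians(alpha))
    # If the step would leave the world, stop where the agent
    # would first touch the bounds (simplest variant: clamp per axis).
    nx = max(-bound, min(bound, nx))
    ny = max(-bound, min(bound, ny))
    return nx, ny, alpha

def donut_reward(x, y, cx=0.0, cy=0.0, radius=5.0, thickness=1.0, on_donut=1.0):
    """Single-value reward variant: on_donut while the agent stands on the ring."""
    d = math.hypot(x - cx, y - cy)
    return on_donut if abs(d - radius) <= thickness / 2.0 else 0.0
```

The gradual-reward variant would replace the indicator in `donut_reward` with a value that decays with distance from the ring's center line.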
- Here again is an overview of the environments that can be used to validate the algorithms. See the references/weblinks for more information.
- Sine function optimization (upcoming; see [1])
- 1 dimensional continuous observation and action space
- step limit = 1
- Six hump camel back function optimization (see [1])
- 2 dimensional continuous observation and action space
- step limit = 1
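Because the step limit is 1, each episode of the function-optimization environments is a single action. One way to read this (a sketch, not the actual environment code) is that the action is the evaluation point and the reward is the negated function value, so maximizing reward minimizes the function. The six-hump camel back function below uses its standard definition; `episode_reward` is a hypothetical name.

```python
def six_hump_camel(x1, x2):
    """Six-hump camel back function (standard form, to be minimized)."""
    return ((4 - 2.1 * x1**2 + x1**4 / 3) * x1**2
            + x1 * x2
            + (-4 + 4 * x2**2) * x2**2)

def episode_reward(action):
    """Step limit 1: one action is one episode; reward = -f(action)."""
    x1, x2 = action
    return -six_hump_camel(x1, x2)
```

The global minimum of this function is about -1.0316, reached near (0.0898, -0.7126) and its mirror image, which gives a known target for evaluating the agents.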
- Donut world
- 3 dimensional continuous observation space (x, y and alpha of the agent)
- 1 dimensional continuous action space (number of degrees to turn left or right)
- Cart pole (discrete actions variant: link)
- 4 dimensional continuous observation space
- 1 dimensional continuous action space
- Helicopter hovering
- 12 dimensional continuous observation space
- 4 dimensional continuous action space
- Octopus arm
- 82 dimensional continuous observation space
- 32 dimensional continuous action space
- Evaluation approach
- All environments can be measured by the (cumulative) reward at each time step. These series of values can be benchmarked against other results; results will be averaged over repeated runs for better approximations.
- These rewards can be compared to those of other agents
- Random agent
- Lukas' agent(s)
- Some environments have been tested in the literature, which can be used for evaluation
- The two function optimization environments are discussed in [1]
- The Cart pole environment with continuous actions is discussed in [2]
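The evaluation approach above can be sketched as follows. This is a hypothetical harness, not the actual evaluation code: `random_agent` is the baseline agent sampling a turn uniformly, and `average_cumulative_reward` averages the per-step cumulative-reward series over repeated runs.

```python
import random

def random_agent(obs, low=-135.0, high=135.0):
    """Baseline agent: sample a turn angle uniformly at random."""
    return random.uniform(low, high)

def average_cumulative_reward(run_episode, n_runs=100):
    """Average the cumulative-reward series over n_runs repeated runs.

    run_episode() must return a list of per-step rewards, one per time step.
    """
    curves = []
    for _ in range(n_runs):
        rewards = run_episode()
        cum, curve = 0.0, []
        for r in rewards:
            cum += r
            curve.append(cum)  # cumulative reward at this time step
        curves.append(curve)
    steps = len(curves[0])
    return [sum(c[t] for c in curves) / n_runs for t in range(steps)]
```

The averaged curve of one agent can then be benchmarked directly against the curves of the random agent, Lukas' agent(s), or results reported in the literature.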

[1] G. Van den Broeck and K. Driessens, “Automatic discretization of actions and states in Monte-Carlo tree search,” in Proceedings of the ECML/PKDD 2011 Workshop on Machine Learning and Data Mining in and around Games (T. Croonenborghs, K. Driessens, and O. Missura, eds.), pp. 1–12, Sep. 2011.

[2] H. Van Hasselt and M. A. Wiering, “Reinforcement learning in continuous action spaces,” in Proceedings of the 2007 IEEE International Symposium on Approximate Dynamic Programming and Reinforcement Learning (ADPRL 2007), pp. 272–279, 2007.