Monday, January 30, 2012

Feedback 30-01-2012


  • Finished the "Donut World" environment. Some small comments to discuss next meeting;
    • Agent can turn only for a limited range left or right (default 135.0 degrees); otherwise it could walk back and forth in stead of forward over the donut
    • If the agent would be located outside the bounds at the next step, its next location will be where it would touch the bounds. Other option could be: Game Over?
    • Several other options are configurable as parameters; 
      • step limit
      • agent's step size 
      • agent's maximum turn angle (in degrees)
      • agent's starting x, y, and angle
      • donut's x, y, radius and thickness
      • a single-value reward or a gradual reward range while the agent is on the donut
      • amount of noise (observation and/or transition noise)
  • Here is again an overview of the environments that can be used to validate the algorithms. See references/weblinks for more information.
    • Sine function optimization (upcoming; see [1])
      • 1 dimensional continuous observation and action space
      • step limit = 1
    • Six hump camel back function optimization (see [1])
      • 2 dimensional continuous observation and action space
      • step limit = 1
    • Donut world
      • 3 dimensional continuous observation space (x, y and alpha of the agent)
      • 1 dimensional continuous action space (amount of degree to turn left or right)
    • Cart pole (discrete actions variant: link)
      • 4 dimensional continuous observation space
      • 1 dimensional continuous action space
    • Helicopter hovering
      • 12 dimensional continuous observation space
      • 4 dimensional continuous action space
    • Octopus arm
      • 82 dimensional continuous observation space
      • 32 dimensional continuous action space
  • Evaluation approach
    • Performance in all environments can be measured by the (cumulative) reward at each time step. These series of values can be benchmarked against other results. Results will be averaged over multiple runs for better approximations (a sketch of such an evaluation loop follows at the end of this post).
    • These rewards can be compared to those of other agents
      • Random agent
      • Lukas' agent(s)
    • Some environments have been tested in the literature, which can be used for evaluation
      • The two function optimization environments are discussed in [1]
      • The Cart pole environment with continuous actions is discussed in [2]
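
For reference in the discussion, below is a minimal sketch of the Donut World step dynamics described above, in plain Java and without the RL-Glue plumbing. The class and field names, the default values and the gradual-reward formula are illustrative assumptions rather than the actual implementation, and the noise parameters are omitted.

/** Minimal sketch of one Donut World step; names, defaults and formulas are illustrative. */
public class DonutWorldSketch {
    // Configurable parameters (defaults are placeholders)
    double stepSize = 0.1;              // agent's step size
    double maxTurnDegrees = 135.0;      // agent's maximum turn angle
    double donutX = 0.0, donutY = 0.0;  // donut's center
    double donutRadius = 5.0;           // radius of the donut's center line
    double donutThickness = 1.0;        // thickness of the ring
    double boundsMin = -10.0, boundsMax = 10.0;
    boolean gradualReward = false;      // single-value vs. gradual reward

    // Agent state: position and heading (in degrees)
    double x, y, angle;

    /** action = requested turn in degrees; returns the reward at the new position. */
    double step(double action) {
        // The turn is limited to [-maxTurnDegrees, +maxTurnDegrees]
        double turn = Math.max(-maxTurnDegrees, Math.min(maxTurnDegrees, action));
        angle += turn;

        // Move forward with the fixed step size
        double rad = Math.toRadians(angle);
        x += stepSize * Math.cos(rad);
        y += stepSize * Math.sin(rad);

        // Keep the agent inside the bounds (a simple per-axis clip here; the real
        // environment places the agent exactly where it would touch the bounds)
        x = Math.max(boundsMin, Math.min(boundsMax, x));
        y = Math.max(boundsMin, Math.min(boundsMax, y));

        // Distance from the donut's center line (the circle with radius donutRadius)
        double distToRing = Math.abs(Math.hypot(x - donutX, y - donutY) - donutRadius);

        if (distToRing > donutThickness / 2.0) {
            return 0.0;                                        // off the donut
        } else if (!gradualReward) {
            return 1.0;                                        // single-value reward
        } else {
            return 1.0 - distToRing / (donutThickness / 2.0);  // gradual: 1 on the center line
        }
    }
}
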
[1] G. Van den Broeck and K. Driessens, “Automatic discretization of actions and states in Monte-Carlo tree search,” in Proceedings of the ECML/PKDD 2011 Workshop on Machine Learning and Data Mining in and around Games (T. Croonenborghs, K. Driessens, and O. Missura, eds.), pp. 1–12, Sep 2011.
[2] H. Van Hasselt and M. A. Wiering, “Reinforcement learning in continuous action spaces,” in Proceedings of the 2007 IEEE International Symposium on Approximate Dynamic Programming and Reinforcement Learning (ADPRL 2007), pp. 272–279, 2007.
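
To make the evaluation approach above concrete, here is a rough sketch of an experiment loop using the RL-Glue Java codec that averages the per-episode return over a number of runs. The number of episodes, the step limit and the printed output are arbitrary choices, and per-time-step reward curves would need a custom loop around RL_step() instead.

import org.rlcommunity.rlglue.codec.RLGlue;

/** Sketch: run a number of episodes through RL-Glue (against whichever agent and
 *  environment are connected) and average the cumulative reward per episode. */
public class EvaluationSketch {
    public static void main(String[] args) {
        int numEpisodes = 100;            // number of runs to average over (arbitrary)
        int maxStepsPerEpisode = 1000;    // step limit per episode (arbitrary)

        RLGlue.RL_init();
        double totalReturn = 0.0;
        for (int episode = 0; episode < numEpisodes; episode++) {
            RLGlue.RL_episode(maxStepsPerEpisode);       // run one episode
            double episodeReturn = RLGlue.RL_return();   // cumulative reward of this episode
            long steps = RLGlue.RL_num_steps();          // number of steps taken
            totalReturn += episodeReturn;
            System.out.println("Episode " + episode + ": return=" + episodeReturn + ", steps=" + steps);
        }
        System.out.println("Average return over " + numEpisodes + " episodes: " + (totalReturn / numEpisodes));
        RLGlue.RL_cleanup();
    }
}
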

Thursday, January 26, 2012

Meeting 26-01-2012

Action Points
  • Discussed work done since last meeting (see previous posts)
  • I mentioned that I spoke with Lukas about using RL Glue and that we would use the same environments
  • I was shown the "Donut World", a simple continuous problem that can be used for testing
    • Agent starts in the "hole" of a donut-shaped region in a 2-dimensional continuous space
    • Agent receives positive rewards for being in the donut region
    • 1-dimensional actions indicate the angle of the agent
    • Agent moves forward with a fixed step size, just small enough to be able to stay in the donut region
  • Kurt mailed me a TLS implementation in Scala 
  • Michael offered me a Java implementation of HOO
  • I signed the thesis plan which will be evaluated and forwarded by Kurt and Michael
Tasks
  • Think about how to evaluate the agents (e.g. reward, steps, mean, std, etc.)
  • Implement "Donut World" environment for RL Glue
  • Find existing RL Glue agents to benchmark against
  • Continue on topic Data Stream Mining / Regression trees
Next Meeting

  • Friday, February 03, 2012, 11:00

Tuesday, January 24, 2012

Feedback 24-01-2012

Work

  • Adapted the framework to my personal preferences and for convenience
  • Now have four environments available for testing with continuous states and actions
    • Six Hump Camel Back
    • Cart Pole
    • Helicopter Hovering
    • Octopus Arm

Saturday, January 21, 2012

Feedback 21-01-2012


Work
  • Made changes to the thesis plan according to the comments from the last meeting
  • Read several more papers
  • Researched available frameworks
    • RL Glue; supports Java, C/C++, Matlab, Python, Lisp
    • Maja Machine Learning Framework; Python
    • Reinforcement Learning Toolbox; C++
    • Several small environments (mostly in Matlab or Python)
  • Personal preference for RL Glue
    • Supports my preferred language: Java
    • Possibly the best support among the framework options above
    • Multi-platform / -language
    • Used in the last RL Competitions
    • Online library with environments, agents and experiments
  • Installed RL Glue
    • Created Netbeans Java project sourcing all necessary libraries
    • Created helper class to start up RL Viz, RL Glue, agent, environment and experiment
    • Changed build.xml to automatically build everything and run (in one click)
    • Downloaded RandomAgent, CartPole and MountainCar from rl-library for testing purposes
    • Created a new environment: one-shot Six-Hump Camel Back (a sketch follows below this list)
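
As an illustration, here is a sketch of how the one-shot Six-Hump Camel Back environment could look with the RL-Glue Java codec. The function itself is the standard six-hump camel back definition; the class name, the empty task-spec string, the negated reward and the setter-style construction (in the style of the codec's skeleton examples) are my own assumptions, not the final implementation.

import org.rlcommunity.rlglue.codec.EnvironmentInterface;
import org.rlcommunity.rlglue.codec.types.Action;
import org.rlcommunity.rlglue.codec.types.Observation;
import org.rlcommunity.rlglue.codec.types.Reward_observation_terminal;

/** Sketch of a one-shot Six-Hump Camel Back environment: the agent picks a 2-D point
 *  (continuous action), receives the negated function value as reward, and the
 *  episode ends immediately (step limit = 1). */
public class SixHumpCamelBackOneShot implements EnvironmentInterface {

    public String env_init() {
        // Placeholder; a real implementation would build a proper task spec
        // describing the continuous observation and action spaces.
        return "";
    }

    public Observation env_start() {
        // One-shot problem: the starting observation carries no information (all zeros).
        return new Observation(0, 2, 0);
    }

    public Reward_observation_terminal env_step(Action action) {
        double x = action.doubleArray[0];
        double y = action.doubleArray[1];

        // Six-hump camel back function
        double f = (4 - 2.1 * x * x + Math.pow(x, 4) / 3.0) * x * x
                 + x * y
                 + (-4 + 4 * y * y) * y * y;

        // Observe the chosen point
        Observation obs = new Observation(0, 2, 0);
        obs.doubleArray[0] = x;
        obs.doubleArray[1] = y;

        Reward_observation_terminal rot = new Reward_observation_terminal();
        rot.setObservation(obs);
        rot.setReward(-f);       // negated so that maximizing reward minimizes f (assumption)
        rot.setTerminal(true);   // one shot: episode ends after a single step
        return rot;
    }

    public void env_cleanup() { }

    public String env_message(String message) { return ""; }
}

The environment could then be started through the codec's EnvironmentLoader, e.g. new EnvironmentLoader(new SixHumpCamelBackOneShot()).run(), with the agent and experiment connected to RL Glue in the same way.
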
Plans
  • Investigate RL Glue environment more
  • Finish the Six Hump Camel Back one-shot environment (visualization)
  • Find environments taking continuous actions (not in rl-library; perhaps from last year's RL competitions)
  • Change Cart Pole environment to take continuous actions

Tuesday, January 17, 2012

Meeting 16-01-2012

Thesis Plan Comments / Tasks
  • Come up with a preliminary title
  • Split the research question about the component replacements into two questions
  • Put research question about learning the generative model at the end; becomes optional for the research
  • Put research question about the re-use of information for TLS at the end; becomes optional for the research
  • Make a Gantt chart for the planning; more detailed
Additional Tasks
  • Set up a method for communication (this blog!)
  • Read more literature, e.g. Multi-Resolution Exploration in Continuous Spaces (Nouri & Littman)
  • Think about the implementation details, e.g. plan, programming language, etc.
  • Think about which testbed / benchmarking to use, e.g. RL Glue, function approximation, etc.
Next Meeting
  • Thursday, January 26, 2012, 11:00