Monday, March 5, 2012

Feedback 05-03-2012

Work
  • Set up structure (chapters / sections) of the report 
  • Wrote a few blocks of text for the report
  • Thought about names for the 4 combinations (Regression Tree / HOO × Meta Tree / Sequence Tree)
    • Regression-based Meta Tree Learning (RMTL)
      • similar to Tree Learning Search (TLS)
    • Hierarchical Optimistic Sequence-based Tree Learning (HOSTL)
      • similar to Hierarchical Open-Loop Optimistic Planning (HOLOP)
    • Hierarchical Optimistic Meta Tree Learning (HOMTL)
    • Regression and Sequence-based Tree Learning (RSTL)
  • While thinking about the meta tree interface for the next meeting, I already had a go at the implementation, resulting in a (seemingly) working RMTL / TLS agent
  • Below are some results for a multi-step problem using RMTL / TLS
    • Environment: Donut World
    • average of 100 episodes
    • 3 step limit
    • 10,000 simulations per step
    • maximum reward per step = 1 (when exactly in the middle of the donut region)
    • gradual reward decrease to 0 towards the (inner or outer) edge of the donut
    • minimum reward = 0 (off the donut)
Step    Average reward    Minimum reward    Maximum reward
1       0.977023          0.888396          0.999574
2       0.983432          0.768632          0.999977
3       0.993064          0.62132           1

Average cumulative reward    Minimum cumulative reward    Maximum cumulative reward
2.95352                      2.555052                     2.998965
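For reference, the Donut World reward described above can be sketched as a simple radial function. The actual environment's radii and fall-off are not given here, so the inner/outer radii below are hypothetical and the decrease towards the edges is assumed to be linear:

```python
import math

def donut_reward(x, y, r_inner=0.5, r_outer=1.5):
    """Sketch of the Donut World reward (radii are hypothetical).

    Returns 1.0 exactly on the centerline of the donut region,
    decreasing linearly to 0 at the inner and outer edges,
    and 0 anywhere off the donut.
    """
    r = math.hypot(x, y)          # distance from the donut's center
    if r <= r_inner or r >= r_outer:
        return 0.0                # off the donut
    mid = (r_inner + r_outer) / 2.0
    half_width = (r_outer - r_inner) / 2.0
    return 1.0 - abs(r - mid) / half_width
```

With these assumed radii, a point on the centerline (e.g. (1.0, 0.0)) gets reward 1, and the reward fades out symmetrically towards both edges.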

Planning
  • Preparation for meeting Wednesday March 7th
  • Report writing
  • Debug / investigate TLS / RMTL
