- Set up structure (chapters / sections) of the report
- Wrote a few blocks of text for the report
- Thought about names for the 4 combinations (Regression Tree / HOO × Meta Tree / Sequence Tree)
- Regression-based Meta Tree Learning (RMTL)
- similar to Tree Learning Search (TLS)
- Hierarchical Optimistic Sequence-based Tree Learning (HOSTL)
- similar to Hierarchical Open-Loop Optimistic Planning (HOLOP)
- Hierarchical Optimistic Meta Tree Learning (HOMTL)
- Regression and Sequence-based Tree Learning (RSTL)
- While thinking about the meta tree interface for the next meeting, I already had a go at the implementation, resulting in a (seemingly) working RMTL / TLS agent
- Below are some results for a multi-step problem using RMTL / TLS
- Environment: Donut World
- average of 100 episodes
- 3 step limit
- 10,000 simulations per step
- Maximum reward per step = 1 (when positioned exactly in the middle of the donut region)
- Gradual reward decrease towards 0 at the (inner or outer) edge of the donut
- Minimum reward = 0 (off the donut)
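The reward shape described above can be sketched as a small function. A minimal sketch, assuming a donut centered at the origin; the inner/outer radii and the linear falloff from the mid-radius are assumptions for illustration, not taken from the actual Donut World implementation:

```python
import math

def donut_reward(x, y, r_inner=1.0, r_outer=2.0):
    """Reward is 1.0 exactly on the ring's mid-radius, decreases
    linearly to 0.0 at the inner and outer edges, and is 0.0 off
    the donut. Radii and linear shape are illustrative assumptions."""
    r = math.hypot(x, y)
    if r < r_inner or r > r_outer:
        return 0.0  # off the donut
    r_mid = (r_inner + r_outer) / 2.0
    half_width = (r_outer - r_inner) / 2.0
    return 1.0 - abs(r - r_mid) / half_width
```

For example, `donut_reward(1.5, 0.0)` lies exactly on the mid-radius and yields the maximum reward of 1.0, while points at either edge or outside the ring yield 0.0.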
| Step | Average reward | Minimum reward | Maximum reward |
|------|----------------|----------------|----------------|
| 1    | 0.977023       | 0.888396       | 0.999574       |
| 2    | 0.983432       | 0.768632       | 0.999977       |
| 3    | 0.993064       | 0.62132        | 1              |

| Average cumulative reward | Minimum cumulative reward | Maximum cumulative reward |
|---------------------------|---------------------------|---------------------------|
| 2.95352                   | 2.555052                  | 2.998965                  |
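The per-step and cumulative statistics in the tables above can be aggregated from the raw episode rewards as follows. A minimal sketch, assuming the rewards are available as one list per episode (one value per step); `summarize` is a hypothetical helper, not part of the actual experiment code:

```python
def summarize(episodes):
    """episodes: list of per-episode reward lists, one value per step.
    Returns per-step (avg, min, max) tuples and the
    cumulative-reward (avg, min, max) over all episodes."""
    n_steps = len(episodes[0])
    per_step = []
    for s in range(n_steps):
        vals = [ep[s] for ep in episodes]
        per_step.append((sum(vals) / len(vals), min(vals), max(vals)))
    cumulative = [sum(ep) for ep in episodes]
    cum_stats = (sum(cumulative) / len(cumulative),
                 min(cumulative), max(cumulative))
    return per_step, cum_stats
```

Run over the 100 episodes (3 steps each), this yields exactly the rows of the two tables.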
- Preparation for meeting Wednesday March 7th
- Report writing
- Debug / investigate TLS / RMTL