- Looked into the perfect recall issue and fixed it. RMTL now balances for 1000 steps in 94% of runs (vs. 69% before). I've updated the table from a couple of posts back and also provided it below in this post.
- Implemented the transposition tree, and it works pretty well. On the CartPole environment it performs better than Random, MC, HOMTL and HOSTL (see table). Offline it does not perform very well, probably because it lacks information about important states (not provided in the table).
- I removed the number of simulations from the table because, in my opinion, they were no longer really comparable: I changed the parameters for HOO so that it could achieve more simulations (at the cost of less information per simulation). Furthermore, the Transposition Tree agent updates the tree structure at each step (not at each simulation), meaning that after one rollout it may update 20 times (depending on the length of the rollout). Therefore I chose to remove them from the table.
- I reran the noisy Donut World experiment, but now with reward noise. It appeared that this noise causes no significant decrease in performance. Furthermore, I tried the transposition tree agent on this environment and it performed better than all the other algorithms.
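The per-step update described above (one statistics update for every visited state in a rollout, rather than one per simulation) can be sketched as follows. This is a minimal illustration, not the actual agent: the `discretize` hashing, the bin count, and the incremental-mean backup are all assumptions made for the example.

```python
from collections import defaultdict

def discretize(state, bins=10):
    # Hypothetical discretization so that nearby continuous states
    # map to the same transposition-table entry.
    return tuple(int(s * bins) for s in state)

class TranspositionTree:
    """Sketch: value estimates shared across states via a hash table."""

    def __init__(self):
        self.visits = defaultdict(int)
        self.value = defaultdict(float)

    def update_rollout(self, trajectory):
        # One update per visited state (per step), not per simulation:
        # a rollout of length 20 therefore yields 20 updates.
        ret = 0.0
        for state, reward in reversed(trajectory):
            ret += reward  # undiscounted return from this state onward
            key = discretize(state)
            self.visits[key] += 1
            # Incremental mean of the returns observed from this state.
            self.value[key] += (ret - self.value[key]) / self.visits[key]
```

Because every step of the rollout touches the table, a single simulation can refine many state estimates at once, which is why counting "simulations" no longer measures the amount of learning per unit of compute.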
| Algorithm | Success rate | Average payoff |
|---|---|---|
| IRTI + TLS | 94% | 971.747 |
| IRTI + HOLOP | 85% | 922.047 |
| HOO + TLS | 0% | 77.254 |
| HOO + HOLOP | 3% | 273.525 |
| Transposition Tree | 65% | 808.480 |
| MC | 9% | 389.590 |