The presentation last Tuesday went fine. Kurt gave me one tip regarding the transposition table: I could mix the state and action space instead of using the level-based approach. On second thought this should indeed be better, because splits can then only occur at leaves, which means recalling is only done from the leaf to one of its two children (instead of possibly "rebuilding" a whole new tree structure).
The so-called Mixed TT is very similar to a regular regression tree. Instead of only splitting the action space, it can also split the state space. During the selection step the agent chooses the best UCT child when the node splits the action space; when the node splits the state space, the child is chosen according to the current state in the simulation. Each single action updates the tree, instead of a sequence of actions as in TLS. The tree that is built can again be reused for later steps.
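The selection step can be sketched roughly as follows. This is a minimal illustration, not the actual implementation: the `Node` class, its fields, and the binary state split are my own assumptions about how such a mixed tree could be represented.

```python
import math

class Node:
    """A node in a hypothetical mixed transposition tree.

    A node either splits the ACTION space (children are compared by UCT)
    or splits the STATE space (the child is picked by the current state).
    """
    def __init__(self, split_on, split_dim=None, split_value=None):
        self.split_on = split_on        # "action" or "state"
        self.split_dim = split_dim      # state dimension, if split_on == "state"
        self.split_value = split_value  # threshold of the binary split
        self.children = []              # at most two children after a split
        self.visits = 0
        self.total_payoff = 0.0

    def uct(self, parent_visits, c=1.0):
        # Standard UCT score; unvisited children are explored first.
        if self.visits == 0:
            return float("inf")
        return (self.total_payoff / self.visits
                + c * math.sqrt(math.log(parent_visits) / self.visits))

def select(node, state):
    """Walk down the tree: UCT at action splits, the state at state splits."""
    while node.children:
        if node.split_on == "action":
            node = max(node.children, key=lambda ch: ch.uct(node.visits))
        else:
            # State split: follow the branch the current state falls into.
            left, right = node.children
            node = left if state[node.split_dim] <= node.split_value else right
    return node
```

So a single descent mixes exploitation (UCT at action splits) with a purely deterministic routing step at state splits, which is what lets one tree be reused across different states.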
I ran the experiment again with the new transposition tree, and it comes very close to the best technique, RMTL.
When trying the Mixed TT in combination with HOO (i.e. HOO with state information) I ran into the following problem: since HOO splits on random dimensions, it sometimes happens that the very first split is on the action space. This means that no matter which state the agent is in, it will always pick the region of the action space with the best UCT value (which could lead to bad actions if the other child was previously better). This makes me wonder whether the idea behind the Mixed TT is correct, and why it works so well for regression trees (probably due to the intelligent splitting, which initially splits on the state dimension representing the angle of the pole).
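The problem can be made concrete with a toy example. Below, a (hypothetical) root that splits the action space has two children whose statistics happen to favor the left half; since selection at an action split compares UCT values only, the current state never enters the decision, and the same child wins in every state. The numbers are invented purely for illustration.

```python
import math

def uct(mean, visits, parent_visits, c=1.0):
    """Plain UCT score; unvisited children are explored first."""
    if visits == 0:
        return float("inf")
    return mean + c * math.sqrt(math.log(parent_visits) / visits)

# Root splits the ACTION space into two halves (made-up statistics):
children = [
    {"mean": 0.9, "visits": 50},  # left action half
    {"mean": 0.2, "visits": 50},  # right action half
]
parent_visits = 100

def pick(state):
    # `state` is never used: an action-space split cannot condition on it.
    return max(range(2), key=lambda i: uct(children[i]["mean"],
                                           children[i]["visits"],
                                           parent_visits))
```

Whatever `state` is passed in, `pick` returns the same child, which is exactly the state-blindness that a first split on the action space causes.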
Furthermore, and not unimportant (!), I found a small error in the HOO-based algorithms. After fixing this and some parameter tuning (raising n_min), I achieved better results. HOMTL now reaches a 50% success rate, but HOSTL is still computationally too expensive to plan enough steps ahead (the biggest problem occurs when the cart reaches the left or right "wall"; the agent then has to force the pole's angle to the right or left, respectively, to get away from that wall).
|                | Success rate | Average payoff |
|----------------|--------------|----------------|
| RMTL           | 94%          | 971.747        |
| RSTL           | 85%          | 922.047        |
| HOMTL          | 50%          | 706.525        |
| HOSTL          | 5%           | 320.255        |
| Level-based TT | 65%          | 808.480        |
| Mixed TT       | 87%          | 924.698        |
| MC             | 9%           | 389.590        |