The reason for this is that about 4 weeks ago I undertook a rework of Sancho's internal MCTS tree representation, to reduce memory usage and remove some role asymmetry that didn't seem right. The catalyst was to lay the groundwork for an enhanced move-sequence search, which will be the subject of a future post (at least if it works!). I expected it to take 3 or 4 days, but it soon turned out to be a sticky morass, from which escape was more difficult than envisaged!
The core of the change was to eliminate non-decision nodes in the tree (forced moves, forced noops, etc.), leaving only the nodes at which actual decisions have to be made. Since Sancho's tree considers choices for a single role at any one node (not joint moves), this meant that in most games (non-simultaneous-move games) decision layers were interleaved with non-decision (forced-noop) layers: at most one in N layers would contain a decision, where N is the number of roles in the game. This both wasted space (tree node allocation) and introduced role asymmetry, because the state only changes every Nth layer of the tree (a full joint move being implied by the choices in the preceding N layers).
In the new structure all non-decision nodes are eliminated, and (as a side effect) an edge always leads between nodes with different game states. The semantics of the representation, however, should be equivalent.
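To make the idea concrete, here is a minimal sketch of collapsing forced (non-decision) layers during expansion, so that nodes are only allocated where a real choice exists. The state-machine API here (`legal_moves`, `apply_move`) is hypothetical, for illustration only, and is not Sancho's actual interface:

```python
def advance_past_forced(state, roles, legal_moves, apply_move):
    """Repeatedly apply the implied joint move while every role has exactly
    one legal move, so the tree only gets nodes at genuine decision points.

    legal_moves(state, role) -> list of legal moves for that role
    apply_move(state, joint) -> successor state for a full joint move
    (Both are stand-in names, not Sancho's real API.)
    """
    while True:
        choices = {r: legal_moves(state, r) for r in roles}
        if any(len(ms) > 1 for ms in choices.values()):
            return state  # at least one role has a real decision here
        if all(len(ms) == 0 for ms in choices.values()):
            return state  # terminal state: nothing to advance past
        # Every role is forced: build the implied joint move and step forward
        joint = {r: ms[0] for r, ms in choices.items()}
        state = apply_move(state, joint)
```

Edges in such a tree naturally span whole joint moves, which is why every edge then leads between distinct game states.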
Stabilizing this change took roughly the expected time, but it unexpectedly turned out to have a distinctly damaging impact on the quality of play in games that employ heuristics (primarily games with a piece heuristic active, such as Breakthrough or Chess). The reason for this is not totally clear, but it almost certainly has to do with the role asymmetry, which the original heuristic implementation was 'comfortable' with.
Since then I have been fiddling with the mechanisms by which heuristic values are blended into the MCTS tree, to find a modification that works well in the new structure. Over the past 3 weeks or so I have found various approaches that initially worked well in my tests with one game, only to find that they performed badly in another. The three games I have mostly been using to test with (as they display somewhat different characteristics) are:
- Breakthrough (simple equal-value piece heuristic, fixed sum)
- Speed chess (variable value piece heuristic, fixed sum)
- Skirmish (variable value piece heuristic, non-fixed sum)
In particular, I found that what worked well in Breakthrough (where material exchanges are typically somewhat binary in nature) didn't work well in the other two (and vice versa).
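For readers unfamiliar with heuristic blending in MCTS, one common scheme (a sketch only — Sancho's actual blending mechanism isn't described here) treats the heuristic estimate as a number of virtual visits, whose influence washes out as real simulation results accumulate at the node:

```python
def blended_value(heuristic_value, heuristic_weight, sim_total, sim_visits):
    """Blend a heuristic prior with rollout statistics.

    heuristic_value  -- heuristic estimate for the node, in [0, 1]
    heuristic_weight -- number of 'virtual visits' the heuristic counts as
    sim_total        -- sum of simulation results backed up through the node
    sim_visits       -- number of real simulations through the node

    Early on the heuristic dominates; with many visits it decays toward the
    pure simulation average. All names here are illustrative, not Sancho's.
    """
    total = heuristic_value * heuristic_weight + sim_total
    visits = heuristic_weight + sim_visits
    return total / visits
```

How sensitive play is to the choice of weight, and how it interacts with fixed-sum versus non-fixed-sum piece values, is exactly the sort of per-game variation described above.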
However, as of yesterday I am now getting positive test results (but much more testing remains to be done) in all of the above games, so hopefully this time the light at the end of the tunnel is not another oncoming train!
A ton of regression testing now needs to be done, for which I plan to use the following games:
- Reversi (uses a heuristic other than piece-based)
- Connect4 (go-to non-heuristic fixed-sum 2-player game)
- Max knights (puzzle)
- Sudoku (very different puzzle)
- Three-player free-for-all (more than 2 players)
- Blocker (simultaneous turn)
- Pentago (non-alternating role choice sequence)
- Breakthrough with walls (heuristic and factored)
This will take a few more days, even if no problems are found. After that (I hope), normal service will be resumed...