This work was inspired by thinking about play in Breakthrough, and the current implementation applies fairly narrowly to games that exhibit certain constraints (see below) and can be described as 'breakthrough-like' in some sense. However, extension to a wider category of games is something that I will be coming back to after this year's championships.
Overview
Breakthrough exhibits locality of effect for moves in the following senses:
1) No goal disjunct is influenced by multiple moves (the goals can in fact be expressed precisely as disjunctions of propositions that each individually arise from a single move). Hence moves in different parts of the board cannot directly influence the same goal (or terminality).
2) Each move impacts only a small number of base propositions, and thence of legals.
These constraints allow us to statically calculate a useful distance metric on the space of moves, wherein the distance between two moves is the minimum number of turns that must go by before both can impact the same base proposition (or goal/terminality disjunct, though that will be the same thing in the constrained set of games we currently seek to apply this to).
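As a rough illustration of how such a metric could be computed statically, here is a minimal sketch. It assumes a hypothetical input, `affects`, mapping each move to the set of base propositions it can change, and it uses hop count in the "shares a base proposition" graph as a stand-in for the number of turns before two moves can impact the same proposition. The real analysis works on the game's propositional network, so treat this as a simplified model rather than Sancho's actual code.

```python
from collections import deque
from itertools import combinations


def move_distances(affects):
    """Return dist[a][b] = hop count between moves a and b.

    `affects` (hypothetical input) maps move -> set of base propositions
    that the move can change. Moves with no connecting chain of shared
    propositions are simply absent from each other's rows.
    """
    # Index the moves by the base propositions they touch.
    by_prop = {}
    for move, props in affects.items():
        for prop in props:
            by_prop.setdefault(prop, set()).add(move)

    # Two moves are neighbours when they touch a common base proposition.
    neighbours = {move: set() for move in affects}
    for moves in by_prop.values():
        for a, b in combinations(moves, 2):
            neighbours[a].add(b)
            neighbours[b].add(a)

    # Breadth-first search from every move gives all pairwise distances.
    dist = {}
    for start in affects:
        seen = {start: 0}
        queue = deque([start])
        while queue:
            current = queue.popleft()
            for nxt in neighbours[current]:
                if nxt not in seen:
                    seen[nxt] = seen[current] + 1
                    queue.append(nxt)
        dist[start] = seen
    return dist


# Toy example: a one-dimensional strip of cells c0..c4, where each move
# touches two adjacent cells.
strip = {
    "m01": {"c0", "c1"},
    "m12": {"c1", "c2"},
    "m23": {"c2", "c3"},
    "m34": {"c3", "c4"},
}
dist = move_distances(strip)
```

In the toy strip, moves at opposite ends are further apart than adjacent ones, which is the locality property the rest of the analysis relies on.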
Given such a distance metric we can observe that any sequence of N moves by one player (interspersed with arbitrary responses by the opponent) that results in a terminal state satisfies one of the following:
i) All N moves are within a region of diameter N in the above metric space; or
ii) There exists some N' < N for which such a sequence of N' moves is also terminal
We can say a path to terminality is efficient if it is of minimal length (for arbitrary opponent responses). Informally, suppose you have a pawn 2 spaces from the end rank in a game of Breakthrough, and no enemy pawns positioned to stop it (and no quick wins of their own). Then the efficient sequences are those in which this pawn is advanced on each move (so the efficient path length is 2). Playing other moves along the way would only lead to longer paths, and so such moves are irrelevant.
We can then define the set of N-efficient paths to be those paths of length N that are efficient. This set will be a subset of the set of paths in which all moves are within a region of diameter N.
This allows us to define a search algorithm that looks for forced wins within N moves by restricting move selection to the efficient set, which trims the search space considerably relative to the set of all moves. In our example of a pawn 2 moves from the end rank, any search that begins by advancing it can ignore (on the second move selection) all moves that are further than 2 away in our distance metric from the first move, which essentially means the search will typically ignore almost everything but the actual winning move(s) in that case.
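The move restriction can be sketched as a simple filter. This is only an illustration under assumptions: moves are represented as hypothetical (column, row) destination squares, and Chebyshev distance on the board stands in for the statically computed move distance. The names `within_region` and `efficient_candidates` are mine, not Sancho's.

```python
def chebyshev(a, b):
    """Stand-in distance between two moves given as (column, row) squares."""
    return max(abs(a[0] - b[0]), abs(a[1] - b[1]))


def within_region(candidate, chosen, distance, diameter):
    """True if `candidate` is within `diameter` of every move already chosen."""
    return all(distance(candidate, prev) <= diameter for prev in chosen)


def efficient_candidates(legal_moves, chosen, distance, diameter):
    """Keep only the legal moves that could extend a sequence of that diameter."""
    return [m for m in legal_moves
            if within_region(m, chosen, distance, diameter)]


# The sequence so far contains one move, to square (1, 2); we are looking
# for wins within 2 moves, so only moves within distance 2 are considered.
legal = [(0, 0), (1, 1), (5, 5), (2, 1)]
candidates = efficient_candidates(legal, [(1, 2)], chebyshev, 2)
```

Here the move to (5, 5) is dropped because it is too far from the first move to belong to the same 2-efficient sequence.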
Furthermore, since we will only look for N-efficient wins (so binary result) we can prune aggressively in the opponent ply - as soon as any opponent move is identified that avoids a win for us, the parent node of that opponent choice can be trimmed.
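A minimal sketch of this binary search with the early cutoff in the opponent ply follows. The `game` interface (`is_win`, `moves`, `apply`) and the toy `CountdownGame` are hypothetical, introduced only to make the example runnable. A real implementation would also restrict `moves` to the efficient set and check for opponent wins, both of which are omitted here.

```python
def forced_win(state, depth, our_turn, game):
    """True if we can force a win using at most `depth` more of our own moves."""
    if game.is_win(state):
        return True
    if our_turn:
        if depth == 0:
            return False
        # Our ply: one winning move is enough (any() stops at the first).
        return any(
            forced_win(game.apply(state, move), depth - 1, False, game)
            for move in game.moves(state, True)
        )
    # Opponent ply: a single reply that avoids the win refutes this node,
    # so we can stop looking at its siblings immediately.
    for move in game.moves(state, False):
        if not forced_win(game.apply(state, move), depth, True, game):
            return False
    return True


class CountdownGame:
    """Toy game: we subtract 1 or 2, the opponent adds 0 or 1; we win at <= 0."""

    def is_win(self, state):
        return state <= 0

    def moves(self, state, our_turn):
        return (-1, -2) if our_turn else (0, 1)

    def apply(self, state, move):
        return state + move


game = CountdownGame()
win_in_two_from_three = forced_win(3, 2, True, game)
win_in_two_from_five = forced_win(5, 2, True, game)
```

Because the result is binary, there is no need for alpha-beta style bounds: each node is either a forced win or it is not.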
By searching separately for forced wins for each player (independently) we can prune each individual search extensively, and apply the efficient-sequence trimming using iterated deepening to search for reasonably shallow forced wins. The details become quite complex, but the constraints implied by the notion of efficient sequences can be tightened considerably to constrain the opponent replies considered at each stage, and to progressively shrink the region diameter as the search ply increases within each iteratively deepened search. In practice (using one thread for this activity, in a way discussed a bit more below) we generally see effective end-game search sequences of lengths around 12 for Breakthrough (small) and 9 or 10 for Breakthrough, with the 15 seconds per move play time that I typically use in testing. This is easily enough to fix the kinds of endgame blunders MCTS tends to make, and leads to a considerable strengthening in play quality.
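The outer loop can be sketched as follows, assuming each role has a hypothetical `search_at_depth(depth)` callable that reports whether a forced win of that depth exists for it. The tightening of opponent replies and the shrinking region diameter mentioned above live inside that callable and are not shown.

```python
def shallowest_forced_win(search_at_depth, max_depth):
    """Iteratively deepen; return the first depth with a forced win, or None."""
    for depth in range(1, max_depth + 1):
        if search_at_depth(depth):
            return depth
    return None


def search_each_role(searchers, max_depth):
    """Run an independent deepening search for every role."""
    return {
        role: shallowest_forced_win(search, max_depth)
        for role, search in searchers.items()
    }


# Stand-in searchers: white has a forced win at depth 3, black has none.
searchers = {
    "white": lambda depth: depth >= 3,
    "black": lambda depth: False,
}
results = search_each_role(searchers, 5)
```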
Current use in Sancho
In Sancho we enable this system for games that have no goal-coupling of moves (that is to say that no one goal disjunct is dependent on more than one move) and for which the static distance analysis provides average move distances above a certain threshold (games in which the pieces are highly mobile produce much smaller distances, which in turn allows less pruning since far more sequences are efficient, so this threshold serves to filter those games out).
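The enablement test might look something like the sketch below. The inputs are assumptions of mine: `moves_required` maps each goal disjunct to the number of distinct moves needed to make it true, `distances` is a nested mapping of pairwise move distances, and `threshold` is left as a parameter because I am not giving the actual value here.

```python
def no_goal_coupling(moves_required):
    """True if no goal disjunct depends on more than one move."""
    return all(count <= 1 for count in moves_required.values())


def average_distance(distances):
    """Mean distance over all ordered pairs of distinct moves."""
    values = [
        d
        for a, row in distances.items()
        for b, d in row.items()
        if a != b
    ]
    return sum(values) / len(values) if values else 0.0


def local_search_enabled(moves_required, distances, threshold):
    """Enable local search only for uncoupled games with enough locality."""
    return (no_goal_coupling(moves_required)
            and average_distance(distances) > threshold)


# Hypothetical inputs: each disjunct is set by a single move, and the three
# moves are spread out along a line.
uncoupled = {"pawn_on_a8": 1, "pawn_on_b8": 1}
spread = {
    "m1": {"m1": 0, "m2": 2, "m3": 4},
    "m2": {"m1": 2, "m2": 0, "m3": 2},
    "m3": {"m1": 4, "m2": 2, "m3": 0},
}
enabled = local_search_enabled(uncoupled, spread, 2.0)
```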
One thread serves requests for local search from a given start state with a given initial 'seed' move to determine the locality. The main MCTS search thread periodically issues a new local search request for 'interesting' (state,move) pairs, which are currently:
- The current state and the last move played
- The state resulting from the move we think most likely to be played next, and that move
This is re-evaluated on a timescale of a few seconds and the search request updated accordingly (so local search is asked to look for forced wins in the branches MCTS thinks are most probable). If results are found before the search target is changed, they are fed back to the MCTS searcher and applied as a local-search-status to the nodes concerned (local win, local loss, no known local result). MCTS selection heavily discounts local losses, and boosts local wins. It does NOT prune based on them, because they are actually only strong hints (a local win in N moves from a particular start point does not guarantee there is not a local loss at lower depth from a different non-local start point). Because it applies a fixed discount to local losses, if local search concludes that ALL moves are losses, the effect is to discount all moves equally, leaving us back with the original MCTS move weighting. This is important in practice because knowing you SHOULD lose does not mean you should not do your best to avoid it (encouraging the opponent to blunder), so reverting to MCTS evaluation in this case maximizes the chances of recovering via sub-optimal opponent play.
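To illustrate why a fixed discount leaves the ordering unchanged when every move is a local loss, here is a sketch of a selection score. The UCB1 formula, the multiplicative form of the adjustment, and the constants `LOSS_DISCOUNT` and `WIN_BOOST` are assumptions chosen for the example, not the values or the exact formula used in Sancho. It also assumes mean values in [0, 1], so that scores are positive.

```python
import math
from enum import Enum


class LocalStatus(Enum):
    UNKNOWN = 0
    WIN = 1
    LOSS = 2


LOSS_DISCOUNT = 0.5  # hypothetical constant
WIN_BOOST = 1.5      # hypothetical constant


def selection_score(mean, visits, parent_visits, status, c=1.4):
    """UCB1 score scaled by the node's local-search status."""
    ucb = mean + c * math.sqrt(math.log(parent_visits) / visits)
    if status is LocalStatus.LOSS:
        return ucb * LOSS_DISCOUNT
    if status is LocalStatus.WIN:
        return ucb * WIN_BOOST
    return ucb


def select(children, parent_visits):
    """Pick the child name with the highest adjusted score.

    `children` is a list of (name, mean, visits, status) tuples.
    """
    return max(
        children,
        key=lambda ch: selection_score(ch[1], ch[2], parent_visits, ch[3]),
    )[0]


plain = [("A", 0.6, 10, LocalStatus.UNKNOWN),
         ("B", 0.5, 10, LocalStatus.UNKNOWN)]
one_loss = [("A", 0.6, 10, LocalStatus.LOSS),
            ("B", 0.5, 10, LocalStatus.UNKNOWN)]
all_loss = [("A", 0.6, 10, LocalStatus.LOSS),
            ("B", 0.5, 10, LocalStatus.LOSS)]
```

With only A marked as a local loss, selection switches to B. With both marked, the same factor applies to each score and selection falls back to the plain MCTS choice of A.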
An interesting feature that drops out from the nature of the search is that if you consider only moves within a certain radius, you run the risk of introducing an artificial zugzwang, whereby all the local moves are actually bad, and the correct move is a non-local one (this actually only happens in the ply of the opponent relative to the role you are looking for forced wins for). To account for that we have to introduce a virtual tenuki to the search. A direct side effect of this is that we always have a search result for the opponent playing tenuki in response to the first move we're being asked to search with respect to. This means that if the MCTS searcher asks the local search to examine a particular move, a possible result is that although the move is not a forced win (within the depth iterated to) it can identify that it IS a forced win if the opponent tenukis, and we can thus conclude that all opponent moves outside the search radius are local losses (and mark them accordingly in the MCTS tree).
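A sketch of how the virtual tenuki might enter the opponent's candidate list, and how its result can be fed back, is shown below. The `TENUKI` sentinel, the function names, and the use of integers with absolute difference as the move distance are all hypothetical choices made for illustration.

```python
TENUKI = "tenuki"  # sentinel standing for "any move outside the region"


def opponent_candidates(legal_moves, seed, distance, radius):
    """Local opponent replies plus one virtual tenuki for everything else."""
    local = [m for m in legal_moves if distance(m, seed) <= radius]
    return local + [TENUKI]


def moves_lost_by_tenuki(legal_moves, seed, distance, radius, win_after_tenuki):
    """Opponent moves that can be marked as local losses.

    If we have a forced win whenever the opponent plays elsewhere, every
    reply outside the search radius loses locally for them.
    """
    if not win_after_tenuki:
        return []
    return [m for m in legal_moves if distance(m, seed) > radius]


def gap(a, b):
    """Toy distance: moves are integers, distance is their difference."""
    return abs(a - b)


replies = opponent_candidates([1, 2, 8, 9], 2, gap, 1)
marked = moves_lost_by_tenuki([1, 2, 8, 9], 2, gap, 1, True)
```

The single tenuki entry keeps the branching factor small while still covering every non-local reply.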
Another interesting observation is that the bandwidth required to communicate between the MCTS search and the local searcher (and the sensitivity to latency) is low (new requests and responses typically are issued on a timescale of seconds and require perhaps O(1K) data). This makes parallelization (or even distribution across machines) of local search an attractive area for future consideration, allowing the main MCTS search to request more 'interesting' states to be examined.
Acknowledgements
Finally, a pointer to some related work and acknowledgements - the distance metric defined here is not the only one possible. Another useful distance that one could consider is the distance of a move from a terminal state (i.e. the smallest number of turns before a terminal state can result). Previous work has been done by Daniel Michulke and Stephan Schiffel (see Distance Features for General Game Playing) that utilizes goal-distance metrics to prune and direct search.
The use of sequence efficiency (as defined by a move-move distance and outlined above) and other distance-based pruning techniques (specifically goal distance, certainly) are mostly orthogonal, and it is clear that the two techniques should be highly synergistic.
Future directions
I'm not entirely sure when I'll come back to this work, but at some point I intend to, and there are many directions to take it in. Below is a list of the ones I currently think most interesting.
- Combine with goal-distance metrics (for example, in Breakthrough, no pawn move to a rank that is still > N ranks from the final rank can possibly be part of an N-efficient winning sequence, so this allows orthogonal pruning)
- Extend to games that exhibit goal coupling - consider the larger varieties of Connect4 as examples here - because each goal disjunct consists of 4 base props, which can become set by 4 different moves, move distances of drops in columns within 4 of one another are all 1, which leads to poor pruning due to insufficient locality. However, by imposing an artificial 'focus' on the sequences considered by local search (adjacent moves in a sequence impose a restricted distance constraint) we can effectively get local search to consider reduced vertical 'strips' of the board. Such focus constraints are empirically effective, but are inherently heuristic in nature, so work is required to determine when it is appropriate to use them and how to (automatically) tune them.
- Distribute local search across machines
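The goal-distance idea in the first item of the list above can be sketched for Breakthrough as a simple rank test. This is a hypothetical illustration, not something that exists in Sancho: moves are represented as (file, destination rank) pairs and `final_rank` is the rank the moving player needs to reach.

```python
def may_be_in_efficient_win(dest_rank, final_rank, n):
    """False if the destination is still more than n ranks from the final rank."""
    return abs(final_rank - dest_rank) <= n


def prune_by_goal_distance(moves, final_rank, n):
    """Drop moves that cannot be part of an n-efficient winning sequence."""
    return [m for m in moves
            if may_be_in_efficient_win(m[1], final_rank, n)]


# Searching for wins within 2 moves on a board whose final rank is 8.
moves = [(1, 3), (2, 6), (3, 7)]
kept = prune_by_goal_distance(moves, 8, 2)
```

This test is independent of the move-move distance filter, so the two could be applied one after the other.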