Sancho GGP Player: 2015

Sunday, July 12, 2015

Local Search in Breakthrough-like games

Finally I have time to write the long-promised follow-up to my last post, which is an example of using an approximate factorization technique to improve play.

This work was inspired by thinking about play in Breakthrough, and the current implementation applies fairly narrowly to games that exhibit certain constraints (see below) and can be described as 'breakthrough-like' in some sense. However, extension to a wider category of games is something that I will be coming back to after this year's championships.

Overview

Breakthrough exhibits locality of effect for moves in the following senses:

1) No goal disjunct is influenced by multiple moves (the goals in fact can be expresssed precisely as disjuncts of propositions that each individually arise from a single move). Hence moves in different parts of the board cannot directly influence the same goal (or terminality)

2) Each move impacts only a small number of base propositions, and thence of legals.

These constraints allow us to statically calculate a useful distance metric on the space of moves, wherein the distance between two moves is the minimum number of turns that must go by before both can impact the same base proposition (or goal/terminality disjunct, though that will be the same thing in the constrained set of games we currently seek to apply this to).

Given such a distance metric we can observe that for any sequence of N moves by one player (interspersed with arbitrary responses by the opponent) that results in a terminal state then either:

i) All N moves are within a region of diameter N in the above metric space; or
ii) There exists some N' < N for which such a sequence of N' moves is also terminal

We can say a path to terminality is efficient if it is of minimal length (for arbitrary opponent responses). Informally suppose you have a pawn 2 spaces from the end rank in a game of Breakthrough, and no enemy pawns positioned to stop it (and no quick wins of their own). Then the efficient sequences would be those in which this pawn were advanced on each move (so the efficient path length would be 2). Playing other moves along the way would only lead to longer paths, and so such moves are irrelevant.

We can then define a set of N-efficient paths, to be those paths of length N that are efficient, and this set will be a subset of the set of paths in which all moves are within a region of diameter N.

This allows us to define a search algorithm to search for forced wins within N moves, by restricting the move selection to the efficient set, which trims the search space from the set of all moves (in our example of a pawn 2 moves from the end rank, any search that begins by advancing it can ignore [on the second move selection] all moves that are further than 2 away in our distance metric from the first move, which essentially mean the search will typically ignore almost all but the actual winning move(s) in that case).

Furthermore, since we will only look for N-efficient wins (so binary result) we can prune aggressively in the opponent ply - as soon as any opponent move is identified that avoids a win for us, the parent node of that opponent choice can be trimmed.

By searching separately for forced wins for each player (independently) we can prune each individual search extensively, and apply the efficient-sequence trimming using iterated deepening to search for reasonably shallow forced wins. The details become quite complex, but the constraints implied by the notion of efficient sequences, can be tightened considerably to constrain the opponent replies considered at each stage, and to progressively shrink the region diameter as the search ply increases within each iteratively deepened search. In practice (using one thread for this activity, in a way discussed a bit more below) we generally see effective end-game search sequences of lengths around 12 for Breakthrough(small) and 9 or 10 for Breakthrough (15 second per move play time being what I typically use in testing). This is easily enough to fix the kinds of endgame blunders MCTS tends to make, and leads to a considerable strengthening in play quality.

Current use in Sancho

In Sancho we enable this system for games that have no goal-coupling of moves (that is to say that no one goal disjunct is dependent on more than one move) and for which the static distance analysis provides average move distances above a certain threshold (games in which the pieces are highly mobile produce much smaller distances, which in turn allows less pruning since far more sequences are efficient, so this threshold serves to filter those games out).

One thread serves requests for local search from a given start state with a given initial 'seed' move to determine the locality. The main MCTS search thread periodically issues a new local search request for 'interesting' (state,move) pairs, which are currently:

The current state and the last move played
The state resulting from the move we think most likely to be played next, and that move

This is re-evaluated on a timescale of a few seconds and the search request updated accordingly (so local search is asked to look for forced wins in the branches MCTS thinks are most probable). If results are found before the search target is changed, they are fed back to the MCTS searcher and applied as a local-search-status to the nodes concerned (local win, local loss, no known local result). MCTS selection heavily discounts local losses, and boosts local wins. It does NOT prune based on them, because they are actually only strong hints (a local win in N moves from a particular start point does not guarantee there is not a local loss at lower depth from a different non-local start point). Because it applies a fixed discount to local losses, if local search concludes that ALL moves are losses, the effect is to discount all moves equally, leaving us back with the original MCTS move weighting. This is important in practice because knowing you SHOULD lose does not mean you should not do your best to avoid it (encouraging the opponent to blunder), so reverting to MCTS evaluation in this case maximizes the chances of recovering via sub-optimal opponent play.

An interesting feature that drops out from the nature of the search, is that if you consider only moves within a certain radius, you run the risk of introducing an artificial zugzwang, whereby all the local moves are actually bad, and the correct move is a non-local one (this actually only happens in the ply of the opponent relative to the role you are looking for forced wins for). To account for that we have to introduce a virtual tenuki to the search. A direct side effect of this is that we always have a search result for the opponent tenuki-ing the first move we're being asked to search with respect to. This means that if the MCTS searcher asks the local search to examine a particular move, a possible result is that although the move is not a forced win (within the depth iterated to) it can identify that it IS a forced win if the opponent tenukis, and we can thus conclude that all opponent moves outside the search radius are local losses (and mark them accordingly in the MCTS tree).

Another interesting observation is that the bandwidth required to communicate between the MCTS search and the local searcher (and the sensitivity to latency) is low (new requests and responses typically are issued on a timescale of seconds and require perhaps O(1K) data). This makes parallelization (or even distribution across machines) of local search an attractive area for future consideration, allowing the main MCTS search to request more 'interesting' states to be examined.

Acknowledgements

Finally a pointer to some related work and acknowledgements - the distance metric defined here is not the only one possible. Another useful distance that one could consider is the distance of a move from a terminal state (i.e. - the shortest number of turns before a terminal state can result). Previous work has been done by Daniel Michulke and Stephan Schiffel (see Distance Features for General Game Playing) that utilizes goal-distance metrics to prune and direct search.

The use of sequence efficiency (as defined by a move-move distance and outlined above) and other distance-based pruning techniques (specifically goal distance, certainly) are mostly orthogonal, and it is clear that the two techniques should be highly synergistic.

Future directions

I'm not entirely sure when I'll come back to this work, but at some point I intend to, and there are many directions to take it in. Below is a list of the ones I currently think most interesting.

Combine with goal-distance metrics (for example, in Breakthrough, no pawn move to a rank that is still > N ranks from the final rank can possibly be part of an N-efficient winning sequence, so this allows orthogonal pruning)
Extend to games that exhibit goal coupling - consider the larger varieties of Connect4 as examples here - because each goal disjunct consists of 4 base props, which can become set by 4 different moves, move distances of drops in columns within 4 of one another are all 1, which leads to poor pruning due to insufficient locality. However, by imposing an artificial 'focus' on the sequences considered by local search (adjacent moves in a sequence impose a restricted distance constraint) we can effectively get local search to consider reduced vertical 'strips' of the board. Such focus constraints are empirically effective, but are inherently heuristic in nature, so work is required to determine when it is appropriate to use them and how to (automatically) tune them,
Distribute local search across machines

Sunday, April 5, 2015

Through a Murky Lense

I've been meaning to write this post for ages (it's largely based on enhancements made late last year), but never seemed to get around to it. Well, with the new run of the Coursera GGP course beginning last week, I really wanted to get back to making regular updates again, so now seems a good time to finally get to this one!

The general message of this post is that you don't necessarily need to play exactly the game you are given! Playing something that is an approximation of it will often suffice, provided the approximation is something you can play better than the exact game, and usually it gives the right results. I'll discuss some different types of approximation, some of the more adventurous of which I may well come back to in future posts, but the main thrust of the thesis is that a player that plays well in a game that usually has the same semantics as the actual game it is supposed to be playing, can often beat a completely 'correct' player that doesn't play as well within the domain of the approximation, provided that most of the time actual play stays in that domain.

There are probably many categories of approaches to approximation that can be taken, but I'll talk here about just two, one of which is implemented (for some cases) by the current version of Sancho, and the other of which I'm currently working on some restricted cases of.

State machine feature emulation

Often we are given game rules that manifest as a very expensive state machine, and/or a state machine that is hard to reason about in order to perform feature analysis. A typical example is something like Reversi, which has a horribly expensive goals network, which slows down tree expansion. Fairly straight-forward analysis can reveal logic (and base propositions) that have no impact on move legality, and basic logic structures which suggest possible forms of the goal calculation (e.g. - successor-like logic that suggests counting is going on), and these clues can suggest hypotheses for the goal forms. Based on these clues, one can come up with a set of hypothetical goal calculations, and then during meta-gaming measure their accuracy across random games. If you find a hypothetical goal calculation that appears to match what the game actually delivers via a full state-machine implementation, one can then decide to play with an emulation of the goals logic external to the animation of the state machine itself (and remove all goal logic from the underlying state-machine).

In Sancho we use this to detect goal structures of the forms:

Game is a straight-forward win/loss/draw based on who holds the majority of some set of base props
Goal value is cardinality of some set of base props whose value is true in the current state
Game is a straight-forward win/loss/draw based on the previous type but interpreting the values it produces as inputs to a who-has-more calculation (Reversi is of this kind - basically are there more white pieces than black pieces or visa versa)
As any of the above but where the count is represented via a set of successor counting props in the game state (Dots&Boxes looks like this as I recall)

If it can hypothesize a probable goal calculation, and match it to one of the above then Sancho will strip goal logic from the state-machine and run an emulator that directly evaluates the calculations. This has two benefits:

It typically results in significantly faster tree expansion, since next state and goal calculations involve less logic
Once we decide to believe in one of the above forms we can use it to draw further inferences about the game which can be useful. Examples are:
- If a we can determine that a count cannot go backwards (e.g. - in the successor logic there might be no path that decrements a count, or in a majority-of-propositions formulation where all of the propositions being counted are latches), and the game's final result is a win/loss based on a majority of a fixed size set, then the game's result is fully determined once one role has more than half. This allows us to add some emulation to the terminal logic also, and declare the game to be virtually terminal (for search purposes) in states where the outcome is fully determined. This allows Sancho to cease searching in Dots&Boxes when one player has 13 boxes for example.
- If a count is based on the cardinality of some set of base propositions, and some of those propositions are known to be latches, then we can infer a possible heuristic, that it is good to get those latched propositions set. In Sancho we then feed the resulting heuristic through the same statistical verification we use to determine generally which heuristics should be turned on (previously discussed when talking about Piece detection heuristics). This turns out to provide Sancho with a heuristic that corners are good to own in Reversi for example.

Approximate Factorization and Local Search

A second interesting area, is that of games that exhibit some sort of 'locality' of action. That is to say, games in which it is possible to define a measure of distance, such that base propositions can be mapped onto a metric space, where the distance measure reflects the number of steps before changes to one proposition can result in changes to another. This relates closely to factorization, in that in a factorizable game the distances will divide the space of base propositions (possibly after some processing to remove 'control logic' such as whose turn it is) into disconnected subsets.

The fully factorizable case I've discussed in a previous post, and leads to factorization of the game into sub-games, which are then independently searchable.

However, cases that are not fully factorizable might still be able to make use of the distance information. Two obvious uses suggest themselves:

Firstly, if a game exhibits localized neighbourhoods, that are strongly connected internally, but weakly connected externally, then we could consider making sub-games that consist only of those neighbourhoods (separately) and searching them as if they were true factors. As a thought-experiment consider a game like Risk. What armies are placed, and move, in one part of the world has little impact on distant parts of the world for many turns. Thus we could treat regions of the board separately (searching tactical possibilities in Asia without worrying about South America for instance). Provided we can weave the results of the sub-game searches back together in some useful way (typically goals might not exhibit the same locality that legality and base props do), it will be much cheaper to search the subgames than to globally search the entire gamespace. Note that subgames need not be disjoint, so we could consider overlapping regions and take the move from the one with the best local result for instance (discounted by the worst local penalty for not playing in the other subgames).

Secondly, we can search from a given state only within a fixed neighbourhood (say everything that could be be influenced within a certain number of steps). If we assume (and we can determine this analytically by looking at the terminal/goal conditions for the game) that two moves are only meaningful in the same move sequence if they can both influence something commonly within the restricted neighbourhood, then we can restrict the choice of moves we look at within the search. For example, imagine a Sheep&Wolf like game, but taking place in a maze, where one player with several pieces (like the sheep) is aiming to trap an opponent piece (like the wolf), and wins if they do. Search sequences (up to a given length) which feature moving sheep which are further away from one another than the neighbourhood diameter are pointless, as two such moves cannot both be relevant to a successful capture (within a number of moves equal to the neighbourhood diameter). Hence we can perform restricted search, such that choosing to move a certain sheep as the first move in the sequence constrains later choices, and greatly reduces the effective branching factor. The downside of this is that the search results are only valid up to a depth of the neighbourhood radius. We can use an iterative deepening approach to address this.

Currently Sancho does not implement approximate factorization, but an experimental version is under development that does implement local search. I will (hopefully) have a lot more to say about this in a future post!

Saturday, January 31, 2015

AAAI15

The 29th Conference on Artifical Intelligence (AAAI15) was held in Austin last week (see http://www.aaai.org/Conferences/AAAI/aaai15.php), and since that's where I live I was able to spend some time there meeting GGP researchers who were attending, and in particular the Stanford and New South Wales teams under Professors Genesereth and Thielsche (respectively), who are behind the Coursera GGP course, which initially introduced me to the field.

I had many interesting conversations, which have left me with boosted enthusiasm for getting back to making further progress on both Sancho, and on the exploration of some new directions (which I will probably follow independently of Sancho to begin with, though they may re-integrate further down the road).

During the conference a 'grudge match' (honestly - there's no grudge!) between Sancho and TurboTurtle was held (I don't have a link for any of the games, as we did it in a fairly ad-hoc fashion, playing games people present suggested), which I think Sancho won (though I don't recall any exact score - it was more a demo-match session). The following day, a human vs silicon competition (just a couple of games) was also held (Sancho vs attendee-victim), which Sancho won 2:0, though it should have lost the first game, as its opponent had a fairly short forced win at one point (in Breakthrough).

On the final day a brief award ceremony took place, where I received the GGP International Championship cup following last year's win (traditionally this takes place at the AAAI conferences), on behalf of Andrew and myself.

The cup in new hands:

Professor Genesereth (the handsome guy without much hair standing next to the other handsom guy without much hair!):