Sunday, July 12, 2015

Local Search in Breakthrough-like games

Finally I have time to write the long-promised follow-up to my last post: an example of using an approximate factorization technique to improve play.

This work was inspired by thinking about play in Breakthrough, and the current implementation applies fairly narrowly to games that exhibit certain constraints (see below) and can be described as 'breakthrough-like' in some sense.  However, extension to a wider category of games is something that I will be coming back to after this year's championships.

Overview

Breakthrough exhibits locality of effect for moves in the following senses:

1) No goal disjunct is influenced by multiple moves (in fact the goals can be expressed precisely as disjuncts of propositions that each individually arise from a single move).  Hence moves in different parts of the board cannot directly influence the same goal (or terminality).

2) Each move impacts only a small number of base propositions, and hence only a small number of legal moves.

These constraints allow us to statically calculate a useful distance metric on the space of moves, wherein the distance between two moves is the minimum number of turns that must elapse before both can impact the same base proposition (or goal/terminality disjunct, though those are the same thing in the constrained set of games we currently apply this to).
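To make this concrete, here is a minimal sketch of one way such a metric could be computed.  The structures involved - affected_props (the base propositions each move directly touches) and influences (which propositions a given proposition can affect one state transition later) - are hypothetical stand-ins for information derivable from the propnet, not Sancho's actual internals:

```python
from collections import deque

def bfs_levels(seed_props, influences):
    """Minimum number of turns before each base proposition can be
    affected, starting from the seed set (turn 0).  influences[p] gives
    the propositions that p's value can alter one transition later."""
    level = {p: 0 for p in seed_props}
    queue = deque(seed_props)
    while queue:
        p = queue.popleft()
        for q in influences.get(p, ()):
            if q not in level:
                level[q] = level[p] + 1
                queue.append(q)
    return level

def move_distance(move_a, move_b, affected_props, influences):
    """Minimum number of turns before both moves can impact the same
    base proposition (infinite if they never can)."""
    la = bfs_levels(affected_props[move_a], influences)
    lb = bfs_levels(affected_props[move_b], influences)
    common = set(la) & set(lb)
    return min((max(la[p], lb[p]) for p in common), default=float('inf'))
```

Since the metric depends only on the game's structure, the full move-pair distance table can be computed once during meta-gaming and looked up cheaply at search time.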

Given such a distance metric, we can observe that for any sequence of N moves by one player (interspersed with arbitrary responses by the opponent) that results in a terminal state, either:

i) All N moves are within a region of diameter N in the above metric space; or
ii) There exists some N' < N for which such a sequence of N' moves is also terminal

We can say a path to terminality is efficient if it is of minimal length (for arbitrary opponent responses).  Informally, suppose you have a pawn 2 spaces from the end rank in a game of Breakthrough, with no enemy pawns positioned to stop it (and no quick wins of their own).  Then the efficient sequences are those in which this pawn is advanced on each move (so the efficient path length is 2).  Playing other moves along the way would only lead to longer paths, so such moves are irrelevant.

We can then define a set of N-efficient paths, to be those paths of length N that are efficient, and this set will be a subset of the set of paths in which all moves are within a region of diameter N.

This allows us to define a search algorithm that looks for forced wins within N moves by restricting move selection to the efficient set, which trims the search space considerably.  In our example of a pawn 2 moves from the end rank, any search that begins by advancing it can ignore, on the second move selection, all moves that are further than 2 away from the first move in our distance metric, which means the search will typically ignore almost everything except the actual winning move(s).

Furthermore, since we only look for N-efficient wins (a binary result), we can prune aggressively in the opponent ply: as soon as any opponent move is identified that avoids a win for us, the parent node of that opponent choice can be trimmed.

By searching separately for forced wins for each player (independently) we can prune each individual search extensively, and apply the efficient-sequence trimming using iterated deepening to search for reasonably shallow forced wins.  The details become quite complex, but the constraints implied by the notion of efficient sequences can be tightened considerably, both to constrain the opponent replies considered at each stage and to progressively shrink the region diameter as the search ply increases within each iteratively deepened search.  In practice (using one thread for this activity, in a way discussed a bit more below) we generally see effective end-game search sequences of lengths around 12 for Breakthrough(small) and 9 or 10 for Breakthrough (15 seconds per move being what I typically use in testing).  This is easily enough to fix the kinds of endgame blunders MCTS tends to make, and leads to a considerable strengthening in play quality.
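To pull the above together, here is a rough sketch of the shape of the search (under two-player, breakthrough-like assumptions).  Our own ply is restricted to the efficient set, the opponent ply cuts off on the first escaping reply, and a virtual tenuki branch stands in for all non-local opponent replies (more on that below).  The forward-model interface sm - including a null_move helper that advances the turn without changing the board - is assumed for illustration, and the tightened constraints mentioned above are glossed over:

```python
TENUKI = "virtual-tenuki"  # stands in for all non-local opponent replies

def forced_win(state, role, depth, region, sm, dist):
    """Does `role` force a win within `depth` of its own moves, using
    only moves 'efficiently' close to the region searched so far?"""
    if sm.is_terminal(state):
        return sm.goal(state, role) == 100
    if depth == 0:
        return False
    mover = sm.to_move(state)
    if mover == role:
        for move in sm.legal_moves(state, mover):
            # Efficiency pruning: a move too distant from the region
            # cannot combine with earlier moves in the remaining depth.
            if any(dist(move, m) > depth for m in region):
                continue
            if forced_win(sm.next_state(state, move), role,
                          depth - 1, region | {move}, sm, dist):
                return True
        return False
    # Opponent ply: binary result, so the first escaping reply refutes
    # this node.  Local replies are tried individually; the single
    # virtual tenuki branch covers every non-local reply.
    local = [m for m in sm.legal_moves(state, mover)
             if any(dist(m, r) <= depth for r in region)]
    for move in local + [TENUKI]:
        child = (sm.null_move(state) if move is TENUKI
                 else sm.next_state(state, move))
        if not forced_win(child, role, depth, region, sm, dist):
            return False
    return True

def local_search(state, role, seed_move, sm, dist, max_depth=12):
    """Iterated deepening around the seed move."""
    for n in range(1, max_depth + 1):
        if forced_win(state, role, n, {seed_move}, sm, dist):
            return n  # forced win within n of our moves
    return None
```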

Current use in Sancho

In Sancho we enable this system for games that have no goal-coupling of moves (that is to say that no one goal disjunct is dependent on more than one move) and for which the static distance analysis provides average move distances above a certain threshold (games in which the pieces are highly mobile produce much smaller distances, which in turn allows less pruning since far more sequences are efficient, so this threshold serves to filter those games out).

One thread serves requests for local search from a given start state with a given initial 'seed' move to determine the locality.  The main MCTS search thread periodically issues a new local search request for 'interesting' (state,move) pairs, which are currently:

  • The current state and the last move played
  • The state resulting from the move we think most likely to be played next, and that move

This is re-evaluated on a timescale of a few seconds and the search request updated accordingly, so local search is asked to look for forced wins in the branches MCTS thinks are most probable.  If results are found before the search target is changed, they are fed back to the MCTS searcher and applied as a local-search status to the nodes concerned (local win, local loss, no known local result).  MCTS selection heavily discounts local losses, and boosts local wins.  It does NOT prune based on them, because they are actually only strong hints: a local win in N moves from a particular start point does not guarantee there is not a local loss at lower depth from a different non-local start point.  Because it applies a fixed discount to local losses, if local search concludes that ALL moves are losses, the effect is to discount all moves equally, leaving us back with the original MCTS move weighting.  This is important in practice because knowing you SHOULD lose does not mean you should not do your best to avoid it (encouraging the opponent to blunder), so reverting to MCTS evaluation in this case maximizes the chances of recovering via sub-optimal opponent play.
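A minimal sketch of how these status hints might feed into MCTS selection follows.  The node attributes and the boost/discount constants are illustrative assumptions, not Sancho's tuned values:

```python
LOCAL_WIN, LOCAL_LOSS, LOCAL_NONE = "win", "loss", "none"

WIN_BOOST = 0.3      # illustrative constants only
LOSS_DISCOUNT = 0.3

def selection_score(node):
    """UCT child score adjusted by local-search status.  The hints bias
    selection but never prune, since a local result is not a proof.
    Because the loss discount is a fixed offset, if every child is a
    local loss the relative ordering of children (and hence the chosen
    move) reverts to plain MCTS weighting."""
    score = node.mean_score + node.exploration_term
    if node.local_status == LOCAL_WIN:
        score += WIN_BOOST
    elif node.local_status == LOCAL_LOSS:
        score -= LOSS_DISCOUNT
    return score
```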

An interesting feature that drops out of the nature of the search is that if you consider only moves within a certain radius, you run the risk of introducing an artificial zugzwang, whereby all the local moves are actually bad and the correct move is a non-local one (this can only happen in the ply of the opponent relative to the role you are looking for forced wins for).  To account for that we have to introduce a virtual tenuki to the search.  A direct side effect is that we always have a search result for the opponent tenuki-ing the first move we're being asked to search with respect to.  This means that if the MCTS searcher asks the local search to examine a particular move, a possible result is that although the move is not a forced win (within the depth iterated to), it IS a forced win if the opponent tenukis, and we can thus conclude that all opponent moves outside the search radius are local losses (and mark them accordingly in the MCTS tree).

Another interesting observation is that the bandwidth required to communicate between the MCTS search and the local searcher (and the sensitivity to latency) is low (new requests and responses typically are issued on a timescale of seconds and require perhaps O(1K) data).  This makes parallelization (or even distribution across machines) of local search an attractive area for future consideration, allowing the main MCTS search to request more 'interesting' states to be examined.

Acknowledgements

Finally, a pointer to some related work and acknowledgements - the distance metric defined here is not the only one possible.  Another useful distance one could consider is the distance of a move from a terminal state (i.e. - the minimum number of turns before a terminal state can result).  Previous work has been done by Daniel Michulke and Stephan Schiffel (see Distance Features for General Game Playing) that utilizes goal-distance metrics to prune and direct search.

The use of sequence efficiency (as defined by a move-move distance and outlined above) and other distance-based pruning techniques (goal distance in particular) are largely orthogonal, so the two techniques should be highly synergistic.

Future directions

I'm not entirely sure when I'll come back to this work, but at some point I intend to, and there are many directions to take it in.  Below is a list of the ones I currently think most interesting.

  • Combine with goal-distance metrics (for example, in Breakthrough, no pawn move to a rank that is still > N ranks from the final rank can possibly be part of an N-efficient winning sequence, so this allows orthogonal pruning)
  • Extend to games that exhibit goal coupling - consider the larger varieties of Connect4 as examples here.  Because each goal disjunct consists of 4 base props, which can become set by 4 different moves, the move distances of drops in columns within 4 of one another are all 1, which leads to poor pruning due to insufficient locality.  However, by imposing an artificial 'focus' on the sequences considered by local search (adjacent moves in a sequence impose a restricted distance constraint) we can effectively get local search to consider reduced vertical 'strips' of the board.  Such focus constraints are empirically effective, but are inherently heuristic in nature, so work is required to determine when it is appropriate to use them and how to (automatically) tune them.
  • Distribute local search across machines

Sunday, April 5, 2015

Through a Murky Lens

I've been meaning to write this post for ages (it's largely based on enhancements made late last year), but never seemed to get around to it. Well, with the new run of the Coursera GGP course beginning last week, I really wanted to get back to making regular updates again, so now seems a good time to finally get to this one!

The general message of this post is that you don't necessarily need to play exactly the game you are given!  Playing an approximation of it will often suffice, provided the approximation is something you can play better than the exact game and it usually gives the right results.  I'll discuss some different types of approximation, some of the more adventurous of which I may well come back to in future posts.  The main thrust of the thesis is that a player that plays well in a game that usually has the same semantics as the actual game it is supposed to be playing can often beat a completely 'correct' player that doesn't play as well within the domain of the approximation, provided that most of the time actual play stays in that domain.

There are probably many categories of approaches to approximation that can be taken, but I'll talk here about just two, one of which is implemented (for some cases) by the current version of Sancho, and the other of which I'm currently working on some restricted cases of.

State machine feature emulation

Often we are given game rules that manifest as a very expensive state machine, and/or a state machine that is hard to reason about in order to perform feature analysis.  A typical example is something like Reversi, which has a horribly expensive goals network that slows down tree expansion.  Fairly straightforward analysis can reveal logic (and base propositions) that have no impact on move legality, and basic logic structures that suggest possible forms of the goal calculation (e.g. - successor-like logic that suggests counting is going on).  Based on these clues, one can come up with a set of hypothetical goal calculations, and then during meta-gaming measure their accuracy across random games.  If you find a hypothetical goal calculation that appears to match what the game actually delivers via the full state-machine implementation, you can then decide to play with an emulation of the goals logic external to the animation of the state machine itself (and remove all goal logic from the underlying state machine).
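Here is a minimal sketch of that meta-gaming verification step, assuming a hypothetical forward-model interface sm; the sample count and method names are illustrative:

```python
import random

def hypothesis_consistent(sm, hypothesis, samples=200):
    """Play random games with the full state machine and compare the
    cheap hypothesised goal calculation against the real goals in every
    terminal state reached.  hypothesis(state, role) is a candidate
    emulator (e.g. a majority count over some proposition set)."""
    for _ in range(samples):
        state = sm.initial_state()
        while not sm.is_terminal(state):
            state = sm.next_state(state,
                                  random.choice(sm.legal_joint_moves(state)))
        if any(hypothesis(state, role) != sm.goal(state, role)
               for role in sm.roles()):
            return False  # mismatch: discard this hypothesis
    return True  # consistent on every sample: adopt the emulated goals
```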

In Sancho we use this to detect goal structures of the forms:

  • Game is a straightforward win/loss/draw based on who holds the majority of some set of base props
  • Goal value is cardinality of some set of base props whose value is true in the current state
  • Game is a straightforward win/loss/draw based on the previous type but interpreting the values it produces as inputs to a who-has-more calculation (Reversi is of this kind - basically, are there more white pieces than black pieces or vice versa)
  • As any of the above but where the count is represented via a set of successor counting props in the game state (Dots&Boxes looks like this as I recall)

If it can hypothesize a probable goal calculation and match it to one of the above, then Sancho will strip the goal logic from the state machine and run an emulator that directly evaluates the calculation.  This has two benefits:
  1. It typically results in significantly faster tree expansion, since next state and goal calculations involve less logic
  2. Once we decide to believe in one of the above forms we can use it to draw further inferences about the game which can be useful.  Examples are:
    • If we can determine that a count cannot go backwards (e.g. - in the successor logic there might be no path that decrements a count, or in a majority-of-propositions formulation all of the propositions being counted are latches), and the game's final result is a win/loss based on a majority of a fixed-size set, then the game's result is fully determined once one role has more than half.  This allows us to add some emulation to the terminal logic also, and declare the game to be virtually terminal (for search purposes) in states where the outcome is fully determined.  This allows Sancho to cease searching in Dots&Boxes when one player has 13 boxes, for example (see the sketch after this list).
    • If a count is based on the cardinality of some set of base propositions, and some of those propositions are known to be latches, then we can infer a possible heuristic, that it is good to get those latched propositions set.  In Sancho we then feed the resulting heuristic through the same statistical verification we use to determine generally which heuristics should be turned on (previously discussed when talking about Piece detection heuristics).  This turns out to provide Sancho with a heuristic that corners are good to own in Reversi for example.
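Here is a sketch of that virtual-terminality check, assuming a hypothetical emulated counter count_for over the counted propositions (the real prerequisite being that the count has been proved monotone):

```python
def virtually_terminal(state, count_for, roles, set_size):
    """Treat a state as terminal for search purposes once one role's
    monotone count exceeds half of a fixed-size set - e.g. 13 of the
    25 boxes in Dots&Boxes.  count_for(state, role) is an assumed
    emulated counter over the counted base propositions."""
    return any(2 * count_for(state, role) > set_size for role in roles)
```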
Approximate Factorization and Local Search

A second interesting area is that of games that exhibit some sort of 'locality' of action.  That is to say, games in which it is possible to define a measure of distance such that base propositions can be mapped onto a metric space, where the distance measure reflects the number of steps before changes to one proposition can result in changes to another.  This relates closely to factorization, in that in a factorizable game the distances will divide the space of base propositions (possibly after some processing to remove 'control logic' such as whose turn it is) into disconnected subsets.

I've discussed the fully factorizable case in a previous post; it leads to factorization of the game into sub-games, which are then independently searchable.

However, cases that are not fully factorizable might still be able to make use of the distance information.  Two obvious uses suggest themselves:

Firstly, if a game exhibits localized neighbourhoods that are strongly connected internally but weakly connected externally, then we could consider making sub-games that consist only of those neighbourhoods (separately) and searching them as if they were true factors.  As a thought-experiment, consider a game like Risk.  Where armies are placed, and how they move, in one part of the world has little impact on distant parts of the world for many turns.  Thus we could treat regions of the board separately (searching tactical possibilities in Asia without worrying about South America, for instance).  Provided we can weave the results of the sub-game searches back together in some useful way (typically goals might not exhibit the same locality that legality and base props do), it will be much cheaper to search the sub-games than to globally search the entire game space.  Note that sub-games need not be disjoint, so we could consider overlapping regions and take the move from the one with the best local result, for instance (discounted by the worst local penalty for not playing in the other sub-games).

Secondly, we can search from a given state only within a fixed neighbourhood (say everything that could be influenced within a certain number of steps).  If we assume (and we can determine this analytically by looking at the terminal/goal conditions for the game) that two moves are only meaningful in the same move sequence if they can both influence something within the restricted neighbourhood, then we can restrict the choice of moves we look at within the search.  For example, imagine a Sheep&Wolf-like game, but taking place in a maze, where one player with several pieces (like the sheep) is aiming to trap an opponent piece (like the wolf), and wins if they do.  Search sequences (up to a given length) which feature moving sheep that are further away from one another than the neighbourhood diameter are pointless, as two such moves cannot both be relevant to a successful capture (within a number of moves equal to the neighbourhood diameter).  Hence we can perform restricted search, such that choosing to move a certain sheep as the first move in the sequence constrains later choices, greatly reducing the effective branching factor.  The downside is that the search results are only valid up to a depth of the neighbourhood radius; we can use an iterative deepening approach to address this.

Currently Sancho does not implement approximate factorization, but an experimental version is under development that does implement local search.  I will (hopefully) have a lot more to say about this in a future post!

Saturday, January 31, 2015

AAAI15

The 29th Conference on Artificial Intelligence (AAAI15) was held in Austin last week (see http://www.aaai.org/Conferences/AAAI/aaai15.php), and since that's where I live I was able to spend some time there meeting GGP researchers who were attending - in particular the Stanford and New South Wales teams under Professors Genesereth and Thielscher (respectively), who are behind the Coursera GGP course, which initially introduced me to the field.

I had many interesting conversations, which have left me with boosted enthusiasm for getting back to making further progress on both Sancho, and on the exploration of some new directions (which I will probably follow independently of Sancho to begin with, though they may re-integrate further down the road).

During the conference a 'grudge match' (honestly - there's no grudge!) between Sancho and TurboTurtle was held (I don't have a link for any of the games, as we did it in a fairly ad-hoc fashion, playing games people present suggested), which I think Sancho won (though I don't recall any exact score - it was more a demo-match session).  The following day, a human vs silicon competition (just a couple of games) was also held (Sancho vs attendee-victim), which Sancho won 2:0, though it should have lost the first game, as its opponent had a fairly short forced win at one point (in Breakthrough).

On the final day a brief award ceremony took place, where I received the GGP International Championship cup following last year's win (traditionally this takes place at the AAAI conferences), on behalf of Andrew and myself.

The cup in new hands:

Professor Genesereth (the handsome guy without much hair standing next to the other handsome guy without much hair!):

Tuesday, December 9, 2014

Kalaha

Last Christmas, my father-in-law gave me a copy of Mancala (also known as Kalah or Kalaha).  I've played it a few dozen times, but my 7-year-old daughter still beats me regularly!

So I wondered about coding up the rules in GDL to see how Sancho plays it.  As it transpires, a quick look in the base repository revealed that somebody had already done this back in 2009.  (This game isn't on Tiltyard rotation because it doesn't contain explicit definitions of the base and input propositions.)

Enthusiastically, I fired up Sancho and got it playing.  Sadly, it only managed ~100 iterations/turn.  (We've made some significant general improvements since then, but Kalaha is still an order of magnitude slower even than a complicated game like Chess.)  At the time, I made some tweaks to the GDL which improved the speed approximately three-fold, but that still wasn't enough for a good game.

So, it has sat on the back burner for nearly a year.  Over the last couple of days, I've been digging into it in a bit more detail because I have some moderately revolutionary ideas for dramatic improvements.  (Hopefully more in a further blog post.)

For now, I've produced an annotated version of the GDL which you may like to peruse.  (You'll definitely want to understand the basic rules of the game first though - see the link above.)

Sunday, September 28, 2014

Caught in the tar pit

I have been a bit remiss in making regular posts recently, and as some may have noticed, Sancho has also been absent from Tiltyard for nearly a month.

The reason for this is that about 4 weeks ago I undertook a reworking of the MCTS tree's representation internally to Sancho, to reduce memory usage, and remove some role-asymmetry that didn't seem right.  The catalyst for doing this was to set the groundwork for an enhanced move-sequence search which will be the subject of a future post (at least if it works!).  I expected it to take 3 or 4 days, but it soon turned out to be a sticky morass, from which escape was more difficult than envisaged!

The core of the change was to eliminate non-decision nodes in the tree (forced moves, forced noops, etc.), leaving only the nodes at which actual decisions have to be made.  Since Sancho's tree considers choices for a single role at any one node (not joint moves), this meant that in most games (non-simultaneous-move games) decision layers were interleaved with non-decision (forced noop) layers (at most one in N layers would have a decision, where there are N roles in the game).  This was both wasteful of space (tree node allocation), and introduced role asymmetry because the state only changes every Nth layer of the tree (where a full joint move is implied by the choices in the preceding N layers).

In the new structure all non-decision nodes are eliminated, and (as a side-effect) an edge always leads between nodes of different game states.  The semantics of the representation, however, should be equivalent.
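As a rough illustration of the idea (not Sancho's actual code), expansion can simply fast-forward through forced plies, assuming a hypothetical forward-model interface:

```python
def next_decision_state(state, sm):
    """Skip forward through forced plies (exactly one legal move, e.g.
    a forced noop) so that tree edges always connect genuine decision
    points in distinct game states."""
    while not sm.is_terminal(state):
        moves = sm.legal_moves(state, sm.to_move(state))
        if len(moves) > 1:
            break  # a real decision: this is where a tree node lives
        state = sm.next_state(state, moves[0])
    return state
```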

Stabilizing this change took roughly the expected time, but unexpectedly it turned out to have a distinctly damaging impact on the quality of play in games that employed heuristics (primarily games with a piece heuristic active, such as breakthrough, or chess).  The reason for this is not totally clear, but almost certainly has to do with the role asymmetry, which the original heuristic implementation was 'comfortable' with.

Since then I have been fiddling with the mechanisms by which heuristic values are blended into the MCTS tree, to find a modification that works well in the new structure.  Over the past 3 weeks or so I have found various approaches that initially worked well in my tests with one game, only to find that they performed badly in another.  The three games I have mostly been using to test (as they display somewhat different characteristics) were:

  • Breakthrough (simple equal-value piece heuristic, fixed sum)
  • Speed chess (variable value piece heuristic, fixed sum)
  • Skirmish (variable value piece heuristic, non-fixed sum)

In particular I found that what worked well in Breakthrough (where material exchanges are typically somewhat binary in nature) didn't work well in the other two (and vice versa).

However, as of yesterday I am now getting positive test results (but much more testing remains to be done) in all of the above games, so hopefully this time the light at the end of the tunnel is not another oncoming train!

A ton of regression testing now needs to be done, for which I plan to use the following games:
  • Reversi (uses a heuristic other than piece-based)
  • Connect4 (go-to non-heuristic fixed-sum 2-player game)
  • Max knights (puzzle)
  • Sudoku (very different puzzle)
  • Three player free for all (more than 2 players)
  • Blocker (simultaneous turn)
  • Pentago (non-alternating role choice sequence)
  • Breakthrough with walls (heuristic and factored)

This will take a few more days, even if no problems are found.  After that (I hope), normal service will be resumed...

Wednesday, August 20, 2014

Balanced diet

(Who ate all the pies?)

In the International General Game Playing Championship 2014, it struck me that many of the games played had a significant bias for one player (usually, but not always, the first).  In an attempt to compensate, many games were played both ways round, often with predictable outcomes.  This was unfortunate because it increased the amount of time required to play games, reduced the excitement (by having predictable outcomes) and effectively reduced the best-of-3 rounds to a best-of-1: two matches of a game with a predictable outcome and then the decider - usually a game where the bias is unknown because it isn't played on Tiltyard and therefore statistics aren't available.

Whilst reading up on various abstract strategy games, I was introduced to the concept of the Pie rule.  Based on the traditional method for children to divide a cake ("you cut, I'll choose"), it aims to balance two-player games by giving the second player the opportunity to swap places with the first player after the first move.  This way, the first player has an incentive to make the most balanced initial play that he possibly can.  (If he makes an opening move that is too strong, the second player will swap with him and get the strong position.  If he plays an opening move that is too weak, the second player will leave him with it.)
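The incentive structure is simple enough to sketch in a few lines.  This toy example assumes an evaluation value(m) of each opening from the first player's perspective on a [0,1] scale, purely to illustrate the mechanism:

```python
def second_player_swaps(opening_value):
    """'You cut, I'll choose': the second player takes over the first
    player's position exactly when the opening leaves it better than
    balanced (0.5)."""
    return opening_value > 0.5

def best_opening(openings, value):
    """The first player's incentive under the pie rule: play the
    opening closest to balanced, since anything stronger is taken away
    and anything weaker is kept."""
    return min(openings, key=lambda m: abs(value(m) - 0.5))
```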

In the last few days, I have created Pie-rule variants of 9-board Tic-Tac-Toe and Hex.  (I've also created a visualization for regular Hex so that it can be played on Tiltyard.)  It's early days yet, but I notice that the pie rule seems to be doing the trick for 9BTTT.  In the first 16 matches played on the Tiltyard, 9 went with the first player and 7 with the second.  Whilst it's still a small sample size, that's looking substantially more balanced than the regular version.

So, for the GDL authors out there (by which I suppose I mean Alex!), consider the Pie rule for creating balance in your game.

For everybody else, what games would you like to see a Pie rule variant of?  Whilst I'm certainly not making any promises, if you post in the comments, I'll consider doing them.

Sunday, August 17, 2014

Goal Latches

Latches are properties of a game's state which, once they become true, are guaranteed to remain true throughout the remainder of the game (positively latched); or which, once they become false, are guaranteed to remain false (negatively latched).  A 'property' in this sense can be any logical function of the game's base state - most typically we talk about the simplest case of this, which is the state of an individual base proposition.

Identifying latches can be useful in several ways:

  • It may allow reduced computational effort in calculating state transitions (e.g. - if a part of the propnet is effectively 'dark' due to a latch condition, then once that latch is detected that part of the propnet can be left uncalculated in future state transitions).
  • Similarly it may allow reduced computational effort in heuristic calculation
  • If the latch constrains the achievable goals then it can be used to optimize search

This post is about the use Sancho makes of the last case - i.e. - situations in which it detects that the achievable score range has been reduced by latching of one or more of the goal propositions (at most one for each role can become positively latched, in which case the score for that role is then fully determined in all possible paths, but several may become independently negatively latched, which serves to reduce the size of the set of achievable scores).

Latch detection might be analytic or heuristic in nature.  I'll come back to heuristic latch detection in a future post - for this one I'm only considering analytic detection, though the use to which the result is put can be the same in either case.

Analytic latch detection is based on logical reasoning about the structure of the game.  In Sancho's case this is done by analysis of the propnet, and currently we only identify fairly simple cases in which a base proposition feeds back to itself via logic of one of the following forms (where X is the base prop, and X' is its next value on state transition):

X' = X OR <other condition>  (X is positively latched)

or

X' = X AND <other condition> (X is negatively latched, since once false it can never become true again)

Furthermore, if a goal proposition is of the form

G = X AND <other condition>, and X is negatively latched, then G is said to be negatively latched by X.

Similarly for the positive latch case:

G = X OR <other condition>

If we can identify such goal latches, we know that the set of achievable scores in any state of the game reachable from a state with latched goal propositions is either fully determined (positively latched goal), or is strictly reduced from the set of all possible goal values to some subset.
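Here is a sketch of that pattern matching over a simplified propnet representation.  The component API used (base_props, transition_input, goal inputs, .op, .inputs) is hypothetical, standing in for whatever component graph the player actually builds:

```python
def find_simple_latches(propnet):
    """Detect base props X whose transition logic matches
    X' = X OR <cond>   (positive latch), or
    X' = X AND <cond>  (negative latch)."""
    positive, negative = set(), set()
    for x in propnet.base_props():
        c = propnet.transition_input(x)   # component computing X'
        if c.op == 'or' and x in c.inputs:
            positive.add(x)
        elif c.op == 'and' and x in c.inputs:
            negative.add(x)
    return positive, negative

def goal_latch(goal_input, positive, negative):
    """G = X OR <cond> with X positively latched => G positively
    latched; G = X AND <cond> with X negatively latched => G
    negatively latched."""
    if goal_input.op == 'or' and positive & set(goal_input.inputs):
        return 'positive'
    if goal_input.op == 'and' and negative & set(goal_input.inputs):
        return 'negative'
    return None
```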

In Sancho this surfaces as a method on the state machine which returns the achievable goal range for any given state, as [min,max].  This range can be used in the MCTS search tree processing in several ways:
  • If min == max, then all possible paths result in the same score.  If this occurs for our role then whatever we play from that point on can have no impact on the eventual outcome.  Consequently we can treat any tree node in which this occurs as if it were terminal, and not search below it (the one exception being that if this occurs on the root node we expand one level just so that we have legal moves to choose from when asked to return a move)
  • If a terminal state is encountered with a score of the max achievable, we can invoke tree trimming by propagating upwards when the role deciding on the move at this branch is the one with the max-achievable score (i.e. - this amounts to a 'win' relative to the parent state and no further searching is needed since it cannot be improved upon)
  • If a move reduces the achievable range (either at the top or at the bottom) then we can trivially define a heuristic to favor search on paths that increase the min achievable score, and dis-favor those that decrease the max achievable.

Because latch detection is typically very cheap at search time (once the analysis has been done during meta-gaming), we can also use the change in achievable score as a heuristic during playouts, to guide otherwise random playouts along paths that tend to maximize the expected goal value, on the assumption (within the playout) that all achievable scores are equally likely.  This can be done most simply by down-weighting moves that decrease the choosing role's max score, and up-weighting those that increase its min.  In zero-sum games, up-weighting moves that decrease the opponent roles' max scores, and down-weighting those that increase their min scores, can also be applied.  Currently we do this VERY crudely and handle the 'decrease weight' cases by preventing those moves being selected at all in the presence of any other choice!
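A minimal sketch of that playout weighting, using the achievable-range method on the state machine described above (the method name and the weights themselves are illustrative):

```python
def playout_weights(state, joint_moves, role, sm):
    """Bias an otherwise-random playout using the achievable score
    range [min, max].  Moves that lower our max are (crudely) excluded
    whenever any alternative exists; moves that raise our min are
    up-weighted."""
    lo0, hi0 = sm.achievable_range(state, role)
    weights = {}
    for jm in joint_moves:
        lo, hi = sm.achievable_range(sm.next_state(state, jm), role)
        if hi < hi0:
            weights[jm] = 0.0   # never chosen while any other move exists
        elif lo > lo0:
            weights[jm] = 2.0
        else:
            weights[jm] = 1.0
    if not any(weights.values()):
        return {jm: 1.0 for jm in joint_moves}  # forced: back to uniform
    return weights
```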

Some examples help illustrate where these techniques give benefit, so I'll discuss a few in the following paragraphs.

The Untwisty Corridor Family

This family of games is basically just a set of test cases for latch detection.  They are puzzles involving traversing a maze (logically speaking anyway - none of them actually has a visualization currently, so far as I am aware).  Any step onto the wrong path sets a 'failure' proposition in the game's state, and the game terminates after a fixed number of moves.  Consequently you must make the 'correct' choice at each of a fixed number of steps (in 'untwisty corridor' you need to go straight every move, hence the name).  Because a wrong choice sets a latched proposition, which ultimately determines the goal values, this is a simple latched-goal case, where any wrong move immediately fully determines the final score (as 0).  Because all such states are immediately treated as if they were terminal, and not searched, the MCTS search only ever visits nodes one step off the correct branch, which reduces the size of the search space from exponential in the number of steps to linear.

In fact, the basic untwisty corridor is somewhat flawed, because all false choices lead to the SAME state (apart from the step counter), so it is trivially solved by transposition, provided the player maintains a transposition table of some sort.  Furthermore, at only 6 steps, it is small enough to solve by brute force anyway!  The game 'untwistycomplex2' is an attempt to address these issues, and a better test.

Escort Latch Breakthrough

The game Escort Latch Breakthrough is a variant of Breakthrough where each role has a king and 8 pawns.  The goal is to get the king to the far end of the board.  If both kings are captured, the game is a draw.  A typical MCTS player, without latch analysis, will usually move a pawn from in front of its king and then just advance the king right up to one rank before it can be captured by the opponent's pawns.  At that point it's usually fairly trivial for the opponent to force its capture (kings cannot move backwards), and games between such players tend to end up as mutual king exchanges and therefore draws (for example, this game).  The reason this happens is that the vulnerability of the king has very little impact on the ensemble of playout scores, because almost all playouts fail to bother capturing the kings, and MCTS convergence is slowed by the need to achieve convergence in subtrees following king capture.

With latch detection, a king capture is seen as an achievable score range change (from [0,100] to [50,100] or [0,50] depending on which king is captured).  This impacts the search in several ways:
  • In states where one king has been captured, capturing the other is terminal, and can be propagated as if it were a win (even though it scores only 50), because it is the best achievable result
  • Moves that capture/avoid capture of a king are heuristically selected more frequently, which speeds convergence
  • Playouts are much more reasonable, since they do not include games in which a king is NOT captured when it could be (i.e. - where a king just marches through a defending line of pawns without being molested).  This dramatically improves the quality of the Monte Carlo samples generated by the playouts.  This gain far outweighs the cost in reduced playout count that the evaluation requires (which is actually quite substantial, since all moves must be examined at each stage)
The result is that rather than optimistically advancing its king into danger, the player tends to hold it back, and work to capture its opponent's king first (see this game for example).