Sancho GGP Player: Goal Latches

Latches are properties of a game's state which, once they become true, are guaranteed to remain true throughout the remainder of the game (positively latched); or which, once they become false, are guaranteed to remain false (negatively latched). A 'property' in this sense can be any logical function of the game's base state - most typically we talk about the simplest case of this, which is the state of an individual base proposition.

Identifying latches can be useful in several ways:

It may allow reduced computational effort in calculating state transitions (e.g. - if a part of the propnet is effectively 'dark' due to a latch condition, then once that latch is detected that part of the propnet can be left uncalculated in future state transitions).
Similarly it may allow reduced computational effort in heuristic calculation
If the latch constrains the achievable goals then it can be used to optimize search

This post is about the use Sancho makes of the last case - i.e. - situations in which it detects that the achievable score range has been reduced by latching of one or more of the goal propositions (at most one for each role can become positively latched, in which case the score for that role is then fully determined in all possible paths, but several may become independently negatively latched, which serves to reduce the size of the set of achievable scores).

Latch detection might be analytic or heuristic in nature. I'll come back to heuristic latch detection in a future post - for this one I'm only considering analytic detection, though the use to which the result is put can be the same in either case.

Analytic latch detection is based on logical reasoning about the structure of the game. In Sancho's case this is done by analysis of the propnet, and currently we only identify fairly simple cases in which a base proposition feeds back to itself via logic of one of the following forms (where X is the base prop, and X' is its next value on state transition):

X' = X OR <other condition> (X is positively latched)

X' = ~X AND <other condition> (X is negatively latched)

Furthermore, if a goal proposition is of the form

G = X AND <other condition>, and X is negatively latched, then G is said to be negatively latched by X.

Similarly for the positive latch case:

G = X OR <other condition>

If we can identify such goal latches, we know that the set of achievable scores in any state of the game reachable from a state with latched goal propositions is either fully determined (positively latched goal), or is strictly reduced from the set of all possible goal values to some subset.

In Sancho this surfaces as a method on the state machine which returns the achievable goal range for any given state, as [min,max]. This range can be used in the MCTS search tree processing in several ways:

If min == max, then all possible paths result in the same score. If this occurs for our role then whatever we play from that point on can have no impact on the eventual outcome. Consequently we can treat any tree node in which this occurs as if it were terminal, and not search below it (the one exception being that if this occurs on the root node we expand one level just so that we have legal moves to choose from when asked to return a move)
If a terminal state is encountered with a score of the max achievable, we can invoke tree trimming by propagating upwards when the role deciding on the move at this branch is the one with the max-achievable score (i.e. - this amounts to a 'win' relative to the parent state and no further searching is needed since it cannot be improved upon)
If a move reduces the achievable range (either at the top or at the bottom) then we can trivially define a heuristic to favor search on paths that increase the min achievable score, and dis-favor those that decrease the max achievable.

Because latch detection is typically very cheap (at search time, once the analysis has been done during meta-gaming) we can also use the change in achievable score as a heuristic during playouts to guide otherwise random playouts along paths that tend to maximize the expected goal value, on the assumption (within the playout) that all achievable scores are equally likely. This can most simply be done by simply down-weighting moves that decrease the choosing role's max score, and up-weighting those that increase the min. In zero-sum games, up-weighting those that decrease the max opponent role scores, and down-weighting those that increase their min scores can also be applied. Currently we do this VERY crudely and handle the 'decrease weight' cases by preventing those moves being selected at all in the presence of any other choice!

Some examples help illustrate where these techniques give benefit, so I'll discuss a few in the following paragraphs.

The Untwisty Corridor Family

This family of games are basically just test cases for latch detection. They are puzzles involving traversing a maze (logically speaking anyway - none of them actually have visualization currently so far as I am aware). Any step onto the wrong path sets a 'failure' proposition in the game's state, and the game terminates after a fixed number of moves. Consequently you must make the 'correct' choice for each of a fixed number of steps (in 'untwisty corridor' you need to go straight every move, hence the name). Because a wrong choice sets a latched proposition, which ultimately determines the goal values, this is a simple latched-goal case, where any wrong move immediately fully determines the final score (as 0). Because all such states are immediately treated as if they were terminal, and not searched, the effect is that the MCTS search only ever visits nodes one step off the correct branch, which reduced the size of the search space from being exponential in number of steps to being linear.

In fact, the basic untwisty corridor is somewhat flawed, because all false choices lead to the SAME state (apart from the step counter), so it is trivially solved by transposition, provided the player maintains a transposition table of some sort. Furthermore, at only 6 steps, it is small enough to solve by brute force anyway! The game 'untwistycomplex2' is an attempt to address these issues, and a better test.

Escort Latch Breakthrough

The game Escort Latch Breakthrough is a variant of Breakthrough where each role has a king and 8 pawns. The goal is to get the King to the far end of the board. If both kings are captured the game is a draw. A typical MCTS player, without latch analysis, will usually move a pawn from in front of their king and then just advance the king right up to one rank before it can be captured by the opponent's pawns. At that point it's usually fairly trivial for the opponent to force capture it (kings cannot move backwards) and games between such players tend to end up as mutual king exchanges and therefore draws (for example, this game). The reason this happens is that the vulnerability of the king has very little impact on the ensemble of playout scores, because almost all playouts fail to bother capturing the kings, and MCTS convergence is slowed by the need to achieve convergence in subtrees following king capture.

With latch detection a king capture is seen as an achievable score range change (from [0,100] to [50,100] or [0,50] depending on which king is captured). This impacts the search in several ways:

In states where one king has been captured, capturing the other is terminal, and can be propagated as if it were a win (even though it scores only 50), because it is the best achievable result
Moves that capture/avoid capture of a king are heuristically selected more frequently, which speeds convergence
Playouts are much more reasonable, since they do not include games in which a king is NOT captured if it could be (i.e. - where a king just marches through a defending line of pawns without being molested). This dramatically improves the quality of Monte Carol samples generated by the playouts. This gain far outweighs the cost in reduced playout count that the evaluation requires (which is actually quite substantial since all moves must be examined at each stage)

The result is that rather than optimistically advancing their king into danger, the player tends to hold it back, and work to capture its opponent's king first (see this game for example)

Sancho GGP Player

Sunday, August 17, 2014

Goal Latches

No comments:

Post a Comment