Sancho GGP Player: Puzzling it out

During the recent competition I promised to spill the beans on our Sudoku-enabling analysis, and other (mostly) puzzle-specific enhancements. Accordingly, this post will go through the more interesting techniques Sancho uses in puzzle solving (i.e. - 1-player games, or pseudo-1-player games, which I will define later).

Plans

The first observation that can be made universally about 1-player (deterministic, complete information) games, is that the player is entirely in charge of its own fate, and doesn't have to worry about what some other entity might do. This means that any solution path seen will always remain valid, if followed from that point onward. We therefore trap any sight of a solution state and commit the path we saw to it as a 'plan', after which, when asked for a move, we simply read the next move off the plan (hence the very fast response that Sancho exhibits in puzzles once it solves it, since no further searching is required).
Opportunities for caching plans arise both during tree node expansion (when a child turns out to be terminal and a solution), and during MonteCarlo playouts (from which we return the full path played in 1-player games).
For many of the typical puzzles encountered, a solution will be found during meta-gaming time, so as soon as the game manager starts asking for moves, we will be able to feed them back as fast as it can consume them. In the recent tournament we were able to find and cache plans during meta-gaming for each of the following:

We were unable to find plans for Hunter or HunterBig, though for entirely different reasons. In Hunter the maximum achievable score is actually 87, though this is not (readily) inferrable from the rules, so although we find a maximum-scoring solution fast, we cannot be sure it is maximal, and so do not cache it (and instead continue to search normally each move). In HunterBig, repeat visits to a previously visited square are allowed (not terminal), and this dramatically increases the size of the search space (and prevents greedy rollout cutoffs), so we simply fail to find an optimal solution at all.

Target States

In some games it is possible to identify the state you need to reach to achieve a solution by analysis of the propnet. Specifically, it is possible to generate the set of all such states by fixing the winning goal value proposition to true and flowing backward through the propnet feeding that goal proposition, until base proposition are reached on all paths (encountering an OR [or an AND when the output is required to be false] causes the search to branch, generally leading to N solutions for an N-input OR).

For most games this is impractical since either the size of the set of solution states is extremely large (there are many possible solution states, only some of which may be reachable from the initial state of course), or the size of the set of interim propnet states encountered during the backward search blows up.

However, for some games it allows a state, or small set of states to be identified fairly easily. 8-puzzle (and the family of sliding block puzzles generally) is one such, where there is exactly one solution configuration by construction.

Given a target state and a measure of distance between states, simple A* search can be used to identify the optimal path (strictly the distance measure must be what is known as an 'allowable heuristic', which basically means it never over-estimates the number of moves needed to get between the two states). Using A* with a non-admissible heuristic will still work, but may examine many more intermediate states during the search, and may not find the shortest possible path, if multiple paths exist.

In situations where our meta-gaming MonteCarlo simulation shows no differentiation in scores of playouts (everything scores 0, or at least the same), and we can identify a target state (and a couple of other minor conditions necessary for there to be a decent chance of having a half-decent distance-between-states metric), we attempt to solve using A* during meta-gaming. If we fail we fall back on regular search. If we succeed we cache the plan found.

The state distance heuristic we use is just Hamming distance of the base proposition set (i.e. - how many of the base propositions are different between the states). This is an admissible heuristic for 8-puzzle.

To reduce the search space further in such cases, we also strip out any base propositions which are independent of the moves made (i.e. the control logic, and in particular the step counter), which means that moves which return to a previous state (apart from the step counter - so previous configuration of the tiles in 8-puzzle) are not considered. Strictly speaking, this leaves us solving only an approximation of the 'real' game, but it would require a fairly perverse goal specification for this to be a bad approximation. Note that we already had analysis to identify the control logic, as something similar is done to take those aspects out of the propnet prior to factorization attempts.

With the above it typically takes us a little under a second to solve 8-puzzle.

Extending this to 15-puzzle is somewhat more of a challenge (and on my list of things to look into). In particular this will require:

Switching from A* to Iterated Deepening A* almost certainly (to keep memory usage within reasonable bounds)
Identification of a much better heuristic than Hamming distance. If anyone is interested in sliding block puzzle generally, a good summary of solution methods may be found here. Identifying Manhattan distance from propnet analysis (or statistical analysis of simplified games) should be pretty easy, but although it's considerably better than Hamming distance, Manhattan distance will not be good enough to solve 15-puzzle in reasonable time. I plan to look into attempts to create small pattern databases on the fly from solutions to sub-games, but that's currently on the future-experiments list.

As an aside, I plan to experiment with removal of the control logic (for state comparison purposes) in puzzles generally. I suspect it would make the difference to finding a perfect solution in hunterbig for example.

Goal Latches

A goal proposition, g is said to be positively latched by a base proposition, b, if g can never become false once b becomes true. Similarly a goal proposition, g', is said to be negatively latched by a base proposition, b', if g' can never become true once b' becomes true.

Analysis of the propnet can often reveal such latches. In some cases very simplistic analysis. If latched goals are identified, then any state in which a goal becomes positively latched can be considered terminal for search purposes (the goal value is fixed in all successor states, so no point in searching further). Similarly if the 'winning' goal becomes negatively latched (and you're only interested in 'wins').

Very simple analysis of this nature yields immediate solutions for simple games like a couple of those that featured (or tried to in one case) in the recent tournament:

untwisty coridoor (this has no visualization, but basically its a 'maze' where you seal your fate if you step off a single safe path)
untistycomplex2 - this is a much more subtle version of the above, which addresses some issues with the simple version (it is trivially solvable by transposition analysis, even if you know nothing about latches). Unfortunately the GDL they tried to use was broken. I have provided the Stanford guys with a corrected version (which Sancho duly solves), but as of this writing the Stanford repository has not been updated with it to my knowledge, so I cannot provide a link

Move choice partitioning

Sudoku presents an interesting problem. No target state is identifiable (finding one would amount to knowing a solution in fact!), the branching factor is huge, and no score signals are generated except by the (unique usually, but certainty very sparse) solution itself. For these reasons, any statistical search that doesn't somehow radically trim the search space is doomed from the outset.

Fortunately, the encoding of Sudoku makes the constraints implicit in the definition of the legals - that is it is only legal to play moves that do not immediately break one of the no-duplicates-in-a-row-or-column-or-square constraints. Having these constraints implicit in the encoding makes the problem a lot easier than it would otherwise be.

Several things become apparent from fairly simple analysis of the propnet:

The 100-score goal is only achievable if no cell is marked with an X in the terminal state (that is you've managed to apply legal numeric moves to very cell). That is to say some set X' of base props disable the solution, so in any solution state, S, S^X' = 0
All cell state base propositions apart from those that mark the cells with an X are themselves latches (that is to say, once a cell is numerically marked it can never become unmarked)
All X-markings are negative latches (once unset they cannot become set)
The moves upon which X markings of cells are directly dependent (i.e. - those moves that can be reached from a X marking proposition by backward chaining through the propnet, without crossing a transition in the propnet apart from the base prop concerned's own transition) form a partitioning of the set of all moves. the equivalence sets of this partitioning are exactly the moves which change a single cell. Because of the negative latch nature of the X markings and the requirement that no X marked cells remain in the solution state, at least one move must be played from each partition.

In any game which exhibits the above properties, in any given turn it is only necessary to consider choices from one of the partitions of the move space. The ordering is arbitrary, but search is reduced by choosing the most constrained (smallest intersection of partition and current legal moves) at any given tree node. In Sudoku this means we'll consider the possibilities for the cell with the fewest legal numeric options still available to mark it with (often a single choice) in each state encountered during the search. The result is actually a very small search tree!

We implement all of the above with what we call state machine filters, which act as indirection layers through which isTerminal(), getGoal() and getLegalMoves() calls pass (and some latch-related queries which are not directly relevant here). Such a filter can reduce the set of legal moves that the underlying state machine generates and provide a subset to the search algorithm running above it (the most constrained partition in the above analysis in the case of the move-partitioning filter). It can also modify terminality and goal values as seen by the search algorithm, which is necessary for other uses (which will be the subject of a future post) not relevant to the Sudoku solve.

In general, given game G with generated state machine S, the concept of state machine filters allows us to apply the results of analysis (typically during meta-gaming) to the state machine to transform it from S to S' for the purposes of search. Some filters preserve game semantics accurately (the move choice partitioning one does for example), so G' = G even though S' != S. However, filters which transform the game itself (and thus have G' != G) are also of interest. Such a filter would cause you to solve an approximation of the actual game - this may be a good enough heuristic in itself, or may be part of a mechanism to apply 'strategic' guidance to a full search in G. By architecting the filters as a separate layer we can isolate the analysis and its use without requiring either the underlying state machine implementation to change, or the overlying search algorithm to change (once they are enabled to cope with the insertion of filters generically in the first place)

Pseudo Puzzles

Some (contrived!) N-player games are just N 1-player games bolted together (Dual Hunter for example). In such cases it is not necessary to search the combined game tree which includes the moves of other players, since their board states are irrelevant to our goals.

Sancho detects this by a dependency analysis on the goal propositions of its role, and if it finds that other roles are irrelevant (that is to say, their moves cannot impact our score), it entirely replaces the sub-game in which that other role is involved, with a sub-game that has exactly one move, which has no impact on any state. It also flags the game as a pseudo-puzzle (if only its own role is left after replacing irrelevant ones), which enables plan-caching. This is safe, because the planned sequence cannot be impacted by anything other roles do, and hence can be replayed regardless of their moves.

The net effect is that Sancho searches a greatly reduced search space. Multiknightstour, which is 2 10X10 knight's tour puzzles bolted together in this way, was played on the second day of the 2014 championships, but sadly not in Sancho's bracket, as it would have benefited significantly from this analysis.

A related case occurs in Chinook, which is essentially two games of checkers, played on the odd and even squares simultaneously. In the Stanford version of Chinook (which is different to the base version, as found on Tiltyard) one player scores for captures they make on the even board, and the other for captures they make on the odd board. This actually makes the board your role does NOT score on totally irrelevant to your own goals. Since Sancho successfully factors Chinook into two independent games, it actually goes further in this case and doesn't bother searching one tree at all. Instead it just noops in that tree (which is rather unfortunate for its opponent given the way they score in Chinook!). I should note that I think Stanford Chinook is NOT a good game for tournaments in future unless the elective noop is removed. Even then I think that base.Chinook is a better factorized game test for competition purposes.

3 comments:

Andrew RoseAugust 1, 2014 at 3:57 PM
I note that base.firefighter is likely to be equivalent to untwistycomplex2 in terms of the player abilities that it exercises. In particular, it's much longer than untwistycorrider (which, at just 7 moves long, can be easily brute forced with no intelligence) and, like untwistycomplex2, can't be solved by transpositions alone - it needs latches to reduce the search space.
UnknownDecember 28, 2014 at 10:56 PM
Hi,
About using heuristic to solve a general games more efficiently. Can one heuristic ,like Hamilton, be truly that general or we need different heuristic for different games ? like heuristic for knight tour is different from the one used in 8puzzle.
Steve DraperDecember 30, 2014 at 5:19 AM
In general no. However, in puzzles where the goal state (or a goal state) can be derived, and you just need to, figure out how to get there, A* combined with a distance heuristic will usually work far better than an undirected search, so part of the approach is to try to analyse enough to categorize the puzzle into a class for which you know you can generate applicable heuristics (of which this is one such class). Same goes for most of the techniques in this post - each effectively defines a class of games (detectable at analysis time) for which they are (likely to be) applicable. Of course, many puzzles fall outside of any of the classes for which we currently have good (GGP) techniques.

Friday, August 1, 2014

Puzzling it out

3 comments: