Sancho GGP Player: Through a Murky Lense

I've been meaning to write this post for ages (it's largely based on enhancements made late last year), but never seemed to get around to it. Well, with the new run of the Coursera GGP course beginning last week, I really wanted to get back to making regular updates again, so now seems a good time to finally get to this one!

The general message of this post is that you don't necessarily need to play exactly the game you are given! Playing something that is an approximation of it will often suffice, provided the approximation is something you can play better than the exact game, and usually it gives the right results. I'll discuss some different types of approximation, some of the more adventurous of which I may well come back to in future posts, but the main thrust of the thesis is that a player that plays well in a game that usually has the same semantics as the actual game it is supposed to be playing, can often beat a completely 'correct' player that doesn't play as well within the domain of the approximation, provided that most of the time actual play stays in that domain.

There are probably many categories of approaches to approximation that can be taken, but I'll talk here about just two, one of which is implemented (for some cases) by the current version of Sancho, and the other of which I'm currently working on some restricted cases of.

State machine feature emulation

Often we are given game rules that manifest as a very expensive state machine, and/or a state machine that is hard to reason about in order to perform feature analysis. A typical example is something like Reversi, which has a horribly expensive goals network, which slows down tree expansion. Fairly straight-forward analysis can reveal logic (and base propositions) that have no impact on move legality, and basic logic structures which suggest possible forms of the goal calculation (e.g. - successor-like logic that suggests counting is going on), and these clues can suggest hypotheses for the goal forms. Based on these clues, one can come up with a set of hypothetical goal calculations, and then during meta-gaming measure their accuracy across random games. If you find a hypothetical goal calculation that appears to match what the game actually delivers via a full state-machine implementation, one can then decide to play with an emulation of the goals logic external to the animation of the state machine itself (and remove all goal logic from the underlying state-machine).

In Sancho we use this to detect goal structures of the forms:

Game is a straight-forward win/loss/draw based on who holds the majority of some set of base props
Goal value is cardinality of some set of base props whose value is true in the current state
Game is a straight-forward win/loss/draw based on the previous type but interpreting the values it produces as inputs to a who-has-more calculation (Reversi is of this kind - basically are there more white pieces than black pieces or visa versa)
As any of the above but where the count is represented via a set of successor counting props in the game state (Dots&Boxes looks like this as I recall)

If it can hypothesize a probable goal calculation, and match it to one of the above then Sancho will strip goal logic from the state-machine and run an emulator that directly evaluates the calculations. This has two benefits:

It typically results in significantly faster tree expansion, since next state and goal calculations involve less logic
Once we decide to believe in one of the above forms we can use it to draw further inferences about the game which can be useful. Examples are:
- If a we can determine that a count cannot go backwards (e.g. - in the successor logic there might be no path that decrements a count, or in a majority-of-propositions formulation where all of the propositions being counted are latches), and the game's final result is a win/loss based on a majority of a fixed size set, then the game's result is fully determined once one role has more than half. This allows us to add some emulation to the terminal logic also, and declare the game to be virtually terminal (for search purposes) in states where the outcome is fully determined. This allows Sancho to cease searching in Dots&Boxes when one player has 13 boxes for example.
- If a count is based on the cardinality of some set of base propositions, and some of those propositions are known to be latches, then we can infer a possible heuristic, that it is good to get those latched propositions set. In Sancho we then feed the resulting heuristic through the same statistical verification we use to determine generally which heuristics should be turned on (previously discussed when talking about Piece detection heuristics). This turns out to provide Sancho with a heuristic that corners are good to own in Reversi for example.

Approximate Factorization and Local Search

A second interesting area, is that of games that exhibit some sort of 'locality' of action. That is to say, games in which it is possible to define a measure of distance, such that base propositions can be mapped onto a metric space, where the distance measure reflects the number of steps before changes to one proposition can result in changes to another. This relates closely to factorization, in that in a factorizable game the distances will divide the space of base propositions (possibly after some processing to remove 'control logic' such as whose turn it is) into disconnected subsets.

The fully factorizable case I've discussed in a previous post, and leads to factorization of the game into sub-games, which are then independently searchable.

However, cases that are not fully factorizable might still be able to make use of the distance information. Two obvious uses suggest themselves:

Firstly, if a game exhibits localized neighbourhoods, that are strongly connected internally, but weakly connected externally, then we could consider making sub-games that consist only of those neighbourhoods (separately) and searching them as if they were true factors. As a thought-experiment consider a game like Risk. What armies are placed, and move, in one part of the world has little impact on distant parts of the world for many turns. Thus we could treat regions of the board separately (searching tactical possibilities in Asia without worrying about South America for instance). Provided we can weave the results of the sub-game searches back together in some useful way (typically goals might not exhibit the same locality that legality and base props do), it will be much cheaper to search the subgames than to globally search the entire gamespace. Note that subgames need not be disjoint, so we could consider overlapping regions and take the move from the one with the best local result for instance (discounted by the worst local penalty for not playing in the other subgames).

Secondly, we can search from a given state only within a fixed neighbourhood (say everything that could be be influenced within a certain number of steps). If we assume (and we can determine this analytically by looking at the terminal/goal conditions for the game) that two moves are only meaningful in the same move sequence if they can both influence something commonly within the restricted neighbourhood, then we can restrict the choice of moves we look at within the search. For example, imagine a Sheep&Wolf like game, but taking place in a maze, where one player with several pieces (like the sheep) is aiming to trap an opponent piece (like the wolf), and wins if they do. Search sequences (up to a given length) which feature moving sheep which are further away from one another than the neighbourhood diameter are pointless, as two such moves cannot both be relevant to a successful capture (within a number of moves equal to the neighbourhood diameter). Hence we can perform restricted search, such that choosing to move a certain sheep as the first move in the sequence constrains later choices, and greatly reduces the effective branching factor. The downside of this is that the search results are only valid up to a depth of the neighbourhood radius. We can use an iterative deepening approach to address this.

Currently Sancho does not implement approximate factorization, but an experimental version is under development that does implement local search. I will (hopefully) have a lot more to say about this in a future post!

Sancho GGP Player

Sunday, April 5, 2015

Through a Murky Lense

No comments:

Post a Comment