Tuesday, December 9, 2014

Kalaha

Last Christmas, my father-in-law gave me a copy of Mancala (also known as Kalah or Kalaha).  I've played it a few dozen times, but my 7 year old daughter still beats me regularly!

So I wondered about coding up the rules in GDL to see how Sancho plays it.  As it transpires, a quick look in the base repository revealed that somebody had already done this back in 2009.  (This game isn't on Tiltyard rotation because it doesn't contain explicit definitions of the base and input propositions.)

Enthusiastically, I fired up Sancho and got it playing.  Sadly, it only managed ~100 iterations/turn.  (We've made some significant general improvements since then, but Kalaha is still an order of magnitude slower even than a complicated game like Chess.)  At the time, I made some tweaks to the GDL which improved the speed approximately three-fold, but that still wasn't enough for a good game.

So, it has sat on the back burner for nearly a year.  Over the last couple of days, I've been digging into it in a bit more detail because I have some moderately revolutionary ideas for dramatic improvements.  (Hopefully more in a further blog post.)

For now, I've produced an annotated version of the GDL which you may like to peruse.  (You'll definitely want to understand the basic rules of the game first though - see the link above.)

Sunday, September 28, 2014

Caught in the tar pit

I have been a bit remiss in making regular posts recently, and as some may have noticed, Sancho has also been absent from Tiltyard for nearly a month.

The reason for this is that about 4 weeks ago I undertook a reworking of the MCTS tree's representation internally to Sancho, to reduce memory usage, and remove some role-asymmetry that didn't seem right.  The catalyst for doing this was to set the groundwork for an enhanced move-sequence search which will be the subject of a future post (at least if it works!).  I expected it to take 3 or 4 days, but it soon turned out to be a sticky morass, from which escape was more difficult than envisaged!

The core of the change was to eliminate non-decision nodes in the tree (forced moves, forced noops, etc.), leaving only the nodes at which actual decisions have to be made.  Since Sancho's tree considers choices for a single role at any one node (not joint moves), this meant that in most games (non-simultaneous move games) decision layers were interposed with non-decision (forced noop) layers (one in N layers at most would have a decision, where there are N roles in the game).  This was both wasteful of space (tree node allocation), and introduced role asymmetry because the state only changes every Nth layer of the tree (where a full joint move is implied by the choices in the preceding N layers).

In the new structure all non-decision nodes are eliminated, and (as a side-effect) an edge always leads between nodes of different game states.  The semantics of the representation however, should be equivalent.

Stabilizing this change took roughly the expected time, but unexpectedly it turned out to have a distinctly damaging impact on the quality of play in games that employed heuristics (primarily games with a piece heuristic active, such as breakthrough, or chess).  The reason for this is not totally clear, but almost certainly has to do with the role asymmetry, which the original heuristic implementation was 'comfortable' with.

Since then I have been fiddling with the mechanisms by which heuristic values are blended into the MCTS tree, to find a modification that work well in the new structure.  Over the past 3 weeks or so I have found various approaches that initially worked well in my tests with one game, only to find that they performed badly in another.  The three games I have mostly been using to test (as they display somewhat different characteristics) were:

  • Breakthrough (simple equal-value piece heuristic, fixed sum)
  • Speed chess (variable value piece heuristic, fixed sum)
  • Skirmish (variable value piece heuristic, non-fixed sum)

In particular I found that what worked well in Breakthrough (where material exchanges are typically somewhat binary in nature) didn't work well in the other two (and visa vera).

However, as of yesterday I am now getting positive test results (but much more testing remains to be done) in all of the above games, so hopefully this time the light at the end of the tunnel is not another oncoming train!

A ton of regression testing now needs to be done, for which I plan to use the following games:
  • Reversi (uses a heuristic other than piece-based)
  • Connect4 (goto non-heuristic fixed-sum 2-player game)
  • Max knights (puzzle)
  • Sudoku (very different puzzle)
  • Three player free for all (more than 2 players)
  • Blocker (simultaneous turn)
  • Pentago (non alternating role choice sequence)
  • Breakthrough with walls (heuristic and factored)

This will take a few more days, even if no problems are found.  After that (I hope), normal service will be resumed...

Wednesday, August 20, 2014

Balanced diet

(Who ate all the pies?)

In the International General Game Playing Championship 2014, it struck me that many of the games that were played had significant bias for one player (usually, but not always, the first).  In an attempt to compensate, many games were played both ways round, often with the predictable outcome.  This was unfortunate because it increased the amount of time required to play games, reduced the excitement (by having predictable outcomes) and effectively reduced the best-of-3 rounds to a best-of-one (two matches of a game with a predictable outcome and then the decider - usually a game where the bias is unknown because it isn't played on Tiltyard and therefore statistics aren't available).

Whilst reading up on various abstract strategy games, I was introduced to the concept of the Pie rule.  Based on the traditional method for children to divide a cake ("you cut, I'll choose"), it aims to balance two player games by giving the second player the opportunity to swap places with the first player after the first move.  This way, the first player has an incentive to play the most balanced initial play the he possibly can.  (If he makes an opening move that is too strong, the second player will swap with him and get the strong position.  If he plays an opening move that is too weak, the second player will leave him with it.)

In the last few days, I have created Pie-rule variants of 9-board Tic-Tac-Toe and Hex.  (I've also created a visualization for regular Hex so that it can be played on Tiltyard.)  It's early days yet, but I notice that the pie rule seems to be doing the trick for 9BTTT.  In the first 16 matches played on the Tiltyard, 9 went with the first player and 7 with the second.  Whilst it's still a small sample size, that's looking substantially more balanced than the regular version.

So, for the GDL authors out there (by which I suppose I mean Alex!), consider the Pie rule for creating balance in your game.

For everybody else, what games would you like to see a Pie rule variant of?  Whilst I'm certainly not making any promises, if you post in the comments, I'll consider doing them.

Sunday, August 17, 2014

Goal Latches

Latches are properties of a game's state which, once they become true, are guaranteed to remain true throughout the remainder of the game (positively latched); or which, once they become false, are guaranteed to remain false (negatively latched).  A 'property' in this sense can be any logical function of the game's base state - most typically we talk about the simplest case of this, which is the state of an individual base proposition.

Identifying latches can be useful in several ways:

  • It may allow reduced computational effort in calculating state transitions (e.g. - if a part of the propnet is effectively 'dark' due to a latch condition, then once that latch is detected that part of the propnet can be left uncalculated in future state transitions).
  • Similarly it may allow reduced computational effort in heuristic calculation
  • If the latch constrains the achievable goals then it can be used to optimize search

This post is about the use Sancho makes of the last case - i.e. - situations in which it detects that the achievable score range has been reduced by latching of one or more of the goal propositions (at most one for each role can become positively latched, in which case the score for that role is then fully determined in all possible paths, but several may become independently negatively latched, which serves to reduce the size of the set of achievable scores).

Latch detection might be analytic or heuristic in nature.  I'll come back to heuristic latch detection in a future post - for this one I'm only considering analytic detection, though the use to which the result is put can be the same in either case.

Analytic latch detection is based on logical reasoning about the structure of the game.  In Sancho's case this is done by analysis of the propnet, and currently we only identify fairly simple cases in which a base proposition feeds back to itself via logic of one of the following forms (where X is the base prop, and X' is its next value on state transition):

X' = X OR <other condition>  (X is positively latched)

or

X' = ~X AND <other condition> (X is negatively latched)

Furthermore, if a goal proposition is of the form

G = X AND <other condition>, and X is negatively latched, then G is said to be negatively latched by X.

Similarly for the positive latch case:

G = X OR <other condition>

If we can identify such goal latches, we know that the set of achievable scores in any state of the game reachable from a state with latched goal propositions is either fully determined (positively latched goal), or is strictly reduced from the set of all possible goal values to some subset.

In Sancho this surfaces as a method on the state machine which returns the achievable goal range for any given state, as [min,max].  This range can be used in the MCTS search tree processing in several ways:
  • If min == max, then all possible paths result in the same score.  If this occurs for our role then whatever we play from that point on can have no impact on the eventual outcome.  Consequently we can treat any tree node in which this occurs as if it were terminal, and not search below it (the one exception being that if this occurs on the root node we expand one level just so that we have legal moves to choose from when asked to return a move)
  • If a terminal state is encountered with a score of the max achievable, we can invoke tree trimming by propagating upwards when the role deciding on the move at this branch is the one with the max-achievable score (i.e. - this amounts to a 'win' relative to the parent state and no further searching is needed since it cannot be improved upon)
  • If a move reduces the achievable range (either at the top or at the bottom) then we can trivially define a heuristic to favor search on paths that increase the min achievable score, and dis-favor those that decrease the max achievable.
Because latch detection is typically very cheap (at search time, once the analysis has been done during meta-gaming) we can also use the change in achievable score as a heuristic during playouts to guide otherwise random playouts along paths that tend to maximize the expected goal value, on the assumption (within the playout) that all achievable scores are equally likely.  This can most simply be done by simply down-weighting moves that decrease the choosing role's max score, and up-weighting those that increase the min.  In zero-sum games, up-weighting those that decrease the max opponent role scores, and down-weighting those that increase their min scores can also be applied.  Currently we do this VERY crudely and handle the 'decrease weight' cases by preventing those moves being selected at all in the presence of any other choice!

Some examples help illustrate where these techniques give benefit, so I'll discuss a few in the following paragraphs.

The Untwisty Corridor Family

This family of games are basically just test cases for latch detection.  They are puzzles involving traversing a maze (logically speaking anyway - none of them actually have visualization currently so far as I am aware).  Any step onto the wrong path sets a 'failure' proposition in the game's state, and the game terminates after a fixed number of moves.  Consequently you must make the 'correct' choice for each of a fixed number of steps (in 'untwisty corridor' you need to go straight every move, hence the name).  Because a wrong choice sets a latched proposition, which ultimately determines the goal values, this is a simple latched-goal case, where any wrong move immediately fully determines the final score (as 0).  Because all such states are immediately treated as if they were terminal, and not searched, the effect is that the MCTS search only ever visits nodes one step off the correct branch, which reduced the size of the search space from being exponential in number of steps to being linear.
In fact, the basic untwisty corridor is somewhat flawed, because all false choices lead to the SAME state (apart from the step counter), so it is trivially solved by transposition, provided the player maintains a transposition table of some sort.  Furthermore, at only 6 steps, it is small enough to solve by brute force anyway!  The game 'untwistycomplex2' is an attempt to address these issues, and a better test.

Escort Latch Breakthrough

The game Escort Latch Breakthrough is a variant of Breakthrough where each role has a king and 8 pawns.  The goal is to get the King to the far end of the board.  If both kings are captured the game is a draw.  A typical MCTS player, without latch analysis, will usually move a pawn from in front of their king and then just advance the king right up to one rank before it can be captured by the opponent's pawns.  At that point it's usually fairly trivial for the opponent to force capture it (kings cannot move backwards) and games between such players tend to end up as mutual king exchanges and therefore draws (for example, this game).  The reason this happens is that the vulnerability of the king has very little impact on the ensemble of playout scores, because almost all playouts fail to bother capturing the kings, and MCTS convergence is slowed by the need to achieve convergence in subtrees following king capture.
With latch detection a king capture is seen as an achievable score range change (from [0,100] to [50,100] or [0,50] depending on which king is captured).  This impacts the search in several ways:
  • In states where one king has been captured, capturing the other is terminal, and can be propagated as if it were a win (even though it scores only 50), because it is the best achievable result
  • Moves that capture/avoid capture of a king are heuristically selected more frequently, which speeds convergence
  • Playouts are much more reasonable, since they do not include games in which a king is NOT captured if it could be (i.e. - where a king just marches through a defending line of pawns without being molested).  This dramatically improves the quality of Monte Carol samples generated by the playouts.  This gain far outweighs the cost in reduced playout count that the evaluation requires (which is actually quite substantial since all moves must be examined at each stage)
The result is that rather than optimistically advancing their king into danger, the player tends to hold it back, and work to capture its opponent's king first (see this game for example)

Friday, August 1, 2014

Puzzling it out

During the recent competition I promised to spill the beans on our Sudoku-enabling analysis, and other (mostly) puzzle-specific enhancements.  Accordingly, this post will go through the more interesting techniques Sancho uses in puzzle solving (i.e. - 1-player games, or pseudo-1-player games, which I will define later).

Plans

The first observation that can be made universally about 1-player (deterministic, complete information) games, is that the player is entirely in charge of its own fate, and doesn't have to worry about what some other entity might do.  This means that any solution path seen will always remain valid, if followed from that point onward.  We therefore trap any sight of a solution state and commit the path we saw to it as a 'plan', after which, when asked for a move, we simply read the next move off the plan (hence the very fast response that Sancho exhibits in puzzles once it solves it, since no further searching is required).
Opportunities for caching plans arise both during tree node expansion (when a child turns out to be terminal and a solution), and during MonteCarlo playouts (from which we return the full path played in 1-player games).
For many of the typical puzzles encountered, a solution will be found during meta-gaming time, so as soon as the game manager starts asking for moves, we will be able to feed them back as fast as it can consume them.  In the recent tournament we were able to find and cache plans during meta-gaming for each of the following:

We were unable to find plans for Hunter or HunterBig, though for entirely different reasons.  In Hunter the maximum achievable score is actually 87, though this is not (readily) inferrable from the rules, so although we find a maximum-scoring solution fast, we cannot be sure it is maximal, and so do not cache it (and instead continue to search normally each move).  In HunterBig, repeat visits to a previously visited square are allowed (not terminal), and this dramatically increases the size of the search space (and prevents greedy rollout cutoffs), so we simply fail to find an optimal solution at all.

Target States

In some games it is possible to identify the state you need to reach to achieve a solution by analysis of the propnet.  Specifically, it is possible to generate the set of all such states by fixing the winning goal value proposition to true and flowing backward through the propnet feeding that goal proposition, until base proposition are reached on all paths (encountering an OR [or an AND when the output is required to be false] causes the search to branch, generally leading to N solutions for an N-input OR).
For most games this is impractical since either the size of the set of solution states is extremely large (there are many possible solution states, only some of which may be reachable from the initial state of course), or the size of the set of interim propnet states encountered during the backward search blows up.
However, for some games it allows a state, or small set of states to be identified fairly easily.  8-puzzle (and the family of sliding block puzzles generally) is one such, where there is exactly one solution configuration by construction.
Given a target state and a measure of distance between states, simple A* search can be used to identify the optimal path (strictly the distance measure must be what is known as an 'allowable heuristic', which basically means it never over-estimates the number of moves needed to get between the two states).  Using A* with a non-admissible heuristic will still work, but may examine many more intermediate states during the search, and may not find the shortest possible path, if multiple paths exist.
In situations where our meta-gaming MonteCarlo simulation shows no differentiation in scores of playouts (everything scores 0, or at least the same), and we can identify a target state (and a couple of other minor conditions necessary for there to be a decent chance of having a half-decent distance-between-states metric), we attempt to solve using A* during meta-gaming.  If we fail we fall back on regular search.  If we succeed we cache the plan found.
The state distance heuristic we use is just Hamming distance of the base proposition set (i.e. - how many of the base propositions are different between the states).  This is an admissible heuristic for 8-puzzle.
To reduce the search space further in such cases, we also strip out any base propositions which are independent of the moves made (i.e. the control logic, and in particular the step counter), which means that moves which return to a previous state (apart from the step counter - so previous configuration of the tiles in 8-puzzle) are not considered.  Strictly speaking, this leaves us solving only an approximation of the 'real' game, but it would require  a fairly perverse goal specification for this to be a bad approximation. Note that we already had analysis to identify the control logic, as something similar is done to take those aspects out of the propnet prior to factorization attempts.
With the above it typically takes us a little under a second to solve 8-puzzle.
Extending this to 15-puzzle is somewhat more of a challenge (and on my list of things to look into).  In particular this will require:
  • Switching from A* to Iterated Deepening A* almost certainly (to keep memory usage within reasonable bounds)
  • Identification of a much better heuristic than Hamming distance.  If anyone is interested in sliding block puzzle generally, a good summary of solution methods may be found here.  Identifying Manhattan distance from propnet analysis (or statistical analysis of simplified games) should be pretty easy, but although it's considerably better than Hamming distance, Manhattan distance will not be good enough to solve 15-puzzle in reasonable time.  I plan to look into attempts to create small pattern databases on the fly from solutions to sub-games, but that's currently on the future-experiments list.
As an aside, I plan to experiment with removal of the control logic (for state comparison purposes) in puzzles generally.  I suspect it would make the difference to finding a perfect solution in hunterbig for example.

Goal Latches

A goal proposition, g is said to be positively latched by a base proposition, b, if g can never become false once b becomes true.  Similarly a goal proposition, g', is said to be negatively latched by a base proposition, b', if g' can never become true once b' becomes true.
Analysis of the propnet can often reveal such latches.  In some cases very simplistic analysis.  If latched goals are identified, then any state in which a goal becomes positively latched can be considered terminal for search purposes (the goal value is fixed in all successor states, so no point in searching further).  Similarly if the 'winning' goal becomes negatively latched (and you're only interested in 'wins').
Very simple analysis of this nature yields immediate solutions for simple games like a couple of those that featured (or tried to in one case) in the recent tournament:
  • untwisty coridoor (this has no visualization, but basically its a 'maze' where you seal your fate if you step off a single safe path)
  • untistycomplex2  - this is a much more subtle version of the above, which addresses some issues with the simple version (it is trivially solvable by transposition analysis, even if you know nothing about latches).  Unfortunately the GDL they tried to use was broken.  I have provided the Stanford guys with a corrected version (which Sancho duly solves), but as of this writing the Stanford repository has not been updated with it to my knowledge, so I cannot provide a link

Move choice partitioning

Sudoku presents an interesting problem.  No target state is identifiable (finding one would amount to knowing a solution in fact!), the branching factor is huge, and no score signals are generated except by the (unique usually, but certainty very sparse) solution itself.  For these reasons, any statistical search that doesn't somehow radically trim the search space is doomed from the outset.
Fortunately, the encoding of Sudoku makes the constraints implicit in the definition of the legals - that is it is only legal to play moves that do not immediately break one of the no-duplicates-in-a-row-or-column-or-square constraints.  Having these constraints implicit in the encoding makes the problem a lot easier than it would otherwise be.
Several things become apparent from fairly simple analysis of the propnet:
  1. The 100-score goal is only achievable if no cell is marked with an X in the terminal state (that is you've managed to apply legal numeric moves to very cell).  That is to say some set X' of base props disable the solution, so in any solution state, S, S^X' = 0
  2. All cell state base propositions apart from those that mark the cells with an X are themselves latches (that is to say, once a cell is numerically marked it can never become unmarked)
  3. All X-markings are negative latches (once unset they cannot become set)
  4. The moves upon which X markings of cells are directly dependent (i.e. - those moves that can be reached from a X marking proposition by backward chaining through the propnet, without crossing a transition in the propnet apart from the base prop concerned's own transition) form a partitioning of the set of all moves.  the equivalence sets of this partitioning are exactly the moves which change a single cell.  Because of the negative latch nature of the X markings and the requirement that no X marked cells remain in the solution state, at least one move must be played from each partition.
In any game which exhibits the above properties, in any given turn it is only necessary to consider choices from one of the partitions of the move space.  The ordering is arbitrary, but search is reduced by choosing the most constrained (smallest intersection of partition and current legal moves) at any given tree node.  In Sudoku this means we'll consider the possibilities for the cell with the fewest legal numeric options still available to mark it with (often a single choice) in each state encountered during the search.  The result is actually a very small search tree!
We implement all of the above with what we call state machine filters, which act as indirection layers through which isTerminal(), getGoal() and getLegalMoves() calls pass (and some latch-related queries which are not directly relevant here).  Such a filter can reduce the set of legal moves that the underlying state machine generates and provide a subset to the search algorithm running above it (the most constrained partition in the above analysis in the case of the move-partitioning filter).  It can also modify terminality and goal values as seen by the search algorithm, which is necessary for other uses (which will be the subject of a future post) not relevant to the Sudoku solve.
In general, given game G with generated state machine S, the concept of state machine filters allows us to apply the results of analysis (typically during meta-gaming) to the state machine to transform it from S to S' for the purposes of search.  Some filters preserve game semantics accurately (the move choice partitioning one does for example), so G' = G even though S' != S.  However, filters which transform the game itself (and thus have G' != G) are also of interest.  Such a filter would cause you to solve an approximation of the actual game - this may be a good enough heuristic in itself, or may be part of a mechanism to apply 'strategic' guidance to a full search in G.  By architecting the filters as a separate layer we can isolate the analysis and its use without requiring either the underlying state machine implementation to change, or the overlying search algorithm to change (once they are enabled to cope with the insertion of filters generically in the first place)

Pseudo Puzzles

Some (contrived!) N-player games are just N 1-player games bolted together (Dual Hunter for example).  In such cases it is not necessary to search the combined game tree which includes the moves of other players, since their board states are irrelevant to our goals.
Sancho detects this by a dependency analysis on the goal propositions of its role, and if it finds that other roles are irrelevant (that is to say, their moves cannot impact our score), it entirely replaces the sub-game in which that other role is involved, with a sub-game that has exactly one move, which has no impact on any state.  It also flags the game as a pseudo-puzzle (if only its own role is left after replacing irrelevant ones), which enables plan-caching.  This is safe, because the planned sequence cannot be impacted by anything other roles do, and hence can be replayed regardless of their moves.
The net effect is that Sancho searches a greatly reduced search space. Multiknightstour, which is 2 10X10 knight's tour puzzles bolted together in this way, was played on the second day of the 2014 championships, but sadly not in Sancho's bracket, as it would have benefited significantly from this analysis.
A related case occurs in Chinook, which is essentially two games of checkers, played on the odd and even squares simultaneously.  In the Stanford version of Chinook (which is different to the base version, as found on Tiltyard) one player scores for captures they make on the even board, and the other for captures they make on the odd board.  This actually makes the board your role does NOT score on totally irrelevant to your own goals.  Since Sancho successfully factors Chinook into two independent games, it actually goes further in this case and doesn't bother searching one tree at all.  Instead it just noops in that tree (which is rather unfortunate for its opponent given the way they score in Chinook!).  I should note that I think Stanford Chinook is NOT a good game for tournaments in future unless the elective noop is removed.  Even then I think that base.Chinook is a better factorized game test for competition purposes.

Thursday, July 31, 2014

Sancho is GGP Champion 2014!




At approximately 1:30am this morning (BST), Sancho was crowned General Game Playing Champion 2014.  Sadly I wasn’t there to witness it because I was feeling unwell and had gone to bed.  However, I was pleased to wake up to the good news this morning.

The final twelve players competed in the double-elimination phase last night.  Sancho played against 3 of them, winning 2-1, 2-0, 3-1.

[The links below are to visualizations of the matches.  Several of them work best in Chrome.  Use the right/left arrows to see how the match progressed.]
  • Round 1 (best of 3) vs Galvanise written by Richard, a fellow Edinburgher who won the 2nd Coursera GGP course
  • Round 2 (best of 3) vs QFWFQ, a competitor of ours in the 1st Coursera GGP course, last autumn
    • Walkover in Hex
    • Win in Joint Connect 4 (two simultaneous games of connect 4, where you have to get a line of 4 on the top board or force your opponent to make a line of 4 on the bottom board) 
  • Grand Final (best of 5) vs LeJoueur, long-time GGP player written by Jean-Noel Vittaut - a researcher in machine learning in Paris University (I think). 
    • Win in Hex after the opponent blundered a very strong position.  (By my reckoning, he shouldn't have connected to the bottom at move 14 because there were still two options open.  By connecting early, he lost a move at the top and never recovered.)
    • Loss in Breakthrough – our first in many months.  I’m told this was a good game.  (Aim is to get a pawn through the other side.  Pawns can move straight or diagonally and can take diagonally only.)  I haven't analysed this yet.
    • Win in a contrived combination of tic-tac-toe, chess and connect 4, which we usually play badly at and were playing against the bias.  I haven't looked at this yet.
    • Win in Joint Connect 4.  This is a strong game for us because Sancho factors it into the two individual games (thereby massively reducing the branching factor).

We were disappointed not to get to play the reigning champion (TurboTurtle by Sam) who was eliminated in his first round, although we often get to play GreenShell (which we strongly suspect is the same as TurboTurtle - but Sam has declined several opportunities to confirm) on the Tiltyard.

With many thanks to Bertrand, Ben, Eric & Todor for organising this event and thanks to all the other competitors for a good game.

We plan to be back in January 2015 (the AAAI conference is moving forward 6 months) and Steve hopes to be there in person.  Polish up those players and come and show us how it should really be done!


More blog posts to follow in the coming week or so about some of what made Sancho the champion this time round.

Wednesday, July 30, 2014

Finals Day 1 round-up

[This article was really pitched at colleagues / family, so the GGP regulars will know most of the detail.]

Over 8 hours yesterday, our player, “Sancho”, played 18 matches and finished at the top of the table, more than a complete game clear of the next nearest opponent.  We lost 1 game (where we were playing second in a game against a significant 1st-player bias) and dropped a few points here and there on some of the other games.

Player
Games
Points
Average
Sancho
18
1612
90
General
18
1478
82
QFWFQ
18
1472
82
Galvanise
18
1429
79
----
LeJoueur
18
1412
78
TurboTurtle
18
1356
75
Gamer
18
1325
74
Alloy
18
1300
72
Dumalion
18
1272
71
MINIPlayer
18
1230
68
Knower
18
1038
58
LICAgent
18
1027
57
====
QuorumPlayer
18
1017
57
MonkSaki
18
964
54
Valor
18
790
44
Ary
18
740
41
AIRush
18
491
27


Here are the games we played (where you can see us in action, turn-by-turn, by pressing the left/right arrows).

·         Alquerque – a non-zero sum game.  Capturing an opponent’s piece gets you 10 points.  You play 10 moves each.  Trading pieces is good in this stage of the tournament because everybody is involved in a match of this.  Therefore a 90/100 loss is better than a 10/0 win.
·         Breakthrough – try to get a pawn to the other side of the board.  This ought to have been a real showcase game for us – but a technical glitch means our match wasn't saved, so the sample shown is some of our competitors playing.
·         Pentago is like noughts and crosses, with a twist (literally).  There are 4 boards in a grid.  On each turn you make your mark and then twist the board.  5 in a row to win.  We won this playing first and second (which is much harder).
·         9-board tic-tac-toe.  The position you play on your turn defines the board that your opponent must play on in the next turn.  Another game with a 1st-player bias, we won playing first, but lost playing second.
·         Hex – connect the edges of the board.  Your opponent is trying to do the same at right angles to you.  This was played whilst I was sleeping – looks like a complete walk-over.
·         Chinook is two games of draughts played simultaneously (on the white and black squares).  Points scored for taking opponent pieces – but only on the board that’s the same colour as your opponent’s pieces.  Not the best game, because you’re allowed to pass – meaning that you can force your opponent to score 0 (by never moving on the board he scores on) – which we duly did.  The visualization only works in Chrome.
·         Duidoku – a two-player game played on a Sudoku grid.  But you aren’t trying to complete the grid.  Instead, you’re trying to make it such that your opponent doesn’t have a legal play (i.e. can’t put any number in any cell without violating one of the Sudoku constraints).  We hadn’t seen this before and, by the look of the results, it has a strong 1st-player bias.  Since it was only played one way round, I suspect we got lucky here.
·         And a contrived tic-tac-toe based game that isn’t worth describing (or looking at).

And here are the puzzles (1-player games) that we solved.

·         A Hamiltonian cycle.
·         Hunter – which is a Knight’s Tour on a board of a size where a complete tour isn’t possible.
·         And the same on a much bigger board – where we didn’t manage an optimal solution, dropping 7 points.
·         A sliding tiles puzzle, which we solved in the smallest possible number of moves.
·         Sudoku, where we were one of just 2 players to solve it.
·         Also three contrived puzzles with no useful visualization.

This evening and into the early hours of tomorrow (UK time), the top 12 teams in the table above will compete in a double-elimination tournament to declare the 2014 champion.

Tuesday, July 29, 2014

Tired but happy with the first day

The first day of the 2014 competition consisted mostly of puzzles and non-fixed-sum 2 player games (with a couple of fixed sum ones thrown in to leaven the mix).

These were a mixture of factorizable and non-factorizable games, with a few latch-optimizable examples thrown in for good measure.

The day was hard-fought, and we started off slowly, with most of our worst games happening early on, notably a game of 9-board TicTacToe as second player (it has a fairly strong first player role bias), which was our only complete loss of the day (this and Pentago were both played with roles reversed to remove the bias).

Things then started playing to our strengths.  Most significantly a Sudoku to solve, which leveraged some significant latch and dependency analysis we perform (which will be the subject of a future post), and which only we and one other player were able to solve.  By the time we came to a match of Hex, our direct competitor on the leader-board was a player called 'General' (who's author has promised to post some descriptive information after the tournament, which I'll edit a link in to when I get it), who was also our opponent at Hex.  Fortunately for us, we were able to generate a much better sampling rate to feed our MonteCarlo tree search (about 70K simulated games per turn, against only a couple of thousand), and that enabled us to win reasonably easily.  I should note that 70K playouts per 20 second turn would ordinarily be considered a pretty poor sampling rate, but Hex is notorious for being hard to simulate quickly.

At the end of the day our we scored a total of 1612 points over 19 games (from a possible total of 1887 given the games concerned, as one puzzle had a max possible score of 87), putting us comfortably in the top spot.  This means that tomorrow we'll have the benefit of playing in the top bracket of the double-elimination phase, which means it takes two match losses to eliminate us (which is a significant advantage for the top 4 players, seeded to that bracket.