Thursday, July 31, 2014

Sancho is GGP Champion 2014!




At approximately 1:30am this morning (BST), Sancho was crowned General Game Playing Champion 2014.  Sadly I wasn’t there to witness it because I was feeling unwell and had gone to bed.  However, I was pleased to wake up to the good news this morning.

The final twelve players competed in the double-elimination phase last night.  Sancho played against 3 of them, winning 2-1, 2-0, 3-1.

[The links below are to visualizations of the matches.  Several of them work best in Chrome.  Use the right/left arrows to see how the match progressed.]
  • Round 1 (best of 3) vs Galvanise written by Richard, a fellow Edinburgher who won the 2nd Coursera GGP course
  • Round 2 (best of 3) vs QFWFQ, a competitor of ours in the 1st Coursera GGP course, last autumn
    • Walkover in Hex
    • Win in Joint Connect 4 (two simultaneous games of connect 4, where you have to get a line of 4 on the top board or force your opponent to make a line of 4 on the bottom board) 
  • Grand Final (best of 5) vs LeJoueur, long-time GGP player written by Jean-Noel Vittaut - a researcher in machine learning in Paris University (I think). 
    • Win in Hex after the opponent blundered a very strong position.  (By my reckoning, he shouldn't have connected to the bottom at move 14 because there were still two options open.  By connecting early, he lost a move at the top and never recovered.)
    • Loss in Breakthrough – our first in many months.  I’m told this was a good game.  (Aim is to get a pawn through the other side.  Pawns can move straight or diagonally and can take diagonally only.)  I haven't analysed this yet.
    • Win in a contrived combination of tic-tac-toe, chess and connect 4, which we usually play badly at and were playing against the bias.  I haven't looked at this yet.
    • Win in Joint Connect 4.  This is a strong game for us because Sancho factors it into the two individual games (thereby massively reducing the branching factor).

We were disappointed not to get to play the reigning champion (TurboTurtle by Sam) who was eliminated in his first round, although we often get to play GreenShell (which we strongly suspect is the same as TurboTurtle - but Sam has declined several opportunities to confirm) on the Tiltyard.

With many thanks to Bertrand, Ben, Eric & Todor for organising this event and thanks to all the other competitors for a good game.

We plan to be back in January 2015 (the AAAI conference is moving forward 6 months) and Steve hopes to be there in person.  Polish up those players and come and show us how it should really be done!


More blog posts to follow in the coming week or so about some of what made Sancho the champion this time round.

Wednesday, July 30, 2014

Finals Day 1 round-up

[This article was really pitched at colleagues / family, so the GGP regulars will know most of the detail.]

Over 8 hours yesterday, our player, “Sancho”, played 18 matches and finished at the top of the table, more than a complete game clear of the next nearest opponent.  We lost 1 game (where we were playing second in a game against a significant 1st-player bias) and dropped a few points here and there on some of the other games.

Player
Games
Points
Average
Sancho
18
1612
90
General
18
1478
82
QFWFQ
18
1472
82
Galvanise
18
1429
79
----
LeJoueur
18
1412
78
TurboTurtle
18
1356
75
Gamer
18
1325
74
Alloy
18
1300
72
Dumalion
18
1272
71
MINIPlayer
18
1230
68
Knower
18
1038
58
LICAgent
18
1027
57
====
QuorumPlayer
18
1017
57
MonkSaki
18
964
54
Valor
18
790
44
Ary
18
740
41
AIRush
18
491
27


Here are the games we played (where you can see us in action, turn-by-turn, by pressing the left/right arrows).

·         Alquerque – a non-zero sum game.  Capturing an opponent’s piece gets you 10 points.  You play 10 moves each.  Trading pieces is good in this stage of the tournament because everybody is involved in a match of this.  Therefore a 90/100 loss is better than a 10/0 win.
·         Breakthrough – try to get a pawn to the other side of the board.  This ought to have been a real showcase game for us – but a technical glitch means our match wasn't saved, so the sample shown is some of our competitors playing.
·         Pentago is like noughts and crosses, with a twist (literally).  There are 4 boards in a grid.  On each turn you make your mark and then twist the board.  5 in a row to win.  We won this playing first and second (which is much harder).
·         9-board tic-tac-toe.  The position you play on your turn defines the board that your opponent must play on in the next turn.  Another game with a 1st-player bias, we won playing first, but lost playing second.
·         Hex – connect the edges of the board.  Your opponent is trying to do the same at right angles to you.  This was played whilst I was sleeping – looks like a complete walk-over.
·         Chinook is two games of draughts played simultaneously (on the white and black squares).  Points scored for taking opponent pieces – but only on the board that’s the same colour as your opponent’s pieces.  Not the best game, because you’re allowed to pass – meaning that you can force your opponent to score 0 (by never moving on the board he scores on) – which we duly did.  The visualization only works in Chrome.
·         Duidoku – a two-player game played on a Sudoku grid.  But you aren’t trying to complete the grid.  Instead, you’re trying to make it such that your opponent doesn’t have a legal play (i.e. can’t put any number in any cell without violating one of the Sudoku constraints).  We hadn’t seen this before and, by the look of the results, it has a strong 1st-player bias.  Since it was only played one way round, I suspect we got lucky here.
·         And a contrived tic-tac-toe based game that isn’t worth describing (or looking at).

And here are the puzzles (1-player games) that we solved.

·         A Hamiltonian cycle.
·         Hunter – which is a Knight’s Tour on a board of a size where a complete tour isn’t possible.
·         And the same on a much bigger board – where we didn’t manage an optimal solution, dropping 7 points.
·         A sliding tiles puzzle, which we solved in the smallest possible number of moves.
·         Sudoku, where we were one of just 2 players to solve it.
·         Also three contrived puzzles with no useful visualization.

This evening and into the early hours of tomorrow (UK time), the top 12 teams in the table above will compete in a double-elimination tournament to declare the 2014 champion.

Tuesday, July 29, 2014

Tired but happy with the first day

The first day of the 2014 competition consisted mostly of puzzles and non-fixed-sum 2 player games (with a couple of fixed sum ones thrown in to leaven the mix).

These were a mixture of factorizable and non-factorizable games, with a few latch-optimizable examples thrown in for good measure.

The day was hard-fought, and we started off slowly, with most of our worst games happening early on, notably a game of 9-board TicTacToe as second player (it has a fairly strong first player role bias), which was our only complete loss of the day (this and Pentago were both played with roles reversed to remove the bias).

Things then started playing to our strengths.  Most significantly a Sudoku to solve, which leveraged some significant latch and dependency analysis we perform (which will be the subject of a future post), and which only we and one other player were able to solve.  By the time we came to a match of Hex, our direct competitor on the leader-board was a player called 'General' (who's author has promised to post some descriptive information after the tournament, which I'll edit a link in to when I get it), who was also our opponent at Hex.  Fortunately for us, we were able to generate a much better sampling rate to feed our MonteCarlo tree search (about 70K simulated games per turn, against only a couple of thousand), and that enabled us to win reasonably easily.  I should note that 70K playouts per 20 second turn would ordinarily be considered a pretty poor sampling rate, but Hex is notorious for being hard to simulate quickly.

At the end of the day our we scored a total of 1612 points over 19 games (from a possible total of 1887 given the games concerned, as one puzzle had a max possible score of 87), putting us comfortably in the top spot.  This means that tomorrow we'll have the benefit of playing in the top bracket of the double-elimination phase, which means it takes two match losses to eliminate us (which is a significant advantage for the top 4 players, seeded to that bracket.


Monday, July 28, 2014

On the Eve of Battle

Well, here we are.  Tomorrow the 2014 championships start.  There's nothing more we can do now but see how we fare against the other teams, and hope we have done enough to acquit ourselves well.

Some promising experimental changes I was working on last week have been deemed too risky to include at this point, so we'll be playing a version that is broadly similar the version that we ran in qualifier 5, plus a few very minor tweaks.  The past 5 or so days have been spent doing nothing but testing our way through various games from the Base and Stanford repositories, and soaking on Tiltyard overnight each day.  Hopefully all this testing will pay dividends in stability throughout the tournament itself.

Good luck to all the other competitors.  I'll make short update here after each day's play.

Wednesday, July 2, 2014

Don't panic Mr Mannering!

If you get the title reference you're probably as old as I am (and also English).  Anyway, 20 hours to go until the first qualifiers for the 2014 World's and time for the sky to cave in!

First we've been erroring every game of 'Sheep & Wolf' on Tiltyard the past 24 hours or so, so I must have broken something (or Andrew did, but it was probably me - it usually is!).  It was bound to be an obvious bug right?  I mean it fails immediately during meta-gaming - how hard could that be to find?

Turns out quite hard.  Was down to a recent 'improvement' to our propnet factory, that was intended to aggressively trim input propositions (and any logic dangling from them) for always-illegal moves.  This was necessary (well convenient anyway) for some puzzle analysis that I've been working on the last couple of weeks (to address a class of games that boil down to marking cells, such as Sudoku).  I'll be writing a post on extra puzzle solving analysis we do now (or will be doing shortly, as some is not quite finished yet), that allows us to handle factorized puzzles (dual Hamilton, multiple whatever, etc.), sparse-goal-state puzzles (8-puzzle etc.), and puzzles with a partitionable decision space (Sudoku etc.) at some point (possibly not until after the Worlds though).  Anyway, for some reason I still do not understand (I will have to go back to this later), the changes to achieve this caused a few games (Sheep&Wolf notably) to fail by never terminating (i.e. - it broke their terminal network in some obscure fashion I have not managed to figure out yet).  For now I've just backed out the 'improvement' concerned!

Ok, so everything must be fine now surely?

Now to make sure that we pass the 'player tester' that they will be using to filter entry into tomorrow's qualifiers.  How hard can that be?

Hmmm

Harder than you might expect!  We fail BOTH test games.   Aaaaarghhhhhh!

I blame the games of course!  The first issue was that they are only giving 5 seconds meta-gaming time, and we do rather a lot of serial analysis during meta-gaming, and when allocating how much time we are prepared to spend on each piece, we assume that we should leave at least (you guessed it) 5 seconds free at the end of the meta-gaming time to allow for network latency, garbage collection, lightning strikes, nuclear attack, and so on.  Unfortunately allowing only the time remaining from a total of 5 seconds, to leave at least 5 seconds spare, doesn't give much analysis time at all, and as a result we totally skipped some analysis without which the actual attempt to play even a trivial game utterly fails!

Ho hum.  Modified the code to always give us 1 second regardless of whether that leaves a decent buffer or not.

Game 1 passed.

In game 2 we seem to have no legal moves.  Maybe we cannot cope with such complex GDL.  Hmm.  This GDL to be precise:

( role white )
( base ( count 1 ) )
( base ( count 2 ) )
( base ( count 3 ) )
( base ( l 1 ) )
( base ( l 2 ) )
( input white move1 )
( input white move2 )
( init ( count 1 ) )
( init ( l 1 ) )
( init ( l 2 ) )
( <= ( legal white move1 ) ( true ( l 1 ) ) )
( <= ( legal white move2 ) ( true ( l 2 ) ) )
( <= ( next ( l 1 ) ) ( does white move1 ) )
( <= ( next ( l 2 ) ) ( does white move2 ) )
( <= ( next ( count 2 ) ) ( true ( count 1 ) ) )
( <= ( next ( count 3 ) ) ( true ( count 2 ) ) )
( goal white 0 )
( <= terminal ( true ( count 3 ) ) )

Nope.  Not overly complex.  Turns out we are optimizing a bit too aggressively in propnet generation.  Basically we throw away stuff that is irrelevant to our goal outcome in puzzles, which in this case is...the entire game!  A little over-zealous maybe.

Skipped the optimization for games where the goal is a fixed value regardless of how it plays out.

Yay, second game passed.  Player tester thinks we're A-Ok!

Time to regression test all the puzzles we know about, starting with the Stanford repository.  After a slight glitch (this is when I actually discovered that backing out the OPNF 'improvement' had broken our partitioning analysis, and left us looking about as effective as Random in Sudoku), they all pass with maximum achievable score again.  Well almost.  Only 99 on 8-puzzle.  I'll sort that out at the weekend (just need to finish off some work on a different puzzle solving approach that is not yet finished, but I'm confident will at least get full marks on 8-puzzle reliably once done).