One year after the bombshell announcement that DeepMind’s
AlphaZero needed only the rules of chess and four hours of self-play to be
able to beat Stockfish in a match, the long-awaited full paper has now been
published in the academic journal Science. We have new games – Matthew Sadler
has produced videos about five of them – and what seems conclusive evidence of
AlphaZero’s superiority. It won a new match 574.5:425.5, despite Stockfish
running in a powerful configuration and managing its own time. AlphaZero also
won when given just 1/10th of the time to think.

A generic game-beater

AlphaZero would be extraordinary even if it had only reached
“human” levels of attainment. It began as AlphaGo, which learned from human
games to become the world’s best Go player, then developed into AlphaGo Zero, which
managed to surpass AlphaGo merely by playing against itself with no human
input. AlphaZero is the new generalised version of that “reinforcement and
search algorithm”, which the DeepMind team have shown can master multiple games –
chess, shogi and Go – knowing only the rules. In the case of chess AlphaZero
needed 300,000 of the 700,000 “steps” it took while training – just 4 hours
(of 9 in total) – to reach a level at which it was beating Stockfish.

During the World Championship match we featured
content from 2-time British Champion Matthew Sadler and WIM Natasha Regan, who
are co-authoring Game Changer. They appear in this
short video looking at AlphaZero:

Today the full paper on AlphaZero was published in Science,
and you can check it out here (as well as lots more on the DeepMind website).

An end to the Stockfish controversy?

Ever since the first announcement last year there have been
computer chess enthusiasts who, while not doubting the scientific achievement,
were concerned that Stockfish had been unfairly treated. The original match,
announced as a 64:36 win for AlphaZero (28 wins to 0), was criticised
for crippling Stockfish with too little hash memory, an unusual number of
cores, and a 1-minute per move thinking time that didn’t allow Stockfish to
manage its own time. The implication was that in fair conditions Stockfish
might still have won.

In the full paper, however, the matches are much more
rigorous. The main match was played from the starting position over 1,000
games, with 3 hours per player plus a 15-second increment per move. The
Stockfish 8 configuration was the same as one used in the TCEC World Championship,
but AlphaZero scored 155 wins to only 6 for Stockfish. That’s not all: the
paper explains that AlphaZero still won when Stockfish was given an opening
book, or when the latest version of the program at the time of submission (Stockfish 9) was used. And the clincher:
AlphaZero also won when it was given just 1/10th of the time to think (at
1/30th of the time, Stockfish finally came out on top).
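
As a quick check, the 155 wins and 6 losses quoted above can be tied back to the 574.5:425.5 result. The sketch below assumes standard chess scoring (1 point for a win, half a point for a draw) and that every game without a winner was a draw; the function name `match_score` is purely illustrative.

```python
# Figures quoted in the article for the main 1,000-game match.
GAMES = 1000
ALPHAZERO_WINS = 155
STOCKFISH_WINS = 6


def match_score(wins, losses, games):
    """Return (points_for, points_against), assuming 1 point per win
    and 0.5 per draw, with every non-decisive game counted as a draw."""
    draws = games - wins - losses
    return wins + 0.5 * draws, losses + 0.5 * draws


alphazero_points, stockfish_points = match_score(ALPHAZERO_WINS, STOCKFISH_WINS, GAMES)
print(f"{alphazero_points}:{stockfish_points}")  # prints 574.5:425.5
```

The same arithmetic applied to last year's 100-game match (28 wins, no losses) gives the 64:36 score mentioned earlier.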

The authors point out
that since AlphaZero normally searches 1,000 times fewer positions per second (60,000
to 60 million), with a tenth of the thinking time it reached better decisions while searching
10,000 times fewer positions. One of the curiosities of the paper is that the
authors aren’t sure exactly why!

AlphaZero may compensate for the lower number of evaluations
by using its deep neural network to focus much more selectively on the most
promising variations.

That “may” means the self-trained monster remains something
of a black box even to its creators. What remains for AlphaZero doubters? Well,
it’s possible the recently-released Stockfish 10 could do better, while there’s
also the question of hardware. AlphaZero ran on a single machine with 4
first-generation TPUs, compared to Stockfish running on a conventional computer
with 44 CPU cores. As the
paper notes, however, “Each program was run on the hardware for which it was
designed”.
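
For readers who want to see where the "10,000 times" figure in this section comes from, here is a minimal worked calculation. It uses only the approximate per-second search speeds quoted above and the 1/10th time handicap; the variable names are illustrative.

```python
# Approximate search speeds quoted in the article (positions per second).
ALPHAZERO_POSITIONS_PER_SECOND = 60_000
STOCKFISH_POSITIONS_PER_SECOND = 60_000_000

# In the handicap match AlphaZero had 1/10th of Stockfish's thinking time.
TIME_HANDICAP = 10

# Stockfish looks at this many times more positions every second...
speed_ratio = STOCKFISH_POSITIONS_PER_SECOND // ALPHAZERO_POSITIONS_PER_SECOND

# ...and, with ten times the thinking time, this many times more overall.
total_ratio = speed_ratio * TIME_HANDICAP

print(speed_ratio, total_ratio)  # prints 1000 10000
```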

But what about the games?

There’s far more technical detail in the full paper, but it’s
time to get to the games! Once again, that’s a treat for chess fans, and we’ve
added the games to our system (click a game below to open it with some much lower-powered computer analysis):

Those in “Round 1” are the 10 games from December 2017,
while Round 2 features 110 new games from the main match played from the normal
starting position. Round 3 features 100 games from a match in which the opening
moves were made for the programs according to the TCEC Computer Chess Championship opening book. If you
switch to notation view (not chat), you’ll see which moves were “book” before
our combatants started to think.

How can we make sense of it all? Well, the first 10 games in
Rounds 2 and 3 are selected by Matthew Sadler as his favourites – and that’s
not all. He’s also produced five videos, which have something
for everyone. Enjoy!

1. All-in Defence: Stockfish 1/2-1/2 AlphaZero (replay the game)

For the first time now we’re seeing AlphaZero drawing and
even losing some games, but draws like this one are stunning! A true Najdorf
brawl:

2. Bold Sir Lancelot: AlphaZero 1-0 Stockfish (replay the game)

A white knight hops around at will in a
positional masterclass:

3. Endgame Class: Stockfish 0-1 AlphaZero (replay the game)

One of the most memorable images from the Science paper is
the following, which shows the 6-ply (3 moves by both players) positions that featured most often for
AlphaZero when it was playing itself in its 700,000 steps of training:

Yes, Vladimir Kramnik
seems to have stumbled on the Holy Grail of chess when preparing the Berlin
Defence to play against Garry Kasparov in London in 2000. After both the 4
hours, when it could beat Stockfish, and the 9 hours it trained in total, the most
popular position in the AlphaZero training games was the Berlin: 1.e4 e5 2.Nf3 Nc6 3.Bb5 Nf6. Here Matthew Sadler looks at how
AlphaZero went about winning a memorable game with the opening:

4. Exactly How to Attack: AlphaZero 1-0 Stockfish (replay the game)

Matthew calls this “one of my favourite games” and counts 7
pawn sacrifices from AlphaZero in total in “an absolutely fantastic attack”. He
quotes DeepMind co-founder Demis Hassabis as describing this game as “like
chess from another planet”.

5. Long-term Sacrifice: Stockfish 0-1 AlphaZero (replay the game)

It wasn’t AlphaZero’s choice to play the Leningrad Dutch,
but it made Matthew very jealous! The
game features a cascade of sacrifices, with Sadler noting of one of them, “I
don’t think I’ve ever seen a tactic like this”.

That’s just scratching the surface of the AlphaZero games now
published, so it looks as though they’re going to keep us busy for a while yet.

Before that, though, there’s a little chess event starting
next week in the DeepMind offices in London. On Monday there’s the London Chess
Classic ProBiz Cup, while on Tuesday the Grand Chess Tour Playoff begins with
semi-final matches between Fabiano Caruana and Hikaru Nakamura, and Levon
Aronian and Maxime Vachier-Lagrave. You can watch all the action live here on chess24.

See also:


https://chess24.com/en/read/news/alphazero-really-is-that-good