Sunday, August 30, 2009

Why you should not trust chess database statistics

Have a look at this position, which arises in the Panov Attack of the Caro-Kann or from some lines of the Queen’s Gambit Declined.

1. e4 c6 2. d4 d5 3. exd5 cxd5 4. c4 Nf6 5. Nc3 e6 6. Nf3 Be7 7. cxd5 Nxd5 8. Bd3 Nc6 9. O-O O-O 10. Re1
and now Black can play 10...Qa5, which is a very rare sideline.


Say the database tells you that four games have been played with this move and that Black scored 50% with Qa5. The move itself does not look quite right, but Viktor Korchnoi himself played it, and the score does not seem too bad. The 50%, however, is completely deceptive. Have a look at another position, from Brodsky – Maiorov, one of the games played in this line.

Diagram: White to move

White (a Grandmaster) is completely winning, but he played the inexplicable 47. Rd2??, dropping the rook on c4, and resigned a couple of moves later. So that 50% should really have been 75% in White’s favour! Four games is far too small a sample to rely on the numbers, so you should look at the objective value of Qa5 instead of relying on a rather meaningless percentage from a database. I failed to do that, so Black’s other loss is mine, from my 2002 game against Stephen Glinert!

The title of the post is of course not fully true: sometimes you should check how well each side is scoring in a line before playing it. But it is more important to understand the meaning of the moves while building up your opening repertoire, especially if you rely on sidelines like Qa5.
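
For anyone who wants to check the arithmetic behind the 50% and 75% figures, here is a minimal Python sketch. It assumes only what is stated above: four games, Black scoring 50% (2 points out of 4), and Brodsky – Maiorov counted as a Black win. The function names are made up for illustration, and the exact results of the other games are not needed.

```python
from fractions import Fraction


def score_percent(points, games):
    """Score as a percentage of the maximum possible points."""
    return float(Fraction(points) / games * 100)


def swing_per_game(games):
    """Percentage points the score moves when one result flips (win <-> loss)."""
    return 100 / games


GAMES = 4
black_points = Fraction(2)  # 50% of 4 games, as the database reports

# Brodsky - Maiorov went into the database as a Black win (1 point).
# On the board it should have been a White win (0 points for Black).
corrected_black_points = black_points - 1

reported = score_percent(black_points, GAMES)
corrected = score_percent(corrected_black_points, GAMES)

print(f"Reported Black score:  {reported:.0f}%")         # 50%
print(f"Corrected Black score: {corrected:.0f}%")        # 25%
print(f"Corrected White score: {100 - corrected:.0f}%")  # 75%

# One flipped game is worth 25 points in a 4-game sample,
# but only a quarter of a point in a 400-game sample.
print(swing_per_game(4), swing_per_game(400))
```

With only four games, a single blunder moves the score by 25 percentage points, which is why the number says so little about the move itself.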

