An unexamined life is not worth living.

Sunday, January 6, 2008

Chess database formats - PGN vs. Chessbase

In what format do you store your chess data? This question has plagued me for years, so I finally decided to blurt it all out. There are at least 3 popular file formats I can think of: Chessbase (CB) binary format (cbh, cbv, etc.), Chess Assistant (CA) binary format, and the standard text PGN format (with the spec here). Most other chess databases, such as SCID, Jose, etc., also have their own binary formats, but I am not as familiar with those. There have also been a few efforts to represent chess databases in some open XML format, but none of the proposed schemas has gained enough popularity. Most chess games on the net are still available only in PGN, Chessbase, or CA format.

So what tools are available for dealing with each format? Virtually every chess tool can read PGN. Chessbase does not handle Chess Assistant format at all, while Chess Assistant can read Chessbase format, which is pretty impressive, given that no other non-Chessbase tool does that. So what are the advantages of PGN format versus, say, Chessbase?

PGN advantages: 1) It is free in the sense that you don't need proprietary software to view its contents (SCID would do the job), and it has an open spec, so 'anybody' can implement a parser. There are a few parsers available in source code (I managed to find C++, C# and Perl parsers; there are probably others). That being said, I suggest you do not write your own parser, because it is not a trivial task, but rather take an existing one (as long as the license is not an issue).

2) Since PGN is text-based, one can just load a file in Notepad, as long as the file size is not prohibitively large. PGN is also very readable: the moves are just in algebraic notation. If you store your personal games in PGN, you can version control them as well, and look at the differences between revisions. It is also trivial to merge multiple PGN files into one with a one-line command in DOS.

3) A lot of free command line and GUI PGN tools are already available. pgn-extract is a great command line tool for filtering PGN games by material, position, or tag information (e.g. ECO or Players). Palview is great for generating HTML with JavaScript for replaying games. In fact I use pgn-extract and Palview together to generate content for this blog, but that is worth another post. This site is a great resource on chess database utilities, most of which operate on PGN.
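To make points 2) and 3) concrete, here is a minimal Python sketch of what being text-based buys you: merging files is plain concatenation, and a simple tag filter takes only a few lines. The function names (merge_pgn, split_games, read_tags, filter_by_tag) are made up for this illustration. This is not how pgn-extract works, and it is not a real PGN parser: it only looks at the tag pairs at the top of each game and assumes a well-formed file.

```python
import re
from pathlib import Path

# Matches one tag pair line such as: [ECO "B33"]
TAG_LINE = re.compile(r'^\[(\w+)\s+"(.*)"\]\s*$')


def merge_pgn(sources, destination):
    """Concatenate PGN files, leaving one blank line after each of them."""
    with open(destination, "wb") as out:
        for source in sources:
            data = Path(source).read_bytes().strip()
            if data:
                out.write(data + b"\n\n")


def split_games(text):
    """Split PGN text into one string per game.

    Assumption: a line starting with "[" that follows movetext opens a new
    game. A comment line inside the movetext that starts with "[" would
    break this, which is one reason to prefer an existing parser.
    """
    games, current, seen_moves = [], [], False
    for line in text.splitlines():
        if line.startswith("[") and seen_moves:
            games.append("\n".join(current).strip())
            current, seen_moves = [], False
        elif line.strip() and not line.startswith("["):
            seen_moves = True
        current.append(line)
    if any(line.strip() for line in current):
        games.append("\n".join(current).strip())
    return games


def read_tags(game):
    """Return the tag pairs at the top of one game as a dict."""
    tags = {}
    for line in game.splitlines():
        match = TAG_LINE.match(line)
        if match is None:
            break  # the tag section ends at the first non-tag line
        tags[match.group(1)] = match.group(2)
    return tags


def filter_by_tag(games, name, value):
    """Keep only the games whose tag `name` equals `value`."""
    return [game for game in games if read_tags(game).get(name) == value]
```

For example, after merge_pgn(["a.pgn", "b.pgn"], "all.pgn"), calling filter_by_tag(split_games(Path("all.pgn").read_text(encoding="latin-1")), "ECO", "B33") keeps only the games tagged with ECO code B33 (the file names are placeholders). For anything beyond simple tag matching, such as filtering by material or position, an existing tool or parser is the safer choice.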

So if PGN has so many advantages, why use anything else for storing chess databases? Why create all these compatibility issues between multiple chess database vendors? For the same reason that XML is not used as the backend storage in SQL databases: performance.

The Chessbase format provides more than performance improvements though: 1) The file size is also smaller: Chessbase splits a database into multiple files, which allows for some normalization of headers, etc.

2) Multimedia support - one can embed audio and video into games.

3) It can be a real database: in addition to raw moves, Chessbase stores other metadata, such as opening and endgame keys, lets you tag positions with so-called "medals", and so on.
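The header normalization mentioned in point 1) can be illustrated with a small Python sketch. This is only the general idea, not the actual Chessbase file layout (which I am not describing here): each distinct player name is stored once in a separate table, and every game refers to it by number. The function name normalize_players and the dictionary keys are assumptions made for this example.

```python
def normalize_players(games):
    """Store each distinct player name once and refer to it by index.

    `games` is a list of dicts with "White", "Black" and "Moves" keys
    (a hypothetical in-memory form, used only for this illustration).
    """
    players = []   # every distinct name, stored exactly once
    index = {}     # name -> position in `players`
    records = []   # one (white_id, black_id, moves) tuple per game
    for game in games:
        ids = []
        for colour in ("White", "Black"):
            name = game[colour]
            if name not in index:
                index[name] = len(players)
                players.append(name)
            ids.append(index[name])
        records.append((ids[0], ids[1], game["Moves"]))
    return players, records
```

In a PGN file the full name is repeated in the headers of every game a player appears in, so a database with thousands of games by the same player saves space by writing the name once and storing a small number per game.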

I have been using Chessbase light for storing my games since around 1998, and I just don't want to lose all the tagging that I've added to my games over many years. So currently I keep the old databases in Chessbase format, and whenever I create new smaller databases (say, for selected games by Kramnik in the Sveshnikov), I store them in PGN: Chessbase light can still edit them, performance is not an issue, and I don't have to re-export all games into PGN for use with Palview.

7 comments:

  1. Hi Roman,

    Thanks for the plug for pgn-extract :)

    David

  2. Hi Roman,
    First of all, thanks for adding my link to your “delicious” bookmark – very tasteful.
    As far as PGN vs. CB is concerned, my vote goes to PGN. Portability is very important and everything is cross-platform nowadays. Multilingual (international) support is trivial with PGN – I mean in the annotations, not the SAN move text. As far as I know, CB supports a very proprietary multi-language feature which is useless outside their format. Do they have an implementation for Linux? IMO, for a proprietary format to become popular it has to offer much more than a couple of fancy features.

    You made a valid point about PGN parsers: “I suggest you do not write your own parser, because it is not a trivial task”. You are the strongest player I’ve met with such a solid understanding of software. I recently took a PGN game and made a simple PGN parsing test that illustrates some typical issues with parsing. Take a look here.

  3. This may be useful for people looking for an efficient way to store games:

    IGN - Integer game notation

  4. Nikolai, that's an interesting test of PGN errors. What I found, though, is that people usually enter PGN by making moves in some software like Chessbase, so most PGN files flying around the web are actually valid.

    The best I found in my own search for a parser (I was looking for a .NET implementation) was this project: http://sourceforge.net/projects/cafechess/

    It contains among other things a PGN->XML conversion routine, so I've been using it since then in all of my PGN manipulation scripts (including the one that uploads palview html for games to this blog).

  5. Will Entriken,
    You don’t specify the goals of this IGN. It seems very machine oriented and quite human unfriendly. I can see how parsing IGN will be more difficult compared to PGN.

    Roman,
    I agree that “Cafechess Library” is a very interesting project. It seems well documented and I will take a closer look when I have time. Do you know what happened to www.cafechess.org?
    I have many advanced tests that I use when testing my parser. For example, you can test the annotations support using this: Support for PGN annotations
    Many parsers ignore leading comments or mishandle multiple comments, but those are an integral part of PGN. I like the multiple annotations feature and I think it will become a more popular way of mixing/sharing annotations. This is one more advantage of the PGN format.

  6. Thanks for the feedback. The goal is to minimize memory usage for database/opening book applications. I am using this for some research, and am not sure whether anyone else would be interested in using it for something.

  7. Take a look at
    http://immortal223.borda.ru/?1-8-0-00000071-000-0-0-1235381780

    for more chess books

