We show how to obtain a compression ratio several times higher than the best previously reported results on two large genome collections (1092 human and 775 plant genomes). Our input consists of VCF files restricted to their essential fields. More precisely, our novel LZ-style compression algorithm squeezes a single human genome to about 400 KB. The key to high compression is to look for similarities across the whole collection, not just against a single reference sequence, as is typical of existing solutions.
Availability: http://sun.aei.polsl.pl/tgc (also as Supplementary material) under a free license.
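
To illustrate why matching against the whole collection can help, the following toy sketch parses one genome greedily into copy and literal tokens, first against a single reference and then against all previously seen genomes. It is not the algorithm distributed at the URL above. The representation (one '0'/'1' character per variant site), the position-aligned matching, the `min_match` threshold, and all function and variable names are assumptions made for this example only.

```python
def longest_aligned_match(target, pos, sources):
    """Return (source_index, length) of the longest run in which
    target agrees with some source at the same positions, starting at pos."""
    best_src, best_len = -1, 0
    for idx, src in enumerate(sources):
        length = 0
        while (pos + length < len(target)
               and pos + length < len(src)
               and target[pos + length] == src[pos + length]):
            length += 1
        if length > best_len:
            best_src, best_len = idx, length
    return best_src, best_len


def encode(target, sources, min_match=2):
    """Greedy LZ-style parse of target into copy and literal tokens."""
    tokens = []
    pos = 0
    while pos < len(target):
        src, length = longest_aligned_match(target, pos, sources)
        if length >= min_match:
            tokens.append(("copy", src, length))  # copy from sources[src]
            pos += length
        else:
            tokens.append(("lit", target[pos]))   # no useful match: literal
            pos += 1
    return tokens


def decode(tokens, sources):
    """Rebuild the original string from tokens and the same sources."""
    out = []
    for tok in tokens:
        if tok[0] == "copy":
            _, src, length = tok
            start = len(out)
            out.extend(sources[src][start:start + length])
        else:
            out.append(tok[1])
    return "".join(out)


# Hypothetical data: '1' marks a variant present at a site, '0' absent.
reference = "0000000000000000"
genome_a = "0110001000000000"
genome_b = "0000000001100110"
target = "0110001001100110"  # first part resembles genome_a, the rest genome_b

reference_only = encode(target, [reference])
whole_collection = encode(target, [reference, genome_a, genome_b])

if __name__ == "__main__":
    print("tokens vs. reference only:  ", len(reference_only))
    print("tokens vs. whole collection:", len(whole_collection))
```

In this toy case the target shares long runs with other genomes in the collection, so parsing against the collection needs far fewer tokens than parsing against the reference alone. A real compressor would also need an efficient match-finding structure and an entropy coder for the tokens, neither of which is shown here.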