Simply by assuming that the protein-coding and non-coding fractions of the genome have different dynamics, and that the non-coding fraction is particularly versatile and is therefore governed by a variety of (unspecified) probability distribution functions (pdfs), we are able to predict that the number of open reading frames (ORFs) in eukaryotes follows a Benford distribution and must therefore have a specific logarithmic form. Using data for the more than 1000 genomes available to us in early 2010, we find that the Benford distribution provides excellent fits to the data over several orders of magnitude.
Friar JL, Goldman T, Pérez-Mercader J (2012) Genome Sizes and the Benford Distribution. PLoS ONE 7(5): e36624. https://doi.org/10.1371/journal.pone.0036624
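As a rough illustration of what a "Benford distribution" means, the sketch below uses the standard first-digit form of Benford's law, P(d) = log10(1 + 1/d), and checks it against values whose density is proportional to 1/x. For such values the fraction lying in [a, x] grows as log(x/a), which is one way of reading the "logarithmic form". This is not the authors' fitting procedure. The log-uniform sampler stands in for ORF counts only as an assumption, and the function names, sample range, and seed are hypothetical choices made for illustration.

```python
import math
import random
from collections import Counter


def benford_first_digit_probs():
    """Benford's law for leading digits: P(d) = log10(1 + 1/d), d = 1..9."""
    return {d: math.log10(1 + 1 / d) for d in range(1, 10)}


def leading_digit(x):
    """Leading decimal digit of a positive number.

    Scientific-notation formatting (e.g. '4.200000e-03') puts the leading
    digit first, which avoids floating-point edge cases at powers of ten.
    """
    return int(f"{x:e}"[0])


def log_uniform_sample(n, low_exp, high_exp, rng):
    """Hypothetical stand-in data: n values with log10(x) uniform on
    [low_exp, high_exp), i.e. a density proportional to 1/x."""
    return [10 ** rng.uniform(low_exp, high_exp) for _ in range(n)]


def first_digit_frequencies(values):
    """Observed fraction of values whose leading digit is d, for d = 1..9."""
    counts = Counter(leading_digit(v) for v in values)
    n = len(values)
    return {d: counts.get(d, 0) / n for d in range(1, 10)}


if __name__ == "__main__":
    rng = random.Random(2010)  # arbitrary seed, for reproducibility only
    # Four whole decades (10^2 to 10^6), mimicking data that span
    # several orders of magnitude.
    sample = log_uniform_sample(100_000, 2.0, 6.0, rng)
    observed = first_digit_frequencies(sample)
    expected = benford_first_digit_probs()
    for d in range(1, 10):
        print(f"{d}: observed {observed[d]:.4f}  Benford {expected[d]:.4f}")
```

Sampling over a whole number of decades keeps the log-uniform case exactly Benford in the large-sample limit. A range that cuts a decade partway would distort the leading-digit frequencies, which is one reason the agreement is usually only judged over data spanning several orders of magnitude.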