Thirty papers published simultaneously by the ENCODE (Encyclopedia of DNA Elements) project indelibly repaint the picture of the human genome, throwing its information-processing characteristics into even sharper relief. (The whole collection, with six papers in Nature, 18 in Genome Research, and six in Genome Biology, is posted with open access at Nature Encode. One article, “Architecture of the human regulatory network derived from ENCODE data,” is required reading for information-systems designers.)
Just a few years ago, the prevailing wisdom held that the genome comprises 3 percent or so genes and 97 percent “junk” (with 2 or 3 percent of that junk consisting of the fossilized remains of retroviruses that infected our ancestors somewhere along the line). After a decade of painstaking analysis by more than 400 scientists, the new ENCODE data confirm that 2.94 percent of the genome is protein-coding genes, but they also show that 80.4 percent of its sequences help regulate how those genes get turned on, turned off, expressed, processed, and modified.