Biology's Big Problem: There's Too Much Data to Handle - Wired Science

Twenty years ago, sequencing the human genome was one of the most ambitious science projects ever attempted. Today, compared to the collection of genomes of the microorganisms living in our bodies, the ocean, the soil and elsewhere, each human genome, which easily fits on a DVD, looks simple. Its 3 billion DNA base pairs and about 20,000 genes seem paltry next to the roughly 100 billion bases and millions of genes that make up the microbes found in the human body.

And a host of other variables accompanies that microbial DNA, including the age and health status of the microbial host, when and where the sample was collected, and how it was collected and processed. Take the mouth, populated by hundreds of species of microbes, with as many as tens of thousands of organisms living on each tooth. Beyond the challenges of analyzing all of these, scientists need to figure out how to reliably and reproducibly characterize the environment where they collect the data.

Via Andrea Naranjo