Data analysis

Information organization and analysis

Sequence information must be interpreted (annotated) in terms of genes, proteins and the functions they perform. An automated sequence annotation pipeline was developed and used to analyze sequence data from the human intestinal metagenome.

We identified over 19,000 different functions in the gene catalog that we established. Statistical analysis indicates that we have captured essentially all of the functions present in our 124 samples and thus have an exhaustive view of the genetic potential of the bacteria from the human gut. A large proportion of these functions, over 5,000, had never been observed before, which illustrates the novelty that our analysis has revealed.

We also identified some 6,000 functions that are present in every individual of our cohort. We suggest that they constitute a “minimal metagenome”, which may be required for the proper function of the human gut microbiota. Among these are, as expected, functions that our own genome lacks, such as the capacity to degrade the fibers present in our food, and thus extract more energy from it, or to synthesize vitamins and amino acids that are essential for us. Interestingly, we have very little information about many of the minimal metagenome functions, and our findings should prompt their study.
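The idea of functions "present in every individual" can be illustrated as a set intersection. The sketch below is only an illustration and not the pipeline used in the study: the function name core_functions, the individual labels and the function identifiers F1 to F5 are all hypothetical, and it assumes each individual has already been reduced to a set of annotated functions.

```python
def core_functions(functions_by_individual):
    """Return the functions found in every individual of the cohort.

    functions_by_individual maps an individual's label to the collection
    of function identifiers annotated in that individual's sample.
    """
    function_sets = [set(funcs) for funcs in functions_by_individual.values()]
    if not function_sets:
        # An empty cohort has no shared functions.
        return set()
    # Keep only the functions that occur in all of the sets.
    return set.intersection(*function_sets)


# Toy data (hypothetical identifiers, not taken from the study).
toy_cohort = {
    "individual_A": {"F1", "F2", "F3"},
    "individual_B": {"F1", "F3", "F4"},
    "individual_C": {"F1", "F3", "F5"},
}

shared = core_functions(toy_cohort)
print(sorted(shared))
```

With real data, a strict intersection is sensitive to sequencing depth, so a function missed in a single shallow sample would be excluded. Any tolerance for such cases would be an additional assumption beyond what is described here.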

Beyond the minimal metagenome, we defined a set of 1,200 functions required for any bacterium to thrive in the human gut, and suggest that they represent the “minimal gut genome”. About half are present in most bacteria with sequenced genomes and are necessary for bacterial life. A large number, however, are found only rarely among bacteria with sequenced genomes and may well be specific to gut bacteria. Their study should lead to a much better understanding of our microbial companions than we presently have.