Centre for Marine Ecosystems Research
King Abdullah University of Science and Technology National Institutes of Health Spanish Ministry of Science and Innovation
© 2020 The Authors. Environmental Microbiology published by Society for Applied Microbiology and John Wiley & Sons Ltd. Massive metagenomic sequencing combined with gene prediction methods were previously used to compile the gene catalogue of the ocean and host-associated microbes. Global expeditions conducted over the past 15 years have sampled the ocean to build a catalogue of genes from pelagic microbes. Here we undertook a large sequencing effort of a perturbed Red Sea plankton community to uncover that the rate of gene discovery increases continuously with sequencing effort, with no indication that the retrieved 2.83 million non-redundant (complete) genes predicted from the experiment represented a nearly complete inventory of the genes present in the sampled community (i.e., no evidence of saturation). The underlying reason is the Pareto-like distribution of the abundance of genes in the plankton community, resulting in a very long tail of millions of genes present at remarkably low abundances, which can only be retrieved through massive sequencing. Microbial metagenomic projects retrieve a variable number of unique genes per Tera base-pair (Tbp), with a median value of 14.7 million unique genes per Tbp sequenced across projects. The increase in the rate of gene discovery in microbial metagenomes with sequencing effort implies that there is ample room for new gene discovery in further ocean and holobiont sequencing studies.
Creative Commons License
This work is licensed under a Creative Commons Attribution-Noncommercial-No Derivative Works 4.0 License.