From the sequence of a single genome, it’s impossible to determine which genes are shared by all members of a species and which are possessed by only some. However, just one additional sequence offers the opportunity to distinguish shared and variable content. As more genomes are sequenced, more genes are discovered and some genes that were believed to be ubiquitous are found to be lacking from certain individuals. As a result, the estimated size of a species’s core genome—the set of genes shared by all members of a species—generally decreases, and the size of the pangenome—the set of all distinct genes in the species—increases.
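In set terms, the core genome is the intersection of the gene sets of all sequenced individuals, and the pangenome is their union. The minimal sketch below tracks both sizes as genomes are added one at a time. The gene labels and the four example genomes are invented purely for illustration, and `core_and_pan_sizes` is a hypothetical helper, not part of any established tool.

```python
# Hypothetical gene content of four sequenced individuals (made-up labels).
genomes = [
    {"geneA", "geneB", "geneC", "geneD"},
    {"geneA", "geneB", "geneC", "geneE"},
    {"geneA", "geneB", "geneF"},
    {"geneA", "geneC", "geneD", "geneG"},
]


def core_and_pan_sizes(genomes):
    """Return a list of (core_size, pan_size) after each genome is added."""
    core = set(genomes[0])  # genes seen in every genome so far
    pan = set(genomes[0])   # genes seen in at least one genome so far
    sizes = []
    for genes in genomes:
        core &= genes       # intersection: the core can only shrink
        pan |= genes        # union: the pangenome can only grow
        sizes.append((len(core), len(pan)))
    return sizes


for n, (core_size, pan_size) in enumerate(core_and_pan_sizes(genomes), start=1):
    print(f"{n} genome(s): core = {core_size}, pangenome = {pan_size}")
```

With only one genome the two sets are identical, which is why a single sequence cannot separate shared from variable genes. In this toy data the core shrinks and the pangenome grows with every added genome.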
VISUALIZING THE PANGENOME
A reference genome built from the DNA of an individual organism can be visualized as a linear sequence (top). But there is a growing appreciation that this sort of representation fails to reflect the diversity among individuals of a species, which includes not just sequence variation within shared genes, but often different genes altogether (middle). To visualize the genomic content of a species, researchers use interconnected nodes representing all possible combinations of genomic segments or genes found in a species (bottom). Such an approach makes all known sequence information available simultaneously, instead of hiding some away as annotations describing how newly sequenced genomes differ from a linear reference.
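One simple way to picture such a graph is to treat each genomic segment as a node and to draw an edge wherever two segments sit next to each other in at least one individual. The sketch below does this for three invented individuals. The segment labels, the `paths` data, and the `build_graph` helper are all assumptions made for illustration; real pangenome graph tools use richer models that also track orientation and sequence.

```python
from collections import defaultdict

# Hypothetical individuals, each described as an ordered list of segment labels.
paths = {
    "individual_1": ["s1", "s2", "s3", "s5"],
    "individual_2": ["s1", "s3", "s4", "s5"],
    "individual_3": ["s1", "s2", "s3", "s4", "s5"],
}


def build_graph(paths):
    """Map each edge (segment, next_segment) to the individuals that carry it."""
    edges = defaultdict(set)
    for name, segments in paths.items():
        # Pair every segment with its immediate successor along the genome.
        for left, right in zip(segments, segments[1:]):
            edges[(left, right)].add(name)
    return dict(edges)


graph = build_graph(paths)
for (left, right), carriers in sorted(graph.items()):
    print(f"{left} -> {right}: {', '.join(sorted(carriers))}")
```

Every individual remains recoverable as a path through the graph, so no genome has to be singled out as the reference against which the others are annotated.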