The capacity to identify an unknown organism using the DNA sequence from a single gene has many applications. These include the development of biodiversity inventories (Janzen et al. 2005), forensics (Meiklejohn et al. 2011), biosecurity (Armstrong and Ball 2005), and the identification of cryptic species (Smith et al. 2006). The popularity and widespread use (Teletchea 2010) of the DNA barcoding approach (Hebert et al. 2003), despite broad misgivings (e.g., Smith 2005; Will et al. 2005; Rubinoff et al. 2006), attest to this. However, one major shortcoming to the standard barcoding approach is that it assumes that gene trees and species trees are synonymous, an assumption that is known not to hold in many cases (Pamilo and Nei 1988; Funk and Omland 2003). Biological processes that violate this assumption include incomplete lineage sorting and interspecific hybridization (Funk and Omland 2003). Indeed, simulation studies indicate that the concatenation approach (in which these two processes are ignored) can lead to statistically inconsistent estimation of the species tree (Kubatko and Degnan 2007).
The purpose of this article is to initiate the development of a framework for “next-gen barcoding”: one that incorporates the multispecies coalescent, but does so by comparing multiple gene sequences from an unknown taxon with a database of sequences.