This essay’s muse, photographed by N H, whose work on Flickr is truly spectacular.

It Took a Museum, a Lab, & Some Databases

Or, why identifying one group of birds can be a nightmare for birders & geneticists alike

Hundreds of miles out of place, a confused olive bundle was captured one winter day in Virginia. A member of the infamously difficult-to-identify genus Empidonax, this bird elevated the challenge to an extreme. Even with DNA analysis, this bird would require a full scientific investigation, teaching us that sometimes, even an organism’s identity is no simple matter. In a place that was cold, wet, and not particularly accommodating for a wayward insectivore, this bird reminded us that:

“Nature was not designed to make life easy for biologists.”

Of course, species that are classified in the genus Empidonax are known for their external similarities. In many cases, their identity can only be confirmed with the help of their distinctive calls. But for the Virginia bird, this is precisely the problem: in the middle of the nonbreeding season — like January — “Empids” have little reason to vocalize.

The bird on Jan. 27th (left) and Feb. 23rd (right), 2013. Notice how much more disheveled the bird looks in February.

The story starts like this: on January 27th, 2013, ornithologists at Old Dominion University were conducting their usual banding session at the Virginia Zoo. Capturing birds in near-invisible mist nets draped about the station, the group encountered an unexpected bird that had all the drab ‘oliveness’, wing bars, flat bill, and body structure unique to the genus Empidonax. Still, the team could not give it a leg-band. To band a bird, one needs to have identified its species, but this drab bird with worn plumage, representing a genus whose plumages are often not distinguishable even when fresh, had to go without.

Remarkably, the bird was caught again in the same location nearly a month later, on February 23rd. Given the time of year, the bird was beginning to molt. While being removed from the mist net, the bird shed four tail feathers — an opportunity.

It was time to figure out what this out-of-place bird was.

An example of what the wayward Virginia empid would prove to be, photographed in Alberta by Tony LePrieur.

Realizing that this bird might represent a rarity for Virginia, the ornithologists sent two of the shed feathers to Washington, D.C. There, the Smithsonian Institution’s Feather Identification Lab extracted DNA to sequence one special gene. The gene, known formally as cytochrome c oxidase I (COI), is also called the DNA barcode gene for its part in the Barcode of Life Project. The BOL Project’s grand vision is simple: facilitate genetic identification of species by sequencing an informative gene for as many species as possible. What does this mean?

A diagram of DNA, with emphasis on the sequence. From Encyclopedia Britannica.

It’s all in the sequence. Over time, any part of an organism’s genome has the potential to undergo change, even if in small amounts; this is the basis of biological evolution. But some parts of the genome change more rapidly than others.

In looking for an informative gene, what we need is a rate-of-change that is just right, a gene that changes over time at about the same rate that species split from one another. That way, the gene sequence — a progression of the nucleotides A, T, G, and C — will be unique for each species. (We can expect this because of the extraordinarily low chance that the same mutation will happen twice.)

In animals, the COI gene has a good track record for being informative. In sequencing the COI gene from as many species as possible, the BOL project will create a database of species “barcodes”, allowing researchers to identify any species by matching their specimen’s barcode sequence to one in the database. The whole process is very similar to scanning a barcode in the grocery store, except the scanning is done with benchwork and bioinformatics, not a laser gun.
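To make the grocery-store analogy concrete, here is a minimal sketch of barcode matching in Python. It is not the BOL Project's actual pipeline: the species names, the 20-letter sequences, and the helper functions `percent_identity` and `best_match` are all made up for illustration, and the sketch assumes the sequences have already been aligned to the same length.

```python
def percent_identity(a, b):
    """Percentage of positions at which two aligned, equal-length sequences agree."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


def best_match(query, reference):
    """Return the reference name whose sequence is most similar to the query, with its score."""
    scores = {name: percent_identity(query, seq) for name, seq in reference.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]


# Hypothetical toy "barcode database"; real COI barcodes run to hundreds of bases.
reference = {
    "species_A": "ATGGCATTCCTAGGAACCGC",
    "species_B": "ATGGCTTTCCTTGGTACCGC",
    "species_C": "ATGACATTTCTAGGGACAGC",
}

# Hypothetical sequence from an unknown specimen.
query = "ATGGCATTCCTAGGAACCGT"

name, score = best_match(query, reference)
print(name, score)  # species_A 95.0
```

Note what the sketch cannot do: it always returns the closest sequence in the database, even if the specimen's true species was never barcoded at all.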

In addition to barcoding the wayward Empid’s feathers, the Old Dominion team sequenced another gene, ND2, to compare against a different database of genes: GenBank, which is funded by the National Institutes of Health.

The barcoding results identified the bird as a Dusky Flycatcher with 99.8% probability, but there’s a pretty big caveat: the very closely related Pine Flycatcher doesn’t have a barcode sequenced. A 2002 study showed that much of the COI gene used for barcoding is near-identical between Dusky and Pine Flycatchers. Alas, without coverage of both Pine and Dusky Flycatchers, the barcoding can’t lead to a conclusive identification. Instead, it would take a more in-depth investigation on the researchers’ part to produce robust results.

GenBank turned out not to be conclusive either. While the initial barcoding results suggested Dusky Flycatcher, GenBank’s closest match to the wayward Empid’s ND2 gene was Pine Flycatcher. Barcoding said Dusky, GenBank said Pine. The most likely explanation, then, is flawed databases. Something fishy — like an erroneous species identification or faulty gene sequence — might have made its way online. In GenBank, Dusky Flycatcher was represented by only two specimens, while Pine Flycatcher was represented by only one. This provides very little basis for comparison, and leaves open the possibility that some of the sequences are misidentified or otherwise flawed.

Even with modern molecular methods, identifying this Virginia Empid was proving to be a lengthy endeavor.

Continuing the hunt, the Old Dominion ornithologists extracted DNA from ten museum specimens — five from Dusky Flycatcher specimens and five from Pine Flycatcher specimens — for comparison, sequencing both ND2 and the barcoding COI gene. This necessary precaution makes it possible for similar specimens to group together even if they’ve been labeled as the wrong species.

Such turned out to be the case with the GenBank specimens. The two ND2 sequences labeled as Dusky Flycatcher did not match the five museum specimens of Dusky Flycatchers. The GenBank Pine Flycatcher sequence, to which the Virginia Empid was matched, was the most confounding of all: it grouped together with four Dusky museum specimens and one Pine museum specimen. It was not the most informative sequence, and perhaps misidentified to begin with.

In the end, the wayward Empid, captured on that chilly day in late January, contained COI and ND2 genes that were most similar to Dusky Flycatcher sequences — at last, an identification.

Dusky Flycatcher range from the online encyclopedia Birds of North America. Note how out of the way Virginia is.

But this ID comes with an important lesson: for online gene databases to succeed in their goal of genetically identifying species, the catalogued gene sequences must be identified correctly to begin with; otherwise, the above fiasco will be a familiar tale. For us to be sure that the online sequences are labeled correctly, we need replication. We cannot simply match our unknown specimens with one available specimen in a database. Instead, we need comparative analysis, wherein the sequences from many specimens are compared to see how similarity groups them. This approach brings more robust results, not to mention a much more efficient identification.
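The idea of letting similarity do the grouping can be sketched in a few lines of Python. This is only an illustration, not the method used in the study: the specimen names and sequences are invented, the cutoff of two differences is arbitrary, and the simple linking rule here (any two specimens whose sequences differ at no more than `max_diff` sites end up in the same group) stands in for the proper phylogenetic analyses that researchers would run.

```python
from itertools import combinations


def differences(a, b):
    """Number of sites at which two aligned, equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b))


def group_by_similarity(seqs, max_diff):
    """Link specimens differing at <= max_diff sites; return the linked groups."""
    parent = {name: name for name in seqs}

    def find(name):
        # Follow parent links up to the representative of this group.
        while parent[name] != name:
            parent[name] = parent[parent[name]]
            name = parent[name]
        return name

    for a, b in combinations(seqs, 2):
        if differences(seqs[a], seqs[b]) <= max_diff:
            parent[find(a)] = find(b)  # merge the two groups

    groups = {}
    for name in seqs:
        groups.setdefault(find(name), []).append(name)
    return sorted(sorted(members) for members in groups.values())


# Hypothetical specimens: two museum skins each of species X and Y, one online
# record labeled as species Y, and one unknown bird.
specimens = {
    "museum_X_1": "ATGGCATTCCTAGGAACCGC",
    "museum_X_2": "ATGGCATTCCTAGGAACCGT",
    "museum_Y_1": "ATGACATTTCTAGGGACAGC",
    "museum_Y_2": "ATGACATTTCTAGGGACAGT",
    "database_labeled_Y": "ATGGCATTCCTAGGAACTGC",
    "unknown": "ATGGCACTCCTAGGAACCGC",
}

for group in group_by_similarity(specimens, max_diff=2):
    print(group)
```

In this toy data set, the record labeled as species Y lands in the same group as the species X museum specimens and the unknown bird. A single database lookup would have called the unknown bird species Y; the comparison against several vouchered specimens is what exposes the questionable label.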

One must marvel at how those darn Empids seem to confound us at every turn. But with investigations like these, there is progress. We learn the importance of replication, and the necessity of comparing more than one gene. We learn that past misidentifications can unknowingly be transmitted to specimens’ gene sequences, mucking up what would be straightforward identifications. And we encounter yet another example of when museum specimens are crucial for something even as simple as identifying a species.

As for the wayward Virginia Empid, this study confirmed that it was Virginia’s first-ever record for Dusky Flycatcher. This single bird led not only to excitement at the Virginia Zoo banding station, but also to a systematic revision of our online genetic databases.

The presence of a single organism, in other words, can be a significant event. Indeed, much of scientific advance depends on the contingency of nature. If ornithologists hadn’t been there at the right time, or if the bird occurred just far enough away to be missed, none of this study would have come to pass. We are surrounded by unknown unknowns, unexpected organisms all over the earth that even the most astute observers narrowly miss.

This leaves me with one nagging thought:

What’s out there that we’re missing?


Heller, Erin L., et al. “Overcoming challenges to morphological and molecular identification of Empidonax flycatchers: a case study with a Dusky Flycatcher.” Journal of Field Ornithology (2016).

Pereyra, Maria E., and James A. Sedgwick. 2015. Dusky Flycatcher (Empidonax oberholseri), The Birds of North America Online (A. Poole, Ed.). Ithaca: Cornell Lab of Ornithology. Retrieved from the Birds of North America Online. doi:10.2173/bna.78