Friday, 6 May 2011

Identification keys

"When he has learnt that bottinney means a knowledge of plants, he goes and knows 'em.  That's our system, Nickelby; what do you think of it?"
Charles Dickens, Nicholas Nickleby.
Many people characterise botanists as people who know the names of plants, and I’m often asked to identify specimens and, increasingly, photographs.  However, identifying plants is not a widespread skill among us; botany involves a lot more than knowing names.  Even plant taxonomists, who are specialists in the identification, classification, evolution, and naming of plants, sometimes know well only the members of the groups they study.  For ability to identify a wide range of plants on sight, I’d recommend horticulturists, conservation biologists, and field ecologists.
Identifying plants is such a seemingly simple skill that it’s often under-rated and certainly under-valued.  The professional identification of a possibly poisonous plant can be a matter of life and death.  I’ve identified plants for both police and defendants in drug and burglary cases, hospitals when children have eaten berries, vets, farmers, insurance companies, and for disputing neighbours.  Conservation depends on accurate identification of rare plants and the weeds that threaten them.  Identification of a sample opens up the world of information about that plant, which would otherwise be unavailable.
How do botanists identify plants?
The easiest and commonest way is simply to recognise a species you’ve seen before and to recall its name.  The process is the same as the way we recognise our friends, or cars, celebrities, or dinosaurs.  I think humans differ in this ability, and children are often very good at it.  (Incidentally that skill makes some parents believe their child is some sort of prodigy, but I don't think it's a good indicator of scientific ability.)
Anyway, that approach will always fail when you’re confronted with a plant you’ve never seen before; so then you need to draw on the recorded knowledge of others.
The simplest resources are picture books and Google images, although pictures vary in their usefulness.  A clear diagram drawn by a botanical artist can draw the user’s attention to the important features.  Digital photography, especially when layered to improve depth of field, can be amazingly clear and accurate, but it takes both photographic skill and knowledge of the plants to do this well.  A poor diagram or a blurry photo is almost useless.
More scientific books, like Floras, have written descriptions of plants that detail, often in technical language, the attributes of species.  Descriptions are precise, but on their own they can be hard to work with; a picture is worth a thousand words.
And there’s immediately an efficiency problem with using pictures and descriptions alone.  There are 250,000 species of flowering plants, so if you could compare your unknown plant with a photo or description every 5 seconds, you’d need about two weeks of non-stop comparing before you could check them all.  Even then, there will be a lot of look-alikes that would baffle you.  We need a way to quickly whittle down the possibilities, and that’s where keys come in.
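To check that arithmetic, and to see why whittling down matters, here is a minimal sketch in Python.  The species count and the five-second rate are the figures used above; the halving calculation is my own idealisation (it assumes every decision splits the remaining candidates exactly in two), which no real key achieves.

```python
import math

SPECIES = 250_000            # flowering plant species, figure from the text
SECONDS_PER_COMPARISON = 5   # comparison rate assumed in the text

total_seconds = SPECIES * SECONDS_PER_COMPARISON
total_days = total_seconds / (60 * 60 * 24)
print(f"{total_seconds:,} seconds is about {total_days:.1f} days")
# prints: 1,250,000 seconds is about 14.5 days

# Idealised assumption: each yes/no decision halves the candidates.
halvings = math.ceil(math.log2(SPECIES))
print(f"{halvings} perfect halvings would isolate one species")
# prints: 18 perfect halvings would isolate one species
```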
Identification keys are a simple kind of expert system.  They’re written by an expert and so they focus on the characteristics the expert knows to be reliable.  For instance, if you didn’t know that the way leaf edges join in the leaf buds is important for the identification of hebes, you could easily confuse Veronica stricta and V. salicifolia.

Veronica stricta
Veronica salicifolia
Keys work by ruling out possibilities with pairs of contrasting statements, so that whole groups of plants can be quickly eliminated without having to compare them all.  For instance, consider the following couplet:
1.   Seedlings with 1 cotyledon ………………………… 2
      Seedlings with 2 cotyledons ………………………… n
You’ll see the two contrasting statements in this couplet can’t both be true for an unknown plant.  The upper lead eliminates all the eudicots and basal angiosperms (what we used to call the dicotyledons; nearly 200,000 species), whereas the lower lead eliminates all the monocotyledons (about 60,000 species).  That’s a lot more efficient than comparing every one with your unknown.
Each lead of the couplet directs the enquiry to the next couplet, no. 2 in the case of monocotyledons, n in the case of the remainder of flowering plants. Couplet 2 should similarly divide the monocotyledons into two groups, maybe based on whether they have showy petals (e.g. lilies) or not (e.g. grasses).  In the end, no further division is possible and instead of a lead to another couplet, a name is presented.
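The way a key passes you from couplet to couplet can be sketched in a few lines of Python.  This is only an illustration: the wording of couplet 2, the group names at the end of each lead, and the names `KEY`, `run_key`, and `choose` are my own placeholders, and I have ended the lower lead of couplet 1 in a name rather than continuing to couplet n.

```python
# A toy dichotomous key: each couplet number maps to two leads.
# Each lead is (statement, target); the target is either another couplet
# number (int) or a name (str). Couplet 2 and all names are placeholders.
KEY = {
    1: [("Seedlings with 1 cotyledon", 2),
        ("Seedlings with 2 cotyledons", "eudicots and basal angiosperms")],
    2: [("Petals showy", "lily-like monocotyledons"),
        ("Petals not showy", "grass-like monocotyledons")],
}

def run_key(key, choose, start=1):
    """Follow leads until a name is reached instead of a couplet number."""
    target = start
    path = []
    while isinstance(target, int):
        leads = key[target]
        picked = choose([statement for statement, _ in leads])
        path.append(leads[picked][0])
        target = leads[picked][1]
    return target, path

# Simulate a user whose sample has one cotyledon and no showy petals.
observed = {"Seedlings with 1 cotyledon", "Petals not showy"}

def choose(statements):
    """Return the index of the lead that matches the observations."""
    return next(i for i, s in enumerate(statements) if s in observed)

name, path = run_key(KEY, choose)
print(name)  # prints: grass-like monocotyledons
```

The `path` list records the leads taken, which is handy for retracing your steps when an answer looks wrong.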
You might immediately spot a problem with the example above: how can this couplet help if your sample isn’t a seedling?
It’s helpful to the user if we can add some supporting characters to cover that situation, rewriting couplet one:
1.   Seedlings with 1 cotyledon; leaves without petioles; leaf veins usually parallel; flower parts in 3s or multiples of 3; taproot absent at maturity ………………………… 2
      Seedlings with 2 cotyledons; leaves with or without petioles; leaf veins usually forming a network; flower parts mostly in 4s or 5s or multiples thereof; taproot often present at maturity ………………………… n
You can see though that I’ve had to add some weasel-words to cover exceptions, words like usually, mostly, and often.  Character states like these, which are not universal in the group concerned, make the key less accurate, even while making it more helpful.  In principle, we should focus on character states that are:

  • likely to be present, 
  • easy to observe without a microscope, 
  • easily understood by the user, 
  • qualitative rather than quantitative, 
  • and non-overlapping.  

For instance:
Leaves 10–60 mm long ………………….. Veronica hulkeana
Leaves 20–150 mm long …………………. Veronica stricta
is not as useful as:
Leaf margins toothed …………………..Veronica hulkeana
Leaf margins entire …………………. Veronica stricta
Veronica hulkeana 
Veronica stricta
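The trouble with the leaf-length couplet is the overlap, and a quick check makes it concrete.  This is a minimal sketch using the measurements from that couplet; the function name `ranges_overlap` is my own.

```python
def ranges_overlap(a, b):
    """True if closed intervals a=(lo, hi) and b=(lo, hi) share any value."""
    return a[0] <= b[1] and b[0] <= a[1]

# Leaf lengths in mm, taken from the leaf-length couplet.
hulkeana = (10, 60)
stricta = (20, 150)

if ranges_overlap(hulkeana, stricta):
    lo = max(hulkeana[0], stricta[0])
    hi = min(hulkeana[1], stricta[1])
    print(f"Leaves {lo}-{hi} mm long fit both leads")
# prints: Leaves 20-60 mm long fit both leads
```

Any leaf in that shared range leaves the user with no basis for choosing a lead, whereas toothed and entire margins cannot both apply.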
In a good key, the two statements of a couplet should be exactly comparable, using the same characters in the same order.  As a contrived bad example, consider the following, which uses useful characters for these two species, but not in a way that's helpful because the user isn't told the states for the other species:
Leaves toothed; flowers in terminal panicles ……..Veronica hulkeana
Corolla lobes erect; capsules acute ………………. Veronica stricta
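If a key were stored as data, comparability could be checked mechanically.  Here is a rough sketch, assuming each lead is held as a list of (character, state) pairs; that representation and the name `comparable` are my own invention for the purpose.

```python
def comparable(lead_a, lead_b):
    """True if both leads list the same characters in the same order."""
    return [char for char, _ in lead_a] == [char for char, _ in lead_b]

# The contrived bad couplet: the two leads describe different characters.
bad = (
    [("leaves", "toothed"), ("flowers", "in terminal panicles")],
    [("corolla lobes", "erect"), ("capsules", "acute")],
)
# The leaf-margin couplet: same character, contrasting states.
good = (
    [("leaf margins", "toothed")],
    [("leaf margins", "entire")],
)

print(comparable(*bad))   # prints: False
print(comparable(*good))  # prints: True
```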

Some people argue that each couplet should divide the remaining species equally, rather than separating off one species at a time.  The argument is that each couplet presents an opportunity for error, and with an unbalanced key, some species will require a large number of couplets to reach the answer.  In a balanced key, all species require about the same number of couplets.  However, the key as a whole is the same size either way: a dichotomous key needs n–1 couplets, where n is the number of species, whatever its symmetry.  You might also argue that an unbalanced key that keys out the commonest species in very few couplets is more efficient over time.  There’s no right answer to this one, but I believe it’s usually better to sacrifice elegance and efficiency for accuracy and reliability.
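A small sketch makes the n–1 count concrete.  It builds a balanced and a fully unbalanced key for 16 made-up species, counts the couplets in each, and reports how many couplets a user works through to reach each species when all species are weighted equally.  The nested-tuple representation and the function names are assumptions for illustration only.

```python
def balanced(species):
    """Each couplet splits the remaining species as evenly as possible."""
    if len(species) == 1:
        return species[0]
    mid = len(species) // 2
    return (balanced(species[:mid]), balanced(species[mid:]))

def unbalanced(species):
    """Each couplet keys out one species and passes the rest on."""
    if len(species) == 1:
        return species[0]
    return (species[0], unbalanced(species[1:]))

def count_couplets(key):
    """A couplet is a 2-tuple; anything else is a species name."""
    if not isinstance(key, tuple):
        return 0
    return 1 + count_couplets(key[0]) + count_couplets(key[1])

def depths(key, depth=0):
    """Couplets a user works through to reach each species."""
    if not isinstance(key, tuple):
        return [depth]
    return depths(key[0], depth + 1) + depths(key[1], depth + 1)

species = [f"sp{i}" for i in range(1, 17)]  # 16 hypothetical species
results = {}
for label, build in [("balanced", balanced), ("unbalanced", unbalanced)]:
    key = build(species)
    d = depths(key)
    results[label] = (count_couplets(key), max(d), sum(d) / len(d))
    print(label, *results[label])
# prints: balanced 15 4 4.0
# prints: unbalanced 15 15 8.4375
```

Both keys contain 15 couplets, but the longest route is 4 couplets in the balanced key and 15 in the unbalanced one.  Weighting species by how often they are encountered, as in the argument about common species, would change the averages.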
Sometimes when you go astray in a key, the questions don't seem to make any sense for your plant.  That's a clue that you've strayed into a part of the key that's dealing with a group of species that doesn't include your sample.
When you reach an answer after working through a key, it pays to check if you’re right.  Compare the sample with descriptions, diagrams, or photographs; even better if you have a herbarium (a systematically organised collection of pressed samples).
Keys have evolved over time.  The early ones often had more than two options (polytomous), where we now have paired couplets (dichotomous).  A disadvantage of polytomous keys is that users don’t know how many options to expect and often overlook some possibilities, but polytomous keys sometimes cover the available options more accurately.
The layout of keys varies.  Most nowadays have numbered couplets, but some are indented, a system that uses up a lot of page space.  Most people prefer to have the two comparable statements close together; they can be pages apart in long indented keys.
Note the syntax of the statements in keys.  Typically they don't have verbs; they're merely a noun followed by some adjectives or measurements.  This efficient shorthand has evolved from the use of Latin in biology a couple of hundred years ago.
What about the future?  Computerised keys (DELTA, LuCID) get around some of the problems of written keys.  In particular they can get around the problem of a sample that doesn’t have every life stage.  If fruits are absent on your sample, just answer the questions you can; eventually you’ll get to an answer anyway rather than getting stuck as you would in a traditional key.  Computerised keys are often more efficient too, because they require fewer decisions before you get to an answer.
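I don't know how DELTA or any other program is implemented internally, but the general idea of a multi-access key can be sketched as a table of taxa against characters that is filtered by whatever you are able to observe.  Everything below is made up for illustration: the taxa, the character states, and the function names `candidates` and `next_character`.  The rule for suggesting the next character (pick the one whose largest group of remaining taxa is smallest) is one simple heuristic among many.

```python
from collections import Counter

# Made-up character matrix: taxon -> {character: state}
MATRIX = {
    "taxon A": {"leaf margin": "toothed", "petals": "showy", "fruit": "capsule"},
    "taxon B": {"leaf margin": "entire", "petals": "showy", "fruit": "capsule"},
    "taxon C": {"leaf margin": "entire", "petals": "not showy", "fruit": "berry"},
    "taxon D": {"leaf margin": "toothed", "petals": "not showy", "fruit": "berry"},
}

def candidates(matrix, observations):
    """Keep every taxon that agrees with all observations made so far."""
    return [
        taxon for taxon, states in matrix.items()
        if all(states.get(char) == state for char, state in observations.items())
    ]

def next_character(matrix, remaining, observations, unavailable=()):
    """Suggest the unanswered character with the smallest worst-case group."""
    skip = set(observations) | set(unavailable)
    chars = sorted({c for t in remaining for c in matrix[t]} - skip)

    def worst_case(char):
        return max(Counter(matrix[t].get(char) for t in remaining).values())

    return min(chars, key=worst_case) if chars else None

# The sample has no fruit, so that character is simply skipped.
observations = {"petals": "showy"}
remaining = candidates(MATRIX, observations)
print(remaining)  # prints: ['taxon A', 'taxon B']
print(next_character(MATRIX, remaining, observations, unavailable=["fruit"]))
# prints: leaf margin

observations["leaf margin"] = "entire"
print(candidates(MATRIX, observations))  # prints: ['taxon B']
```

Because the characters can be answered in any order, a missing life stage only removes some questions instead of blocking the route to an answer.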
Image recognition is also a future possibility: a photograph of an unknown plant entered into a computer network could be matched with stored images.  Finally, DNA barcoding is a developing system for biological identification.  It relies on the presence of portions of the genome that are universally reliable for the identification of species.  There are practical problems with it so far in plants, but I have no doubt the concept is a good one and we’ll see it in everyday use reasonably soon.


  1. Your point about indented vs. numbered-couplet keys lines up with the common advice of splitting long, complicated sentences up into separate, simpler sentences, connected by anaphora or by explicitly naming the objects under discussion.

    Your point lines up with some common advice. The point is about indented vs. numbered-couplet keys. The common advice is to split certain sentences. The sentences to be split are those that are long and complicated. The results of the split are separate, simpler sentences. The simpler sentences can be connected by anaphora. Alternatively, they can be connected by naming the objects under discussion.

    (In natural language, the long, complex sentences are said to be in "hypotactic form", while the shorter, simpler, side-by-side, connected sentences are said to be in "paratactic form". At least, that's what I've picked up from random reading around the subject.)

  2. A thought on computerised keys: by making multiple entry points into the keying process available, they also permit the software to suggest good experiments to perform next, or likely characters to watch out for in future observations of this taxon.

    For example, once the user has selected "taproot present at maturity", the system can spit out something like: "This character is ambiguous, but usually indicates non-monocotyledy. Do you have seedlings of this plant available? If so, how many cotyledons does each seedling have?" or "When observing seedlings of this type, you should expect to see two cotyledons on each seedling."

  3. I like that idea. The choices the user makes tell the system something about the specimen. Cool.