Cladistic analysis

Cladistic analysis of recorded characters



Evolutionary relationships

  • Why?
    • Basis of taxonomy
    • Understanding of evolutionary processes
    • (re-)Integrating ecology and evolution
  • How?
    • Verbal hypotheses about character change; Linnaeus, etc. (still good)
    • Maximum/overall similarity (phenetics); Sokal
    • Shared, derived characters (cladistics); Hennig


A character (or trait) is simply a description of some aspect of an organism. Within the universe of all characters is a subset useful for diagnosis (identification), and an even smaller subset useful for phylogenetic reconstruction.


  • Simplest: development of obvious, complex character, shared by all descendants
    • presence of this character defines membership in a clade
    • many such characters define a tree
  • In reality: recognition of such homologous characters is difficult (e.g., weighting, move to DNA)
  • Parallel evolution (homoplasy) occurs and confuses the signal


  • Maximum parsimony: assumes that evolution proceeds 'parsimoniously', and so prefers the tree that minimizes the number of evolutionary character transitions (e.g., Felsenstein's pars in PHYLIP)
  • Maximum likelihood: incorporates models of the probabilities of character transition, and maximizes the likelihood of trees, given the observed data
  • Search algorithms: the number of potential trees is vast, so clever search methods are needed to find the optima (see the PHYLIP documentation)
  • Other tools: PAUP*, MrBayes, GARLI...
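
To make the parsimony criterion and the size of the search space concrete, here is a minimal sketch in Python. It is not how pars is implemented. It assumes unordered character states, a fully bifurcating tree written as nested pairs, and no missing data; the function names fitch_score and count_unrooted_trees are made up for this illustration. The first function counts the fewest state changes one character needs on a given tree (Fitch's method), and the second counts the possible unrooted bifurcating trees for a given number of taxa.

```python
from math import prod


def fitch_score(tree, states):
    """Return the minimum number of state changes for one character.

    tree   -- nested 2-tuples of taxon names, e.g. (("A", "B"), ("C", "D"))
    states -- dict mapping each taxon name to its observed state
    (Illustrative helper; assumes no missing data.)
    """
    changes = 0

    def visit(node):
        nonlocal changes
        if isinstance(node, str):      # a tip: its state set is the observed state
            return {states[node]}
        left, right = (visit(child) for child in node)
        shared = left & right
        if shared:                     # the two children can agree on a state
            return shared
        changes += 1                   # they cannot: one change is needed here
        return left | right

    visit(tree)
    return changes


def count_unrooted_trees(n_taxa):
    """Number of unrooted, fully bifurcating trees: 3 * 5 * ... * (2n - 5)."""
    return prod(range(3, 2 * n_taxa - 4, 2))


character = {"A": "0", "B": "0", "C": "1", "D": "1"}
print(fitch_score((("A", "B"), ("C", "D")), character))  # tree grouping like states
print(fitch_score((("A", "C"), ("B", "D")), character))  # tree splitting them up
print(count_unrooted_trees(14))
```

The tree that groups A with B needs fewer changes than the one that separates them, so parsimony prefers it. For the 14 taxa used in the lab below there are already more than 10^11 candidate trees, which is why heuristic searches are used.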

Morphology vs. DNA

  • Morphology: difficult to recognize homologies, and small number of characters
  • DNA fragments: relatively easy, lots of characters, problem with homoplasy; phenetic algorithms
  • DNA sequences: increasingly easy, huge numbers of characters, homology still an issue, but good models of state transition
  • Today: most systematists use DNA to construct trees, and reconstruct morphology on these trees
  • But... morphology still powerful with careful anatomical studies (dying out); fossils

The Lab

  • Build character matrix in Mesquite
  • Use Mesquite as diagnostic key
  • Review characters for phylogenetic reconstruction. Are they appropriate?

Export characters

  • In Mesquite, File --> Export... --> Simplified NEXUS. Tick 'Use DATA instead of TAXA/CHARACTERS block', and tick 'Convert partial ambiguities to missing'. Save as chars.nex.

Run a parsimony analysis in PHYLIP

Preparing the file

Unfortunately, Mesquite will not export to PHYLIP format if any character has a state other than 0 or 1, so we need to 'hack' the input file manually:

  • Duplicate chars.nex, and rename as infile.
  • Open infile in a text editor.
  • Delete everything down to the line containing, e.g. DIMENSIONS NTAX=14 NCHAR=5;. Edit this line so that there are just two numbers separated by a space, e.g. 14 5. These numbers are the dimensions of the matrix (taxa, or rows; characters, or columns).
  • The data block comes next. The format of this must be:
    • Taxon name up to 10 characters, with spaces filling in if name too short. You will have to truncate the name if it is longer than ten characters.
    • Delete the TAB at the beginning of each line.
    • Data start in the eleventh column, using only the characters 0 to 9 and ?.
  • Delete everything below the matrix.
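
The format rules above are easy to get wrong by hand, so a quick check before running pars can help. The following Python sketch tests only the rules listed here (two integers on the first line, no leading TAB, data from the eleventh column, states limited to 0 to 9 and ?). The function name check_infile is made up for this example, and the check is stricter than PHYLIP itself may be.

```python
def check_infile(text):
    """Return a list of problems found in the edited infile text.

    An empty list means the text follows the layout described in the lab notes.
    (Illustrative helper, not part of PHYLIP.)
    """
    lines = text.splitlines()
    if not lines:
        return ["file is empty"]

    header = lines[0].split()
    if len(header) != 2 or not all(item.isdigit() for item in header):
        return ["first line must be two integers: taxa and characters"]
    ntax, nchar = map(int, header)

    problems = []
    rows = [line for line in lines[1:] if line.strip()]
    if len(rows) != ntax:
        problems.append(f"expected {ntax} taxa, found {len(rows)}")

    for number, row in enumerate(rows, start=1):
        if row.startswith("\t"):
            problems.append(f"row {number}: leading TAB")
        data = row[10:]                      # data start in the eleventh column
        if len(data) != nchar:
            problems.append(f"row {number}: expected {nchar} states, found {len(data)}")
        if set(data) - set("0123456789?"):
            problems.append(f"row {number}: states must be 0 to 9 or ?")
    return problems


good = "2 5\nAnnon     0?000\nDipter    ?0100\n"
print(check_infile(good))
```

To check a real file, read its text first (for example with open("infile").read()) and pass that string to check_infile.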

The finished product should look like:

14 5
Annon     0?000
Dipter    ?0100
Apocyn    110?0
Euphor    ?11?0
Sapind    0?001
Laurac    0?000
Malvac    0?100
Clusia    1?000
Rubiac    0?110
Myrtac    0?010
Myrist    13000
Moraca    11000
Sapota    11?00
Fabaca    0?101

Running the parsimony program

  • Make sure the program pars.exe (pars on a Mac) is in the same directory as infile.
  • Double-click to run the program pars.
  • For the moment, accept the default settings, and just type y.
  • A file outtree will be made.
  • Open outtree in TreeView.

Find the consensus phylogeny

It is very likely that multiple, equally parsimonious trees were found. We need to find a consensus tree. The strict consensus only includes relationships present in all trees, while the majority rule consensus finds the relationships present in the majority of the trees.
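
The difference between the two kinds of consensus can be sketched in a few lines of Python. This is an illustration of the idea only, not the algorithm used by consense. It assumes each tree has already been reduced to the set of clades (groups of taxa) it contains, and the function names are made up for this example.

```python
from collections import Counter


def strict_consensus(trees):
    """Clades present in every tree."""
    return set.intersection(*(set(tree) for tree in trees))


def majority_rule_consensus(trees):
    """Clades present in more than half of the trees."""
    counts = Counter(clade for tree in trees for clade in set(tree))
    return {clade for clade, n in counts.items() if n > len(trees) / 2}


# Three equally parsimonious trees, each written as a set of clades.
AB, ABC, ABD = frozenset("AB"), frozenset("ABC"), frozenset("ABD")
trees = [{AB, ABC}, {AB, ABD}, {AB, ABC}]

print(strict_consensus(trees))         # only the clade found in all three trees
print(majority_rule_consensus(trees))  # also keeps the clade found in two of three
```

Clades that fail the threshold are collapsed, which is why a consensus tree usually contains polytomies.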

  • Rename outtree to intree
  • Make sure the program consense.exe (consense on a Mac) is in the same directory as intree
  • Double-click to run the program consense.
  • For the moment, accept the default settings, and just type y.
  • A file outtree will be made.
  • Rename outtree to
  • Open in TreeView. There should be a single tree, with polytomies.
  • Unfortunately, the Mesquite parser chokes on newline characters in the phylogeny file, so one more step is needed:
    • Open in your editor, and find and replace all the newlines (returns), so that there remains one long, unbroken character string. Careful not to delete any of the visible characters.
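
If find-and-replace in an editor is awkward, the same clean-up can be done with a short script. This is a minimal sketch; the function names are made up for this example, and the source and destination paths are left as parameters for you to supply.

```python
from pathlib import Path


def flatten_tree_text(text):
    """Remove all newline and carriage-return characters, keeping everything else."""
    return text.replace("\r", "").replace("\n", "")


def flatten_tree_file(src, dst):
    """Read a tree file from src and write the one-line version to dst."""
    Path(dst).write_text(flatten_tree_text(Path(src).read_text()))


print(flatten_tree_text("(Annon,(Dipter,\nApocyn));\r\n"))
```

Only line breaks are removed, so brackets, commas, taxon names and the final semicolon are left untouched.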

Run a parsimony analysis in R

Believe it or not, there is now a package in R to do simple parsimony analysis. However, it is not as comprehensive as PHYLIP and is offered here 'for fun'. There is no way to do a consensus analysis.

  • Start R, and change working directory to where you saved chars.nex.
  • Cut and paste the following:

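# Several names were lost from this listing. read.nexus.data, chars.phy and
# chars2matrix are reconstructed from context and should be treated as assumptions.
library(ape)       # read.nexus.data, as.phylo, write.tree
library(phangorn)  # phyDat, optim.parsimony
chars <- read.nexus.data("chars.nex")
chars.phy <- phyDat(chars, type="USER", levels=c("0","1","2","3","4","5","6","7","8","9") )
chars2matrix <- function(chars) {
  ntaxa <- length(attributes(chars)$names)
  a <- unlist(chars, use.names=F)
  a[a=="?"] <- NA
  b <- matrix(as.numeric(a), nrow=ntaxa, byrow=T)
  rownames(b) <- attributes(chars)$names
  b
}
z <- chars2matrix(chars)
# Starting tree from a simple distance clustering of the character matrix
base <- as.phylo(hclust(dist(z)))
# plot(base)
# The original listing took $tree from this result; phangorn documents
# optim.parsimony as returning the tree itself. If your version returns a list, add $tree.
ptree <- optim.parsimony(base, chars.phy)
write.tree(ptree, file="")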

Import tree back into Mesquite

  • Taxa&Trees --> Import File with Trees --> Include Contents..., select Interpret as Phylip (trees).
  • Taxa&Trees --> New Tree Window --> Stored Trees (select Imported trees if asked). A tree window should open with your tree.
  • Reconstruct character evolution on tree