DISPERSAL DIVERSITY : STATISTICS AND TESTS

This is a collection of R functions to facilitate analysis of dispersal in biological communities. The bulk of the functions calculate dispersal diversity statistics and allow for comparison of diversity statistics, as described in Scofield et al. 2012 American Naturalist 180: 719-732.

There are also new functions for calculating allelic diversity using these same conceptual and statistical principles, and for comparing allele diversity statistics.

The most current versions of all files can be found in the GitHub repository:
https://github.com/douglasgscofield/dispersalDiversity


The Mann-Whitney-Wilcoxon nested ranks test that we originally provided here has been turned into an R package and moved to its own repository.


These statistical tools were developed in collaboration with Peter Smouse (Rutgers University) and Victoria Sork (UCLA) and were funded by U.S. National Science Foundation awards NSF-DEB-0514956 and NSF-DEB-0516529.


Dispersal Diversity

Input requirements

All functions take as input a simple data structure: a table of site (rows) by source (columns) counts. Though we originally developed the diversity tests to understand seed dispersal in plant populations, the tests themselves should be useful for biodiversity data or any other diversity-like data that can be expressed with this same data structure.
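As a minimal sketch, assuming per-seed records with one row per seed giving the site where it was sampled and the source it was assigned to (the seeds data frame and its column names are hypothetical), base R's table() produces a site-by-source count table of the shape described above:

```r
# Hypothetical per-seed records: the site where each seed was collected and
# the maternal source (e.g. mother tree) it was assigned to
seeds <- data.frame(
  site   = c("A", "A", "A", "B", "B", "B", "B", "C", "C"),
  source = c("m1", "m1", "m2", "m2", "m3", "m3", "m3", "m4", "m1")
)

# Cross-tabulate into a site (rows) by source (columns) table of counts
tab <- table(seeds$site, seeds$source)
tab
```

Sources that never occur at a site simply get a zero count, so every row covers the same set of source columns.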

Getting started

The pmiDiversity.R and diversityTests.R source files are required for performing diversity tests. If you only need the PMI statistics (Grivet et al. 2005; Scofield et al. 2010; Scofield et al. 2011) and diversity statistics (Scofield et al. 2012) such as q_gg and α_g, the source file pmiDiversity.R contains the pmiDiversity() function that provides these, and it can be used on its own.

Put all the source files in the same directory, then within your R session simply run:

source("diversityTests.R")

Additional source files are provided to perform other tasks. plotPairwiseMatrix.R is available for plotting pairwise divergence/overlap matrices. More information is available below. This file requires the pmiDiversity.R source file to be available within the same directory:

source("plotPairwiseMatrix.R")

gammaAccum.R is available for collecting γ diversity accumulation information and plotting it. More information is available below. This file requires the pmiDiversity.R source file to be available within the same directory:

source("gammaAccum.R")

pmiDiversity.R

Defines the R function pmiDiversity() which takes a site-by-source table and produces statistics for Probability of Maternal Identity aka PMI (Grivet et al. 2005, Scofield et al. 2010, Scofield et al. 2011) and dispersal diversity (Scofield et al. 2012). Three different PMI and diversity statistics are calculated:

  • q_gg-based, known to be biased (Grivet et al. 2005)
  • r_gg-based, unbiased but performing poorly at low sample sizes (Grivet et al. 2005, Scofield et al. 2012)
  • q*_gg-based, which apply the transformation developed by Nielsen et al. (2003) to remove bias and appear to perform well (Scofield et al. 2010, Scofield et al. 2011, Scofield et al. 2012).
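To make the first two definitions concrete, here is a hedged sketch that computes q_gg (two seeds drawn with replacement from site g share a source) and r_gg (two distinct seeds drawn without replacement share a source) for each site of a small hypothetical count matrix. These are the standard forms following Grivet et al. (2005); the Nielsen et al. (2003) q*_gg transformation and the between-site and pooled statistics reported by pmiDiversity() are not reproduced here.

```r
# Hypothetical 3-site by 4-source count matrix
tab <- matrix(c(2, 1, 0, 0,
                0, 1, 3, 0,
                1, 0, 0, 1),
              nrow = 3, byrow = TRUE,
              dimnames = list(c("A", "B", "C"), c("m1", "m2", "m3", "m4")))

n.g <- rowSums(tab)  # number of seeds sampled at each site

# q_gg: sum over sources of squared within-site proportions
q.gg <- rowSums((tab / n.g)^2)

# r_gg: sum over sources of n_gk (n_gk - 1), divided by n_g (n_g - 1)
r.gg <- rowSums(tab * (tab - 1)) / (n.g * (n.g - 1))

# Reciprocal of q_gg: an effective number of sources per site
cbind(q.gg, r.gg, inv.q = 1 / q.gg)
```

Site C, with only two seeds from two different sources, gets r_gg = 0, so its reciprocal is undefined; this illustrates why r_gg-based statistics can behave poorly at low sample sizes.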

diversityTests.R

Defines several R functions which, like pmiDiversity(), take one or more site-by-source tables and test diversity statistics within and among them. See Scofield et al. (2012) for methodological details. The file pmiDiversity.R (see above) is required to be in the same directory, as it provides functions used here.

alphaDiversityTest(tab) : Test for differences in α diversity among sites within a single dataset

alphaContrastTest(tab.a, tab.b) : Test whether there is a difference in the α diversity between two datasets

alphaContrastTest.3(tab.a, tab.b, tab.c) : Test whether there is a difference in the α diversity among three datasets

plotAlphaTest(result) : Plot the list returned from alphaDiversityTest() or alphaContrastTest() for evaluation

pairwiseMeanTest(tab) : Test whether mean pairwise divergence/overlap among sites is different from the null expectation

plotPairwiseMeanTest() : Plot the list returned from pairwiseMeanTest() for evaluation

gammaContrastTest(tab.a, tab.b) : Test whether there is a difference in the γ diversity between two datasets

gammaContrastTest.3(tab.a, tab.b, tab.c) : Test whether there is a difference in the γ diversity among three datasets
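The contrast tests above compare diversity between datasets; how diversityTests.R does so is described in Scofield et al. (2012). Purely to illustrate the general idea of such a contrast, the sketch below is a generic permutation test. It assumes that α for a dataset is 1 / mean(q_gg) over its sites and builds a null distribution by shuffling whole sites between the two datasets. The functions site.q() and alpha.contrast.sketch() are hypothetical and are not a re-implementation of alphaContrastTest().

```r
# Per-site q_gg for a site-by-source count matrix
site.q <- function(tab) {
  n.g <- rowSums(tab)
  rowSums((tab / n.g)^2)
}

# Hypothetical generic permutation contrast of alpha between two datasets.
# Alpha of a dataset is ASSUMED to be 1 / mean(q_gg) over its sites; whole
# sites are shuffled between the two datasets to build the null distribution.
alpha.contrast.sketch <- function(tab.a, tab.b, n.perm = 999) {
  q <- c(site.q(tab.a), site.q(tab.b))
  grp <- rep(c("a", "b"), c(nrow(tab.a), nrow(tab.b)))
  stat <- function(g) 1 / mean(q[g == "a"]) - 1 / mean(q[g == "b"])
  obs <- stat(grp)
  perm <- replicate(n.perm, stat(sample(grp)))
  # Two-sided p-value, counting the observed labelling as one permutation
  p <- (sum(abs(perm) >= abs(obs)) + 1) / (n.perm + 1)
  list(observed = obs, p.value = p)
}

set.seed(1)
tab.a <- matrix(c(5, 0, 0,
                  4, 1, 0,
                  5, 0, 0), nrow = 3, byrow = TRUE)
tab.b <- matrix(c(1, 1, 1, 1, 1,
                  2, 1, 1, 1, 0,
                  1, 1, 1, 1, 1), nrow = 3, byrow = TRUE)
res <- alpha.contrast.sketch(tab.a, tab.b)
res
```

With only three sites per dataset there are few distinct site relabellings, so the attainable p-values are coarse; real datasets with more sites give a finer null distribution.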

membershipPlot.R

Provides the function membershipPlot() for plotting relative representations of sources within sites, and source sharing across sites, using the same site-by-source table used for input to the pmiDiversity() function. Examples of membership plots can be seen in Figure 2A-C of Scofield et al. 2012 Am Nat. Singleton sources (those that appear just once in just one site) are distinguished using a white background, while multiton sources (those that appear multiple times but still in just one site) can be distinguished with a gray background using the option distinguish.multiton=TRUE. Other options are provided for controlling labelling of the plot and producing output to PDF or PostScript files.

The primary function, membershipPlot(), has since been generalised; membershipPlot.v0() retains the original membership plot functionality.
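The singleton and multiton categories used by membershipPlot() can be read directly off the site-by-source table. A minimal base R sketch with a hypothetical table; the label "shared" for sources found in more than one site is our own and is not an option of membershipPlot():

```r
# Hypothetical 3-site by 5-source count matrix
tab <- matrix(c(2, 1, 0, 0, 0,
                0, 1, 3, 0, 0,
                0, 0, 0, 1, 1),
              nrow = 3, byrow = TRUE,
              dimnames = list(c("A", "B", "C"),
                              c("m1", "m2", "m3", "m4", "m5")))

n.sites <- colSums(tab > 0)  # number of sites in which each source appears
n.total <- colSums(tab)      # total count of each source across all sites

# Singleton: one seed at one site; multiton: several seeds but one site
source.class <- ifelse(n.sites > 1, "shared",
                       ifelse(n.total == 1, "singleton", "multiton"))
source.class
```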

plotPairwiseMatrix.R

Provides a function for plotting pairwise diversity matrices as returned by the pmiDiversity() function, examples of which can be seen in Figure 4A-C of Scofield et al. 2012 Am Nat.

plotPairwiseMatrix() : Create a visual plot of pairwise divergence or overlap values as calculated by pmiDiversity()

For example, with tab a site-by-source table as described above, plot the divergence matrix based on r_gg calculations, labelling the axes “Seed Pool”, using the following code:

source("pmiDiversity.R")
source("plotPairwiseMatrix.R")
pmiD = pmiDiversity(tab)
plotPairwiseMatrix(pairwise.mat=pmiD$r.divergence.mat, 
                   pairwise.mean=pmiD$r.divergence, 
                   statistic="divergence", 
                   axis.label="Seed Pool")
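As a hedged illustration of the kind of between-site quantity a pairwise comparison rests on, the sketch below computes, for every pair of sites g and h, the probability that one seed drawn from g and one drawn from h come from the same source. It does not reproduce the divergence or overlap statistics returned by pmiDiversity(), whose definitions are given in Scofield et al. (2012); the table is hypothetical.

```r
# Hypothetical 3-site by 4-source count matrix
tab <- matrix(c(2, 1, 0, 0,
                0, 1, 3, 0,
                1, 0, 0, 1),
              nrow = 3, byrow = TRUE,
              dimnames = list(c("A", "B", "C"), c("m1", "m2", "m3", "m4")))

# Within-site source proportions p_gk
P <- tab / rowSums(tab)

# Q[g, h] = sum over k of p_gk * p_hk; the diagonal holds each site's q_gg
Q <- tcrossprod(P)
round(Q, 3)
```

Sites that share no sources, such as B and C here, get an off-diagonal value of zero.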

gammaAccum.R

Provides functions for calculating γ accumulation across sites, and plotting the result, examples of which can be seen in Figure 4D-F of Scofield et al. 2012 Am Nat. The file pmiDiversity.R (see above) is required to be in the same directory, as it provides functions used here.

A typical workflow using these functions would be:

rga.result = runGammaAccum(tab)
plotGammaAccum(rga.result)

runGammaAccum(tab)

Perform a γ diversity accumulation on the site-by-source data in tab. The result is returned in a list, which may be passed to plotGammaAccum() to plot the result. Several arguments control the method of accumulation and value of γ calculated. Only the defaults have been tested; the others were developed while exploring the data and must be considered experimental.

tab : Site-by-source table, same format as that passed to pmiDiversity()

gamma.method : Calculate γ using the "r" (default), "q.nielsen" or "q" method (see Scofield et al. 2012)

resample.method : "permute" (default) or "bootstrap"; whether to resample sites without ("permute") or with ("bootstrap") replacement

accum.method : "random" (default) or "proximity"; if "proximity" is used, distance.file must be supplied

distance.file : A file or data.frame containing three columns of data, with the header/column names being pool, X, and Y, containing the spatial locations of the seed pools named in the row names of tab; only used with accum.method="proximity"
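To show what "permute" versus "bootstrap" resampling means for an accumulation curve, here is a hedged sketch of random γ accumulation. It assumes that γ for a set of sites is 1 / q_00, the reciprocal of the q-based PMI of all seeds pooled across those sites; runGammaAccum() defaults to an r-based γ, and proximity ordering is not covered. gamma.q() and gamma.accum.sketch() are hypothetical names, not functions from gammaAccum.R.

```r
# Hypothetical 3-site by 4-source count matrix
tab <- matrix(c(2, 1, 0, 0,
                0, 1, 3, 0,
                1, 0, 0, 1),
              nrow = 3, byrow = TRUE,
              dimnames = list(c("A", "B", "C"), c("m1", "m2", "m3", "m4")))

# ASSUMED gamma: 1 / q_00 for the seeds pooled across the given sites
gamma.q <- function(tab) {
  n.k <- colSums(tab)
  1 / sum((n.k / sum(n.k))^2)
}

# Add sites one at a time in random order, recording gamma after each step
gamma.accum.sketch <- function(tab, n.reps = 100,
                               resample.method = c("permute", "bootstrap")) {
  resample.method <- match.arg(resample.method)
  n.sites <- nrow(tab)
  curves <- replicate(n.reps, {
    # "permute": sites drawn without replacement; "bootstrap": with replacement
    ord <- sample.int(n.sites, n.sites,
                      replace = (resample.method == "bootstrap"))
    sapply(seq_len(n.sites),
           function(i) gamma.q(tab[ord[seq_len(i)], , drop = FALSE]))
  })
  # curves: one row per number of sites accumulated, one column per replicate
  data.frame(n.sites    = seq_len(n.sites),
             mean.gamma = rowMeans(curves),
             lower      = apply(curves, 1, quantile, probs = 0.025),
             upper      = apply(curves, 1, quantile, probs = 0.975))
}

set.seed(42)
acc <- gamma.accum.sketch(tab, n.reps = 50)
acc
```

With "permute", the final point always pools every site, so it is identical across replicates; with "bootstrap", sites can repeat and the final point varies.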

plotGammaAccum(rga.result)

Create a visual plot of γ accumulation results from runGammaAccum().

Additional functions

The following functions typically won’t be used separately; use runGammaAccum() instead.

gammaAccum() : Workhorse function for γ accumulation

gammaAccumStats() : Extracts stats from the result of gammaAccum()

runGammaAccumSimple() : Wrapper that runs and then returns stats from gammaAccum()

Allelic Diversity

These functions calculate allelic alpha, beta and gamma diversity as described by Sork et al. (unpublished), following the same conceptual and statistical principles described in Scofield et al. (2012). These have been used to calculate the structure of allelic diversity for complete progeny genotypes as well as their decomposed male and female gametes, and to compare alpha and gamma diversity of progeny dispersed relatively short vs. long distances.

Input requirements

Input begins as a file of genotypes in GenAlEx format, which is read using the readGenalex R package (https://github.com/douglasgscofield/readGenalex), available via CRAN:

install.packages("readGenalex")

Usage

The augmented data.frame returned by readGenalex() is then processed into a list of tables, one per locus, using allele.createTableList(). If the true ploidy of the input data does not match the ploidy of the GenAlEx file, for example if haploid gametes are represented by a pair of homozygous alleles, then use the new.ploidy=1 argument to allele.createTableList() to reduce the ploidy to its true level.
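The job of allele.createTableList() can be pictured with a small base R sketch. It assumes a simplified genotype data.frame with a site column and two allele columns per locus named <locus>.a and <locus>.b (readGenalex() output may be laid out differently), and uses a hypothetical ploidy argument that keeps only the first allele column, mimicking the effect of new.ploidy=1 on haploid data stored as homozygous pairs. make.table.list() is illustrative only and is not the package's implementation.

```r
# Hypothetical diploid genotypes: one row per individual, a site column and
# two allele columns per locus (column names are illustrative only)
geno <- data.frame(
  site   = c("P1", "P1", "P1", "P2", "P2"),
  loc1.a = c(101, 101, 103, 105, 103),
  loc1.b = c(103, 101, 103, 105, 105),
  loc2.a = c(220, 224, 220, 220, 224),
  loc2.b = c(224, 224, 220, 228, 224)
)

# Build one site x allele count table per locus; with ploidy = 1 only the
# first allele column of each locus is counted
make.table.list <- function(geno, loci, ploidy = 2) {
  tables <- lapply(loci, function(locus) {
    cols <- paste0(locus, c(".a", ".b"))[seq_len(ploidy)]
    alleles <- unlist(geno[cols], use.names = FALSE)  # stack allele columns
    sites <- rep(geno$site, times = ploidy)           # matching site labels
    table(sites, alleles)
  })
  names(tables) <- loci
  tables
}

gt <- make.table.list(geno, c("loc1", "loc2"))
gt$loc1
```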

The list of tables is analysed as a unit by allele.pmiDiversity(), and can be passed to one of the contrast functions for testing against another list of tables.

Workflow

The workflow to calculate basic allelic diversity statistics:

library(readGenalex)
source("allelePmiDiversity.R")
dat = readGenalex("GenAlEx-format-file-of-genotypes.txt")
gt = allele.createTableList(dat)
div = allele.pmiDiversity(gt)

For comparing allele diversity between two different samples:

library(readGenalex)
source("allelePmiDiversity.R")
source("alleleDiversityTests.R")
dat1 = readGenalex("file-of-genotypes-sample-1.txt")
dat2 = readGenalex("file-of-genotypes-sample-2.txt")
gt1 = allele.createTableList(dat1)
gt2 = allele.createTableList(dat2)
alpha.contrast = allele.alphaContrastTest(gt1, gt2)
gamma.contrast = allele.gammaContrastTest(gt1, gt2)

For calculating and plotting gamma accumulation curves across all loci:

library(readGenalex)
source("allelePmiDiversity.R")
source("alleleGammaAccum.R")
dat = readGenalex("genotypes.txt")
lst = allele.createTableList(dat)
allele.rga.result = allele.runGammaAccum(lst)
plotGammaAccum(allele.rga.result)

Functions in allelePmiDiversity.R

allele.pmiDiversity() : The function calculating diversity for a set of loci. The single argument is a list produced by allele.createTableList(), and it uses the function allele.pmiDiversitySingleLocus().

allele.createTableList() : Takes a data.frame of genotypes read by readGenalex() and produces a list of allele count tables used by the other functions. Each entry of the list is a table of site × allele counts for one locus, with row names being the site names and column names being the names given to the individual alleles.

allele.pmiDiversitySingleLocus() : Calculates diversity for a single locus. The single argument is a table of site × allele counts for that locus, with row names being the site names and column names being the names given to the individual alleles.

Functions in alleleDiversityTests.R

allele.alphaDiversityTest(lst) : Test whether there is a difference in the alpha diversity among patches in an allele diversity dataset, that is, whether β = 1 or δ = 0 across a collection of patches at a site (see Sork et al.).

allele.alphaContrastTest(lst.a, lst.b) : Test whether there is a difference in the alpha diversity between two lists of allele diversity datasets.

allele.gammaContrastTest(lst.a, lst.b) : Test whether there is a difference in the gamma diversity between two allele diversity datasets.

Functions in alleleGammaAccum.R

allele.runGammaAccum(lst) : Perform a gamma diversity accumulation on the list of per-locus allele tables in lst, as produced by allele.createTableList(). Several arguments control the method of accumulation and value of gamma calculated. Other arguments are identical to gammaAccum(). Only the defaults have been tested; the others were developed while exploring the data and must be considered experimental. The result is returned in a list, which may be passed to plotGammaAccum() to plot the result.


References

Scofield DG, Smouse PE, Karubian J, Sork VL. 2012. Use of α, β, and γ diversity measures to characterize seed dispersal by animals. American Naturalist 180: 719-732.

Scofield DG, Alfaro VR, Sork VL, Grivet D, Martinez E, Papp J, Pluess AR, Koenig WD, Smouse PE. 2011. Foraging patterns of acorn woodpeckers (Melanerpes formicivorus) on valley oak (Quercus lobata Née) in two California oak savanna-woodlands. Oecologia 166: 187-196.

Scofield DG, Sork VL, Smouse PE. 2010. Influence of acorn woodpecker social behaviour on transport of coast live oak (Quercus agrifolia) acorns in a southern California oak savanna. Journal of Ecology 98: 561-572.

Grivet D, Smouse PE, Sork VL. 2005. A novel approach to an old problem: tracking dispersed seeds. Molecular Ecology 14: 3585-3595.

Nielsen R, Tarpy DR, Reeve HK. 2003. Estimating effective paternity number in social insects and the effective number of alleles in a population. Molecular Ecology 12: 3157-3164.