Analysis of Environmental Data

Adelaide, 29 Jan - 10 Feb, 2007

Analysis of Environmental Data is an intensive 2-week course that will provide participants with training in the theory and application of statistical techniques useful for the analysis of environmental data.

Course content

1. Data management, preparation and exploratory data analysis

  • Using databases to manage data
  • The R statistical package
  • Data transformation
  • Graphical exploration of data
  • Data screening and outliers

2. Introduction to statistical modeling and regression analysis

  • Experimental design: observational studies vs. designed data collection
  • Hypothesis testing & randomization tests
  • Correlation & regression
  • Multiple regression
  • Analysis of variance
  • Generalized linear models and logistic regression

3. Multivariate methods

  • Measures of similarity / dissimilarity
  • Indirect ordination methods: Principal components analysis, (detrended) correspondence analysis and non-metric multidimensional scaling
  • Direct (constrained) ordination methods: Redundancy analysis and canonical correspondence analysis
  • Cluster analysis: Agglomerative methods, k-means clustering
  • Discriminant analysis
  • Classification and regression trees
  • Palaeoecological transfer functions
  • Analysis of temporally ordered environmental data

A provisional lecture list and timetable for the course can be found here.

Each topic will be presented using a 1-hour lecture and 2-hour practical. The lecture will introduce the theory of each set of methods and models, discuss their assumptions, and give students the knowledge to enable them to identify the type of model appropriate for a particular data analytical problem. The following practical will reinforce the understanding of the lecture material by giving the student the opportunity to learn by example and apply the techniques to datasets to answer real environmental questions.

Logistics

There will be a lecture and practical each weekday morning and afternoon, and on at least one of the weekend days, depending on the enthusiasm of the course takers and my stamina! Other time, including evenings, is available for working on the open-ended projects and your own data - you are particularly encouraged to bring your own data to discuss and work on during the course. Last year most participants worked on their own data well into the night!

The course is directed towards advanced undergraduates, graduate students, and working professionals in the environmental sciences. Prerequisites: an undergraduate course in statistics, understanding of basic concepts such as correlation and regression, and familiarity with PC-based software for data analysis.

You will get training in a number of software packages during the course, including CANOCO, TWINSPAN and C2, although for most of the course we will use the R statistical package. R is an extremely powerful statistical and graphical computing environment that is freely available under the General Public License (GPL). R can be downloaded from http://cran.r-project.org/index.html; versions are available for Windows, Unix and Mac OS. You are strongly encouraged to invest in the book by Peter Dalgaard listed below and to familiarize yourself with the program before the course. In due course I will also post a tutorial for participants to complete before the course so we can "hit the ground running". R does not have the bells and whistles and menu-driven commands of some expensive commercial packages. Because of this, some find that R has a steep initial learning curve, but do not be put off: the course will take you through the main features of the program step by step and demonstrate that its flexibility, extensive list of statistical methods, and powerful publication-quality graphics will amply repay the time spent learning it.

Suggested reading

Quinn, G. & Keough, M. (2002) Experimental Design and Data Analysis for Biologists. Cambridge University Press. OR Gotelli, N. & Ellison, A. (2004) A Primer of Ecological Statistics. Sinauer Associates.

Dalgaard, P. (2002) Introductory Statistics with R. Springer, New York. - Excellent introduction

Leps, J. & Smilauer, P. (2003) Multivariate Analysis of Ecological Data using CANOCO. Cambridge University Press, Cambridge.

Shaw, P.J.A. (2003) Multivariate Statistics for the Environmental Sciences. Arnold, London.


Last updated on: 12 Jul 2013  Copyright 2013 Steve Juggins