Narrowing Historical Uncertainty: Probabilistic Classification of Ambiguously Identified Tree Species in Historical Forest Survey Data

David J. Mladenoff, University of Wisconsin–Madison
Sally E. Dahir, University of Wisconsin–Madison
Eric V. Nordheim, University of Wisconsin–Madison
Lisa A. Schulte, University of Wisconsin–Madison
Glenn R. Guntenspergen, United States Geological Survey

This article is from Ecosystems 5 (2002): 539, doi:10.1007/s10021-002-0167-8.


Historical data are increasingly valued for the insight they give into the past conditions of ecosystems. Uses of such data include assessing the extent of ecosystem change; deriving ecological baselines for management, restoration, and modeling; and assessing the influence of past conditions on the composition and function of current systems. One historical data set of this type is the Public Land Survey (PLS) of the United States General Land Office, which contains data on multiple tree species, sizes, and distances recorded at each survey point, located at half-mile (0.8-km) intervals on a 1-mi (1.6-km) grid. This survey method was begun in the 1790s on US federal lands extending westward from Ohio. Thus, the data have the potential to provide a view of much of the US landscape from the mid-1800s, and they have been used extensively for this purpose. However, historical data sources, such as those describing the species composition of forests, are often limited in the detail recorded and in their reliability, since the information was often not originally recorded for ecological purposes. Forest trees are sometimes recorded ambiguously, using generic or obscure common names. For the PLS data of northern Wisconsin, USA, we developed a method to classify ambiguously identified tree species using logistic regression analysis. The models were built from data on trees that were clearly identified to species, together with a set of independent predictor variables. The models were first created on partial data sets for each species and then tested for fit against the remaining data. Validations were conducted using repeated, random subsets of the data. Model prediction accuracy ranged from 81% to 96% in differentiating congeneric species among oak, pine, ash, maple, birch, and elm. Major predictor variables were tree size, associated species, landscape classes indicative of soil type, and spatial location within the study region.
Results help to clarify ambiguities formerly present in maps of historic ecosystems for the region and can be applied to PLS data sets elsewhere, as well as to other sources of ambiguous historical data. Mapping the newly classified data with ecological land units provides additional information on the distribution, abundance, and associations of tree species, as well as their relationships to environmental gradients before the industrial period, and clarifies the identities of species formerly mapped only to genus. We offer some caveats on the appropriate use of data derived in this way and describe their potential.
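
The classification-and-validation procedure summarized above (fit a logistic regression on a partial data set, test it against the remaining data, and repeat with random subsets) can be sketched as follows. This is a minimal illustration only, not the authors' code or data: the trees are synthetic, the two predictors (a standardized diameter and a binary "sandy" landscape class) are hypothetical stand-ins for the tree size and landscape-class variables named in the abstract, and the model separates just two hypothetical congeneric species, coded 0 and 1.

```python
import math
import random

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def linear_score(w, xi):
    # w[0] is the intercept; w[1:] are the predictor coefficients.
    return w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi))

def fit_logistic(X, y, lr=0.1, epochs=300):
    """Fit logistic regression weights by batch gradient descent on the log-loss."""
    w = [0.0] * (len(X[0]) + 1)
    n = len(X)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            err = sigmoid(linear_score(w, xi)) - yi
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [wj - lr * g / n for wj, g in zip(w, grad)]
    return w

def predict(w, xi):
    # Assign the species whose predicted probability is at least 0.5.
    return 1 if sigmoid(linear_score(w, xi)) >= 0.5 else 0

def make_synthetic_trees(n, rng):
    """Hypothetical data: species 1 tends to be larger and to occur on sandy sites."""
    X, y = [], []
    for _ in range(n):
        species = rng.randint(0, 1)
        diameter = rng.gauss(1.0 if species else -1.0, 1.0)  # standardized diameter
        sandy = 1.0 if rng.random() < (0.8 if species else 0.2) else 0.0
        X.append([diameter, sandy])
        y.append(species)
    return X, y

def repeated_validation(X, y, n_repeats=5, train_frac=0.7, seed=0):
    """Fit on a random subset, score on the held-out remainder, and repeat."""
    rng = random.Random(seed)
    idx = list(range(len(X)))
    accuracies = []
    for _ in range(n_repeats):
        rng.shuffle(idx)
        cut = int(train_frac * len(idx))
        train, test = idx[:cut], idx[cut:]
        w = fit_logistic([X[i] for i in train], [y[i] for i in train])
        correct = sum(predict(w, X[i]) == y[i] for i in test)
        accuracies.append(correct / len(test))
    return accuracies

rng = random.Random(42)
X, y = make_synthetic_trees(400, rng)  # trees "clearly identified to species"
accuracies = repeated_validation(X, y)
print([round(a, 2) for a in accuracies])
```

In this sketch, a model fit on all clearly identified trees would then be applied with `predict` to trees recorded only to genus. The accuracies printed here describe the synthetic data alone and say nothing about the accuracies reported for the PLS data.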