Data Preprocessing

This R tutorial guides you through modeling 20 species with spatialMaxent. We use data from the National Center for Ecological Analysis and Synthesis (NCEAS) dataset for the Canada region (Elith et al., 2020). Because you have chosen to start with data preprocessing, we will walk through the whole process, from obtaining the data to model evaluation.

Check if all necessary R packages are installed

This tutorial was created with R version 4.1.2 and requires the following packages. Please make sure you have them installed before we start:
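One way to run such a check is sketched below. The vector `required_packages` and the helper `missing_packages()` are placeholders introduced for illustration, not the tutorial's actual package list; replace the vector with the packages you need.

```r
# Placeholder list: an assumption for illustration, NOT the tutorial's actual package list.
required_packages <- c("terra")

# Return the entries of `pkgs` that are not installed in the current library paths.
missing_packages <- function(pkgs) {
  installed <- rownames(installed.packages())
  setdiff(pkgs, installed)
}

to_install <- missing_packages(required_packages)

if (length(to_install) > 0) {
  message("Missing packages: ", paste(to_install, collapse = ", "))
  # install.packages(to_install)  # uncomment to install them from CRAN
} else {
  message("All required packages are installed.")
}
```

The installation call is left commented out so that the check itself never changes your library without you noticing.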

Create directory structure

For the tutorial to run smoothly, it is recommended that you use the same folder structure as shown here. Start by creating an empty folder, and set it as the working directory in R. In this folder we will now create the following directory structure:

src
└─ functions
data
├─ samples
├─ output
├─ layers
└─ background

You can set up the working environment with the following script:
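A minimal sketch of such a setup script in base R follows. It assumes that the four subfolders samples, output, layers and background all sit under data, and that the working directory is already set to the empty tutorial folder; the helper name `create_folders()` is our own.

```r
# Folder layout as listed above; the four subfolders are assumed to sit under data/.
folders <- c(
  file.path("src", "functions"),
  file.path("data", "samples"),
  file.path("data", "output"),
  file.path("data", "layers"),
  file.path("data", "background")
)

# Create every folder below `root`, including missing parent folders.
# Folders that already exist are left untouched.
create_folders <- function(root = ".", dirs = folders) {
  for (d in file.path(root, dirs)) {
    dir.create(d, recursive = TRUE, showWarnings = FALSE)
  }
  # Report, per folder, whether it exists after the call
  invisible(dir.exists(file.path(root, dirs)))
}

create_folders()  # run from the working directory set with setwd()
```

Because `dir.create()` is called with `recursive = TRUE` and warnings suppressed, the script can be re-run safely if some folders already exist.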

The NCEAS data

This section briefly introduces the NCEAS data. The default parameters of Maxent were determined by modeling 225 species in six regions of the world (Phillips & Dudík, 2008). These National Center for Ecological Analysis and Synthesis (NCEAS) data have recently been published as an open benchmark dataset, explicitly assembled to compare species distribution modeling methods (Elith et al., 2020). The dataset covers six regions: Australian Wet Tropics (AWT), Ontario in Canada (CAN), New South Wales (NSW), New Zealand (NZ), South American countries (SA) and Switzerland (SWI). The species are anonymized and only assigned to a biological group. For each region, the data consist of presence-only (PO) records, presence-absence (PA) records, background points (BP) and environmental predictors in the form of environmental layers. The PO and BP data are intended for training species distribution models (SDMs), and the PA data for evaluating them. For a detailed description of the NCEAS dataset see Elith et al. (2020).

In this tutorial we will use the data for the region Ontario in Canada (see map below).


Download environmental layers

The environmental layers are used largely as provided in the NCEAS dataset. Go to OSF and download the environmental layers for the region Canada into the data/layers folder. We only change the file format: because Maxent and spatialMaxent do not work with tif files, we convert each environmental raster layer to the ASCII format.
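The conversion can be sketched as follows, assuming the terra package is installed, the downloaded tif files lie directly in data/layers, and each file holds a single layer (the ASCII grid format stores one band per file). The helper name `convert_layers_to_ascii()` and the NoData value -9999 are our own assumptions, not part of the NCEAS dataset description.

```r
library(terra)

# Convert every GeoTIFF in `layer_dir` to an ESRI ASCII grid (.asc) in the same folder.
# NAflag = -9999 is an assumed NoData value; adjust it if your layers use another one.
convert_layers_to_ascii <- function(layer_dir = file.path("data", "layers")) {
  tif_files <- list.files(layer_dir, pattern = "\\.tif$",
                          full.names = TRUE, ignore.case = TRUE)
  asc_files <- sub("\\.tif$", ".asc", tif_files, ignore.case = TRUE)

  for (i in seq_along(tif_files)) {
    r <- rast(tif_files[i])                       # read one single-layer raster
    writeRaster(r, asc_files[i],
                filetype = "AAIGrid",             # GDAL driver for ESRI ASCII grids
                NAflag = -9999, overwrite = TRUE)
  }
  invisible(asc_files)                            # paths of the written .asc files
}
```

Calling `convert_layers_to_ascii()` from the tutorial's working directory writes one .asc file next to each tif file and leaves the original tif files in place.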