spatsoc is an R package for detecting spatial and temporal groups in GPS relocations. It can be used to build proximity-based social networks using gambit-of-the-group format and edge-lists. In addition, the randomization function provides data-stream randomization methods suitable for GPS data.
See the other vignettes for further information:
group_pts, group_lines, group_polysedge_distgroup_times, group_pts, group_lines, group_polys,
edge_dist, edge_nn, and randomizationsedge_dist and edge_nndyad_idfusion_idget_geometry to setup a geometry column and use the geometry interfacespatsoc leverages data.table to modify by reference and iteratively work on
subsets of the input data. The first input for all functions in spatsoc is
DT, an input data.table. If your data is a data.frame, you can convert it
by reference using setDT(DF).
spatsoc is designed to work in two steps: temporal followed by either spatial
grouping or edge-list generating. Considering your specific study species and
system, determine a relevant temporal and spatial grouping threshold. This may
be 5 minutes and 50 meters or 2 days and 100 meters or any other thresholds -
the functions provided by spatsoc are flexible to user input. In some cases,
the spatial grouping function selected is only relevant with certain temporal
grouping thresholds. For example, we wouldn’t expect a threshold of 5 minutes
with group_polys.
# Load packages
library(spatsoc)
library(data.table)
# Read data as a data.table
DT <- fread(system.file("extdata", "DT.csv", package = "spatsoc"))
# Cast datetime column to POSIXct
DT[, datetime := as.POSIXct(datetime)]
# Temporal groups
group_times(DT, datetime = 'datetime', threshold = '5 minutes')
## ID X Y datetime population minutes timegroup
## <char> <num> <num> <POSc> <int> <int> <int>
## 1: A 715851.4 5505340 2016-11-01 00:00:54 1 0 1
## 2: A 715822.8 5505289 2016-11-01 02:01:22 1 0 2
## 3: A 715872.9 5505252 2016-11-01 04:01:24 1 0 3
## 4: A 715820.5 5505231 2016-11-01 06:01:05 1 0 4
## 5: A 715830.6 5505227 2016-11-01 08:01:11 1 0 5
## ---
## 14293: J 700616.5 5509069 2017-02-28 14:00:54 1 0 1393
## 14294: J 700622.6 5509065 2017-02-28 16:00:11 1 0 1394
## 14295: J 700657.5 5509277 2017-02-28 18:00:55 1 0 1449
## 14296: J 700610.3 5509269 2017-02-28 20:00:48 1 0 1395
## 14297: J 700744.0 5508782 2017-02-28 22:00:39 1 0 1396
# Spatial groups
group_pts(
DT,
threshold = 50,
id = 'ID',
coords = c('X', 'Y'),
timegroup = 'timegroup'
)
## ID X Y datetime population minutes timegroup
## <char> <num> <num> <POSc> <int> <int> <int>
## 1: A 715851.4 5505340 2016-11-01 00:00:54 1 0 1
## 2: A 715822.8 5505289 2016-11-01 02:01:22 1 0 2
## 3: A 715872.9 5505252 2016-11-01 04:01:24 1 0 3
## 4: A 715820.5 5505231 2016-11-01 06:01:05 1 0 4
## 5: A 715830.6 5505227 2016-11-01 08:01:11 1 0 5
## ---
## 14293: J 700616.5 5509069 2017-02-28 14:00:54 1 0 1393
## 14294: J 700622.6 5509065 2017-02-28 16:00:11 1 0 1394
## 14295: J 700657.5 5509277 2017-02-28 18:00:55 1 0 1449
## 14296: J 700610.3 5509269 2017-02-28 20:00:48 1 0 1395
## 14297: J 700744.0 5508782 2017-02-28 22:00:39 1 0 1396
## group
## <int>
## 1: 1
## 2: 2
## 3: 3
## 4: 4
## 5: 5
## ---
## 14293: 9909
## 14294: 9910
## 14295: 9911
## 14296: 9912
## 14297: 9913
adehabitatHR (>= 0.4.21),
data.table (>= 1.15.0),
igraph,
sf,
lwgeom (>= 0.2.15),
CircStats,
stats,
units (>= 0.8-6),
rlang
See here for
help installing sf, see here
for help installing units and here for
help installing lwgeom.
group_times(DT, datetime, threshold)
DT: input data.tabledatetime: date time column name in input data.tablethreshold: threshold for groupingA data.table with a date time formatted column. The input DT will be
returned with columns appended. The timegroup column corresponds to the
temporal group assigned to each row. Please note that the actual value of the
time group is meaningless. Reordered data will return a different time group.
What is meaningful, however, is the contents of each group. Each group will
contain all rows nearest to the threshold provided.
The group_times function expects either one column (POSIXct) or two columns
(IDate and ITime).
Given a character column representing the date time, convert it to POSIXct or
IDate and ITime:
DT[, datetime := as.POSIXct(datetime)]
DT[, c('idate', 'itime') := IDateTime(datetime)]
These are then provided to the function using the names of the column in the input data.
group_times(DT, datetime = 'datetime', threshold = '5 minutes')
or
group_times(DT, datetime = c('idate', 'itime'), threshold = '5 minutes')
The threshold provided to group_times should be related to the fix rate of
the input dataset and to the specific study system and species. If relocations
are recorded every two hours, a threshold = '2 hours' will group all rows to
the nearest two hour group (10am, 12pm, 2pm, 4pm, …). This, however, means
that the relocations can be up to one hour apart from each other. Picking a
smaller threshold, e.g.: threshold = '15 minutes' may be more relevant in some
cases. The flexibility of spatsoc’s threshold argument means the user must
carefully consider what threshold is reasonable to their specific system.
The threshold of group_times is considered only within the scope of 24 hours
and this poses limitations on it:
threshold must evenly divide into 60 minutes or 24 hoursthreshold cannot be fractionalThe main column returned by group_times is “timegroup”. It represents the
temporal group of each row, where those nearest (either above or below) within
the threshold are grouped. Its actual value does not have any meaning, but the
contents of each group do. That means if the data is reordered, a row may have a
different time group, but the other rows in that group should not change.
The extra columns are provided to help the user investigate, troubleshoot and interpret the timegroup.
| threshold unit | column(s) added |
|---|---|
| minute | “minutes” column added identifying the nearest minute group for each row. |
| hour | “hours” column added identifying the nearest hour group for each row. |
| day | “block” columns added identifying the multiday block for each row. |
This message is returned to the user when a column matching those returned by
group_times is found in the input DT. This is commonly the case when
group_times is run multiple times consecutively.
This message is returned to the user when the threshold is NULL. This is the
default setting of threshold and, at times, may be suitable. In this case, the
date times in the datetime column will be grouped exactly. Usually, a
threshold should be provided.
This warning is returned to the user when the threshold with unit days does
not divide evenly into the range of days in DT. For example, if DT had data
covering 30 days, and a threshold of ‘7 days’ was used, this warning would be
returned. Note, this warning is returned for the range of days for the entire
data set and not by year.
group_pts(DT, threshold, id, coords, timegroup, splitBy)
DT: input data.tablethreshold: threshold for groupingid: column name of IDs in DTcoords: column names of x and y coordinates in DTtimegroup: column name of time groupsplitBy: (optional) column names of extra variables to group onThe input data.table. It will returned with a column named group appended,
which represents the spatial (and temporal if timegroup is provided) group.
The threshold must be in the units of the coordinates.
group_lines(DT, threshold, crs, id, coords, timegroup, sortBy, splitBy, spLines)
DT: input data.tablethreshold: threshold for groupingcrs: crs of coordinates in DTid: column name of IDs in DTcoords: column names of x and y coordinates in DTtimegroup: (optional) column name of time groupsortBy: column name of date time to sort rows for building linessplitBy: (optional) column names of extra variables to group onsfLines: alternatively, provide a sf LINESTRING object and id column nameSee 3.2.1.
The threshold argument represents a buffer area around each line. When
threshold = 0, the lines are grouped by spatial overlap. If the threshold is
greater than 0, the lines buffered, then grouped by spatial overlap.
The crs argument expects a character string or numeric
defining the coordinate reference system to be passed to sf::st_crs.
For example, for UTM zone 36S (EPSG 32736), the crs
argument is crs = "EPSG:32736" or crs = 32736.
See https://spatialreference.org for a list of EPSG codes.
Please note, R spatial has followed updates to GDAL
and PROJ for handling coordinate reference systems, see more at
https://r-spatial.org/r/2020/03/17/wkt.html.
The sortBy argument expects a date time formatted column name, which is used
to order the rows for each individual (and splitBy).
group_polys(DT, area, hrType, hrParams, crs, id, coords, splitBy, spLines)
DT: input data.tablearea: logical argument if proportional area should be returnedhrType: type of home range createdhrParams: parameters relevant to the type of home range createdcrs: crs of coordinates in DTid: column name of IDs in DTcoords: column names of x and y coordinates in DTsplitBy: (optional) column names of extra variables to group onsfPolys: alternatively, provide a simple features POLGON or
MULTIPOLYGON object and an id columnIf area = FALSE, see 3.2.1. If area = TRUE, the DT will not be
appended with a group column instead a data.table with IDs and proportional
area overlap will be returned.
The default unit for area overlap is square meters.
The crs argument expects a character string or numeric
defining the coordinate reference system to be passed to sf::st_crs.
For example, for UTM zone 36S (EPSG 32736), the crs
argument is crs = "EPSG:32736" or crs = 32736.
See https://spatialreference.org for a list of EPSG codes.
Please note, R spatial has followed updates to GDAL
and PROJ for handling coordinate reference systems, see more at
https://r-spatial.org/r/2020/03/17/wkt.html.
Currently, spatsoc offers two types of home ranges provided by the
adehabitatHR package: ‘mcp’ (mcp) and ‘kernel’ (kernelUD and
getverticeshr). The parameters must match the arguments of those functions.
Internally, we match arguments to the functions allowing the user to provide,
for example, both the percent (provided to getverticeshr) and grid arguments
(provided to mcp).
group_polys(
DT,
area = FALSE,
crs = utm,
hrType = 'mcp',
hrParams = list(grid = 60, percent = 95),
id = 'ID',
coords = c('X', 'Y')
)
edge_dist(DT = NULL, threshold = NULL, id = NULL, coords = NULL, timegroup = NULL, splitBy = NULL, fillNA = TRUE)
DT: input data.tablethreshold: threshold for groupingid: column name of IDs in DTcoords: column names of x and y coordinates in DTtimegroup: column name of time groupsplitBy: (optional) column names of extra variables to group onfillNA: logical indicating if NAs should be returned for individuals that
were not within the threshold distance of any other. If TRUE, NAs are returned.
If FALSE, only edges between individuals within the threshold distance are
returned.This is the non-chain rule implementation similar to group_pts. Edges are
defined by the distance threshold and NAs are returned for individuals within
each timegroup if they are not within the threshold distance of any other
individual (if fillNA is TRUE).
See the vignette Using edge-list generating functions and dyad_id for details about the edge_dist function.
edge_nn(DT = NULL, id = NULL, coords = NULL, timegroup = NULL, splitBy = NULL, threshold = NULL)
DT: input data.tableid: column name of IDs in DTcoords: column names of x and y coordinates in DTtimegroup: column name of time groupsplitBy: (optional) column names of extra variables to group onthreshold: (optional) spatial distance threshold to set maximum distance
between an individual and their neighbour.This function can be used to generate edge-lists defined either by nearest neighbour or nearest neighbour with a maximum distance. NAs are returned for nearest neighbour for an individual was alone in a timegroup (and/or splitBy) or if the distance between an individual and it’s nearest neighbour is greater than the threshold.
See the vignette Using edge-list generating functions and dyad_id for details about the edge_nn function.
randomizations(DT, type, id, datetime, splitBy, iterations)
DT: input data.tabletype: one of ‘daily’, ‘step’ or ‘trajectory’id: Character string of ID column namedatetime: field used for providing date time or time group - see detailssplitBy: List of fields in DT to split the randomization process byiterations: The number of iterations to randomizeSee the vignette Using spatsoc in social network analysis for details about the randomizations function (specifically the section ‘Data stream randomization’)
Many functions in spatsoc use data.table’s modify-by-reference to reduce
recopying large datasets and improve performance. Included in this list are:
group_timesgroup_ptsgroup_linesgroup_polys(area = FALSE)centroid_groupdirection_to_leaderdirection_to_centroiddirection_stepdirection_groupdirection_polarizationdistance_to_centroiddistance_to_leaderfusion_iddyad_idleader_direction_groupFunctions that require their outputs be reassigned include:
group_polys(area = TRUE)edge_distedge_nnedge_delayedge_directionedge_alignmentcentroid_fusioncentroid_dyadleader_edge_delayrandomizationsCheck that your data.table has columns allocated (with
data.table::truelength) and if not, use data.table::setDT or
data.table::alloc.col. This can happen if you are reading your data from RDS
or RData files. See
here.
if (truelength(DT) == 0) {
setDT(DT)
}
# then go to spatsoc
group_times(DT, datetime = 'datetime', threshold = '5 minutes')
or simply:
DT <- readRDS('path/to/data.Rds')
alloc.col(DT)
Here are some useful code chunks for understanding the spatial and temporal
extent of your data and the outputs of spatsoc functions.
# Number of unique individuals
DT[, uniqueN(ID)]
# Number of unique individuals by timegroup
DT[, uniqueN(ID), by = timegroup]
# Min, max datetime
DT[, range(datetime)]
# Difference between relocations in hours
DT[order(datetime),
.(difHours = as.numeric(difftime(datetime, shift(datetime), units = 'hours'))),
by = ID]
# Difference between relocations in hours
DT[order(datetime),
.(difMins = as.numeric(difftime(datetime, shift(datetime), units = 'mins'))),
by = ID]
Simple spatial extents can be calculated for all individuals or by individual.
# All individuals
DT[, .(minX = min(X),
maxX = max(X),
minY = min(Y),
maxY = max(Y),)]
# By individual
DT[, .(minX = min(X),
maxX = max(X),
minY = min(Y),
maxY = max(Y),),
by = ID]
spatsoc outputsAfter using the grouping functions, we can determine the number of individuals in a temporal or spatial group.
# Number of unique individuals by timegroup
DT[, uniqueN(ID), by = timegroup]
# Number of unique individuals by group
DT[, uniqueN(ID), by = group]