Title: | Hot-Spot Analysis with Simple Features |
---|---|
Description: | Identify and understand clusters of points (typically representing the locations of places or events) stored in simple-features (SF) objects. This is useful for analysing, for example, hot-spots of crime events. The package emphasises producing results from point SF data in a single step using reasonable default values for all other arguments, to aid rapid data analysis by users who are starting out. Functions available include kernel density estimation (for details, see Yin (2020) <doi:10.22224/gistbok/2020.1.12>), analysis of spatial association (Getis and Ord (1992) <doi:10.1111/j.1538-4632.1992.tb00261.x>) and hot-spot classification (Chainey (2020) ISBN:158948584X). |
Authors: | Matt Ashby [aut, cre] |
Maintainer: | Matt Ashby <[email protected]> |
License: | MIT + file LICENSE |
Version: | 0.9.0 |
Built: | 2025-02-10 19:21:53 UTC |
Source: | https://github.com/mpjashby/sfhotspot |
Plot the output produced by hotspot_classify with reasonable default values.
    ## S3 method for class 'hspt_c'
    autoplot(object, ...)
object |
An object with the class |
... |
Currently ignored, but may be used for further options in future. |
A ggplot object.
This function returns a ggplot object, meaning you can further control the appearance of the plot by adding calls to further ggplot2 functions.
Plot the output produced by hotspot_change with reasonable default values.
    ## S3 method for class 'hspt_d'
    autoplot(object, ...)

    ## S3 method for class 'hspt_d'
    autolayer(object, ...)
object |
An object with the class |
... |
Currently ignored, but may be used for further options in future. |
A ggplot object.
This function returns a ggplot object, meaning you can further control the appearance of the plot by adding calls to further ggplot2 functions.
autolayer(hspt_d): Create a ggplot layer of change in grid counts
Plot the output produced by hotspot_kde with reasonable default values.
    ## S3 method for class 'hspt_k'
    autoplot(object, ...)

    ## S3 method for class 'hspt_k'
    autolayer(object, ...)
object |
An object with the class |
... |
further arguments passed to |
A ggplot object or layer that can be used as part of a ggplot stack.
autoplot returns a ggplot object, meaning you can further control the appearance of the plot by adding calls to further ggplot2 functions.
autolayer(hspt_k): Create a ggplot layer of kernel-density values
Plot the output produced by hotspot_count with reasonable default values.
    ## S3 method for class 'hspt_n'
    autoplot(object, ...)

    ## S3 method for class 'hspt_n'
    autolayer(object, ...)
object |
An object with the class |
... |
further arguments passed to |
A ggplot object or layer that can be used as part of a ggplot stack.
autoplot returns a ggplot object, meaning you can further control the appearance of the plot by adding calls to further ggplot2 functions.
autolayer(hspt_n): Create a ggplot layer of grid counts
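The difference between the two plotting methods can be sketched as follows. This is an illustration only, assuming that the sfhotspot and ggplot2 packages are installed and using the memphis_robberies_jan dataset supplied with the package; the plot title and themes are arbitrary choices, not defaults of the package.

```r
library(sfhotspot)
library(ggplot2)

# Count robberies in each grid cell (cell size chosen automatically)
counts <- hotspot_count(memphis_robberies_jan, quiet = TRUE)

# autoplot() returns a complete ggplot object, so further ggplot2 calls can
# be added to it with `+`
count_plot <- autoplot(counts) +
  labs(title = "Personal robberies in Memphis, January 2019") +
  theme_void()

# autolayer() returns a layer, which is added to a plot created elsewhere
layered_plot <- ggplot() +
  autolayer(counts) +
  theme_minimal()
```

Printing count_plot or layered_plot draws the map. The same pattern should apply to the other classes that have both methods.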
Identify change in the number of points (typically representing events) between two periods (before and after a specified date) or in two groups (e.g. on weekdays or at weekends).
    hotspot_change(
      data,
      time = NULL,
      boundary = NULL,
      groups = NULL,
      cell_size = NULL,
      grid_type = "rect",
      grid = NULL,
      quiet = FALSE
    )
data |
|
time |
Name of the column in |
boundary |
A single |
groups |
Name of a column in |
cell_size |
|
grid_type |
|
grid |
|
quiet |
if set to |
This function creates a regular two-dimensional grid of cells (unless a custom grid is specified with grid) and calculates the difference between the number of points in each grid cell:
- before and after a set point in time, if boundary is specified,
- between two groups of points, if a column of grouping values is specified with groups,
- before and after the mid-point of the dates/times present in the data, if both boundary and groups are NULL (the default).
If both boundary and groups are not NULL, the value of boundary will be ignored.
The grid produced by this function covers the convex hull of the input data layer. This means the result may include zero counts for cells that are outside the area for which data were provided, which could be misleading. To handle this, consider cropping the output layer to the area for which data are available. For example, if you only have crime data for a particular district, crop the output dataset to the district boundary using st_intersection.
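One way to crop the output is sketched below. It assumes that the sf and sfhotspot packages are installed and uses the memphis_robberies and memphis_precincts datasets supplied with the package. Transforming both layers to UTM zone 15N and repairing the precinct geometries with st_make_valid() are precautions chosen for this sketch, not requirements stated by the package.

```r
library(sf)
library(sfhotspot)

# Put both layers in the same projected co-ordinate reference system
robberies_utm <- st_transform(memphis_robberies, 32615)
precincts_utm <- st_transform(memphis_precincts, 32615)

# Change in counts between the first and second halves of the data
change <- hotspot_change(robberies_utm, quiet = TRUE)

# Dissolve the precincts into a single boundary for the area that the data
# cover
city_boundary <- st_union(st_make_valid(precincts_utm))

# Keep only the parts of grid cells that fall inside the boundary
change_cropped <- st_intersection(change, city_boundary)
```

Note that st_intersection() clips cells that straddle the boundary, so cells on the edge of the cropped layer are smaller than the others.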
If no cell size is given then the cell size will be set so that there are 50 cells on the shorter side of the grid. If the data SF object is projected in metres or feet, the number of cells will be adjusted upwards so that the cell size is a multiple of 100.
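The arithmetic behind this rule can be illustrated with a short sketch. The helper default_cell_size() is hypothetical and is not part of sfhotspot; it only restates the rule described above, and its handling of very small grids (leaving the size unrounded when it is below 100) is an assumption made for this example.

```r
# Hypothetical helper illustrating the documented rule; not sfhotspot code
default_cell_size <- function(shorter_side, metres_or_feet = TRUE) {
  # Start with the size that gives exactly 50 cells on the shorter side
  cell_size <- shorter_side / 50
  # Rounding the size *down* to a multiple of 100 means the number of cells
  # can only go up. Assumption: sizes below 100 are left as they are.
  if (metres_or_feet && cell_size >= 100) {
    cell_size <- floor(cell_size / 100) * 100
  }
  cell_size
}

# A grid whose shorter side is 23,700 metres: 23700 / 50 = 474, which is
# rounded down to 400, giving 60 cells on that side rather than 50
size_m <- default_cell_size(23700)
cells_m <- ceiling(23700 / size_m)

# Lon/lat data (shorter side of 0.35 decimal degrees): no rounding applied
size_deg <- default_cell_size(0.35, metres_or_feet = FALSE)
```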
An sf tibble of regular grid cells with corresponding counts of points and the change in counts for each cell. This can be plotted using autoplot.
hotspot_dual_kde() for comparing the density of two layers, which will often be more useful than comparing counts if the point locations represent an underlying continuous distribution.
    # Compare counts from the first half of the period covered by the data to
    # counts from the second half
    hotspot_change(memphis_robberies)

    # Create a grouping variable, then compare counts across values of that
    # variable
    memphis_robberies$weekend <-
      weekdays(memphis_robberies$date) %in% c("Saturday", "Sunday")
    hotspot_change(memphis_robberies, groups = weekend)
Classify cells in a grid based on changes in the clustering of points (typically representing events) in a two-dimensional regular grid over time.
    hotspot_classify(
      data,
      time = NULL,
      period = NULL,
      start = NULL,
      cell_size = NULL,
      grid_type = "rect",
      grid = NULL,
      collapse = FALSE,
      params = hotspot_classify_params(),
      quiet = FALSE
    )
data |
|
time |
Name of the column in |
period |
A character value containing a number followed by a unit of time, for example "12 months" or "3.5 days", where the unit of time is one of second, minute, hour, day, week, month, quarter or year (or their plural forms). |
start |
A |
cell_size |
|
grid_type |
|
grid |
|
collapse |
If the range of dates in the data is not a multiple of
|
params |
A list of optional parameters that can affect the output. The
list can be produced most easily using the
|
quiet |
if set to |
An sf tibble of regular grid cells with corresponding hot-spot classifications for each cell. This can be plotted using autoplot.
Hot-spots are spatial areas that contain more points than would be expected by chance; cold-spots are areas that contain fewer points than would be expected. Whether an area is a hot-spot can vary over time. This function creates a space-time cube, determines whether an area is a hot-spot for each of several consecutive time periods and uses that to classify areas according to whether they are persistent, intermittent, emerging or former hot- or cold-spots.
Hot- and cold-spots are identified by calculating the Getis-Ord Gi* (gi-star) Z-score statistic for each cell in a regular grid for each time period. Cells are classified as follows, using the parameters provided in the params argument:
- Persistent hot-/cold-spots are cells that have been hot-/cold-spots consistently over time. Formally: the p-value is less than critical_p for at least persistent_prop proportion of time periods.
- Emerging hot-/cold-spots are cells that have become hot-/cold-spots recently but were not previously. Formally: the p-value is less than critical_p for at least hotspot_prop of the time periods defined as recent by recent_prop, but the p-value was not less than critical_p for at least hotspot_prop of the time periods defined as non-recent by 1 - recent_prop.
- Former hot-/cold-spots are cells that used to be hot-/cold-spots but have not been more recently. Formally: the p-value was less than critical_p for at least hotspot_prop of the time periods defined as non-recent by 1 - recent_prop, but the p-value was not less than critical_p for at least hotspot_prop of the time periods defined as recent by recent_prop.
- Intermittent hot-/cold-spots are cells that have been hot-/cold-spots, but not as frequently as persistent hot-/cold-spots and not only during recent or non-recent periods. Formally: the p-value is less than critical_p for at least hotspot_prop of time periods, but the cell is not an emerging or former hot-/cold-spot.
- No pattern if none of the above categories apply.
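The rules above can be sketched in base R for a single cell. The function classify_cell() is hypothetical and is not the package's implementation: it takes a logical vector recording whether the cell was significant (p-value below critical_p) in each period, ordered from oldest to newest, and it handles only one direction (hot or cold) at a time. How the number of recent periods is rounded is an assumption made for this sketch.

```r
# Hypothetical helper illustrating the classification rules; not sfhotspot code
classify_cell <- function(significant,
                          hotspot_prop = 0.1,
                          persistent_prop = 0.8,
                          recent_prop = 0.2) {
  n <- length(significant)
  # Assumption: the number of recent periods is rounded to a whole number,
  # with at least one period always treated as recent
  n_recent <- max(1, round(n * recent_prop))
  recent <- tail(significant, n_recent)
  older <- head(significant, n - n_recent)

  prop_all <- mean(significant)
  prop_recent <- mean(recent)
  prop_older <- if (length(older) > 0) mean(older) else 0

  if (prop_all >= persistent_prop) {
    "persistent"
  } else if (prop_recent >= hotspot_prop && prop_older < hotspot_prop) {
    "emerging"
  } else if (prop_older >= hotspot_prop && prop_recent < hotspot_prop) {
    "former"
  } else if (prop_all >= hotspot_prop) {
    "intermittent"
  } else {
    "no pattern"
  }
}

# Ten periods, so with recent_prop = 0.2 the last two periods are "recent"
always <- classify_cell(rep(TRUE, 10))
lately <- classify_cell(c(rep(FALSE, 8), TRUE, TRUE))
before <- classify_cell(c(TRUE, TRUE, rep(FALSE, 8)))
on_off <- classify_cell(c(TRUE, FALSE, FALSE, FALSE, TRUE,
                          FALSE, FALSE, FALSE, TRUE, FALSE))
never <- classify_cell(rep(FALSE, 10))
```

With the default parameter values, these five cells are classified as persistent, emerging, former, intermittent and no pattern respectively.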
The grid produced by this function covers the convex hull of the input data layer. This means the result may include Gi or Gi* values for cells that are outside the area for which data were provided, which could be misleading. To handle this, consider cropping the output layer to the area for which data are available. For example, if you only have crime data for a particular district, crop the output dataset to the district boundary using st_intersection.
If no cell size is given then the cell size will be set so that there are 50 cells on the shorter side of the grid. If the data SF object is projected in metres or feet, the number of cells will be adjusted upwards so that the cell size is a multiple of 100.
Chainey, S. (2020). Understanding Crime: Analyzing the Geography of Crime. Redlands, CA: ESRI.
This function allows specification of parameters that affect the output from hotspot_classify.
    hotspot_classify_params(
      hotspot_prop = 0.1,
      persistent_prop = 0.8,
      recent_prop = 0.2,
      critical_p = 0.05,
      nb_dist = NULL,
      include_self = TRUE,
      p_adjust_method = NULL
    )
hotspot_prop |
A single numeric value specifying the minimum proportion of periods for which a cell must contain significant clusters of points before the cell can be classified as a hot or cold spot of any type. |
persistent_prop |
A single numeric value specifying the minimum proportion of periods for which a cell must contain significant clusters of points before the cell can be classified as a persistent hot or cold spot. |
recent_prop |
A single numeric value specifying the proportion of periods that should be treated as being recent in the classification of emerging and former hotspots. |
critical_p |
A threshold p-value below which values should be treated as being statistically significant. |
nb_dist |
The distance around a cell that contains the neighbours of
that cell, which are used in calculating the statistic. If this argument is
|
include_self |
Should points in a given cell be counted as well as
counts in neighbouring cells when calculating the values of
Gi*
(if |
p_adjust_method |
The method to be used to adjust p-values for
multiple comparisons. |
A list that can be used as the input to the params argument to hotspot_classify.
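A minimal sketch of how the two functions fit together, assuming that the sf and sfhotspot packages are installed and using the memphis_robberies dataset supplied with the package. The parameter values and the one-month period are arbitrary choices for illustration, and transforming the data to UTM zone 15N is a precaution taken for this sketch rather than a documented requirement.

```r
library(sf)
library(sfhotspot)

# Work in a projected co-ordinate reference system (UTM zone 15N)
robberies_utm <- st_transform(memphis_robberies, 32615)

# Stricter thresholds than the defaults: a lower critical p-value and a
# higher proportion of periods needed for a persistent hot-spot
strict_params <- hotspot_classify_params(
  critical_p = 0.01,
  persistent_prop = 0.9
)

# Classify cells using twelve one-month periods; this may take some time
classified <- hotspot_classify(
  robberies_utm,
  period = "1 month",
  params = strict_params,
  quiet = TRUE
)
```

Any argument of hotspot_classify_params() that is not specified keeps the default value shown in the usage above.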
Count points in cells in a two-dimensional grid
    hotspot_count(
      data,
      cell_size = NULL,
      grid_type = "rect",
      grid = NULL,
      weights = NULL,
      quiet = FALSE
    )
data |
|
cell_size |
|
grid_type |
|
grid |
|
weights |
|
quiet |
if set to |
This function counts the number of points in each cell in a regular grid. If a column name in data is supplied with the weights argument, weighted counts will also be produced.
If grid is NULL and no cell size is given, the cell size will be set so that there are 50 cells on the shorter side of the grid. If the data SF object is projected in metres or feet, the number of cells will be adjusted upwards so that the cell size is a multiple of 100.
An sf tibble of regular grid cells with corresponding point counts for each cell. This can be plotted using autoplot.
    # Set cell size automatically
    hotspot_count(memphis_robberies_jan)

    # Transform data to UTM zone 15N so that cell_size can be set in metres
    library(sf)
    memphis_robberies_utm <- st_transform(memphis_robberies_jan, 32615)

    # Manually set grid-cell size in metres, since the `memphis_robberies_utm`
    # dataset uses a co-ordinate reference system (UTM zone 15 north) that is
    # specified in metres
    hotspot_count(memphis_robberies_utm, cell_size = 200)
Estimate the relationship between the kernel density of two layers of points
    hotspot_dual_kde(
      x,
      y,
      cell_size = NULL,
      grid_type = "rect",
      bandwidth = NULL,
      bandwidth_adjust = 1,
      method = "ratio",
      grid = NULL,
      weights = NULL,
      quiet = FALSE,
      ...
    )
x , y
|
|
cell_size |
|
grid_type |
|
bandwidth |
either a single |
bandwidth_adjust |
single positive |
method |
The result of this calculation will be returned in the |
grid |
|
weights |
|
quiet |
if set to |
... |
Further arguments passed to |
An sf tibble of grid cells with corresponding point counts and dual kernel density estimates for each cell. This can be plotted using autoplot.
This function creates a regular two-dimensional grid of cells (unless a custom grid is specified with grid), calculates the density of points in each cell for each of x and y using functions from the SpatialKDE package, then produces a value representing a relation between the two densities. The count of points in each cell is also returned.
Dual kernel density values can be useful for understanding the relationship between the distributions of two sets of point locations. For example:
The ratio between two densities representing the locations of burglaries and the locations of houses can show the distribution of the risk (incidence rate) of burglaries. The logged ratio may be useful to show relationships where one set of points has an extremely skewed distribution.
The difference between two densities can show the change in distributions between two points in time.
The sum of two densities can be used to estimate the total density of two types of point, e.g. the locations of occurrences of two diseases.
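The relations described above amount to simple cell-by-cell arithmetic, which can be illustrated in base R. The density values below are made up for the purposes of this example, and the column names are labels chosen here; they are not necessarily the values accepted by the method argument.

```r
# Made-up density estimates for the same three grid cells in two layers
density_x <- c(0.2, 0.5, 1.0)   # e.g. burglaries
density_y <- c(0.4, 0.5, 0.25)  # e.g. houses

relation <- data.frame(
  ratio = density_x / density_y,
  log_ratio = log(density_x / density_y),
  difference = density_x - density_y,
  sum = density_x + density_y
)

relation
```

In the first cell the density of x is half that of y (ratio 0.5), in the second the two are equal (ratio 1, logged ratio 0) and in the third the density of x is four times that of y. A ratio is undefined or infinite in any cell where the density of y is zero, which is worth checking before interpreting the result.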
The grid produced by this function covers the convex hull of the points in x. This means the result may include KDE values for cells that are outside the area for which data were provided, which could be misleading. To handle this, consider cropping the output layer to the area for which data are available. For example, if you only have crime data for a particular district, crop the output dataset to the district boundary using st_intersection.
If no cell size is given then the cell size will be set so that there are 50 cells on the shorter side of the grid. If the x SF object is projected in metres or feet, the number of cells will be adjusted upwards so that the cell size is a multiple of 100.
Yin, P. (2020). Kernels and Density Estimation. The Geographic Information Science & Technology Body of Knowledge (1st Quarter 2020 Edition), John P. Wilson (ed.). doi:10.22224/gistbok/2020.1.12
    # See also the examples for `hotspot_kde()` for examples of how to specify
    # `cell_size`, `bandwidth`, etc.
    library(sf)

    # Transform data to UTM zone 15N so that cell_size and bandwidth can be set
    # in metres
    memphis_robberies_utm <- st_transform(memphis_robberies, 32615)
    memphis_population_utm <- st_transform(memphis_population, 32615)

    # Calculate robbery risk based on residential population. `weights` is set
    # to `c(NULL, population)` so that the robberies layer is not weighted and
    # the population layer is weighted according to the number of residents in
    # each census block.
    hotspot_dual_kde(
      memphis_robberies_utm,
      memphis_population_utm,
      bandwidth = list(NULL, NULL),
      weights = c(NULL, population)
    )
Identify hotspot and coldspot locations, that is cells in a regular grid in which there are more/fewer points than would be expected if the points were distributed randomly.
    hotspot_gistar(
      data,
      cell_size = NULL,
      grid_type = "rect",
      kde = TRUE,
      bandwidth = NULL,
      bandwidth_adjust = 1,
      grid = NULL,
      weights = NULL,
      nb_dist = NULL,
      include_self = TRUE,
      p_adjust_method = NULL,
      quiet = FALSE,
      ...
    )
data |
|
cell_size |
|
grid_type |
|
kde |
|
bandwidth |
|
bandwidth_adjust |
single positive |
grid |
|
weights |
|
nb_dist |
The distance around a cell that contains the neighbours of
that cell, which are used in calculating the statistic. If this argument is
|
include_self |
Should points in a given cell be counted as well as
counts in neighbouring cells when calculating the values of
Gi*
(if |
p_adjust_method |
The method to be used to adjust p-values for
multiple comparisons. |
quiet |
if set to |
... |
Further arguments passed to |
This function calculates the Getis-Ord Gi* (gi-star) Z-score statistic for identifying clusters of point locations. The underlying implementation uses the localG function to calculate the Z scores and then the p.adjustSP function to adjust the corresponding p-values for multiple comparison. The function also returns counts of points in each cell and (optionally, and by default) kernel density estimates using the kde function.
The grid produced by this function covers the convex hull of the input data layer. This means the result may include Gi or Gi* values for cells that are outside the area for which data were provided, which could be misleading. To handle this, consider cropping the output layer to the area for which data are available. For example, if you only have crime data for a particular district, crop the output dataset to the district boundary using st_intersection.
If no cell size is given then the cell size will be set so that there are 50 cells on the shorter side of the grid. If the data SF object is projected in metres or feet, the number of cells will be adjusted upwards so that the cell size is a multiple of 100.
An sf tibble of regular grid cells with corresponding point counts, Gi or Gi* values and (optionally) kernel density estimates for each cell. Values greater than zero indicate more points than would be expected for randomly distributed points and values less than zero indicate fewer points. Critical values of Gi and Gi* are given in the manual page for localG.
The output from this function can be plotted in the same way as for other SF objects, for which see vignette("sf5", package = "sf").
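As an illustration of how the sign and size of these values can be read, the sketch below converts some made-up Z scores into two-sided p-values under a standard normal distribution and labels each cell. This is a simplified interpretation written for this example: it does not adjust for multiple comparisons and is not necessarily how the package computes its own p-values.

```r
# Made-up Gi* Z scores for five grid cells
gistar <- c(3.2, 1.96, 0.4, -0.1, -2.7)

# Two-sided p-value for each Z score under a standard normal distribution
p_value <- 2 * pnorm(-abs(gistar))

# Label cells using a critical p-value of 0.05: significant positive values
# are hot-spots and significant negative values are cold-spots
critical_p <- 0.05
label <- ifelse(
  p_value >= critical_p,
  "not significant",
  ifelse(gistar > 0, "hot-spot", "cold-spot")
)

data.frame(gistar, p_value = round(p_value, 4), label)
```

Here the first two cells are labelled as hot-spots, the last as a cold-spot and the remaining two as not significant.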
Getis, A. & Ord, J. K. (1992). The Analysis of Spatial Association by Use of Distance Statistics. Geographical Analysis, 24(3), 189-206. doi:10.1111/j.1538-4632.1992.tb00261.x
    library(sf)

    # Transform data to UTM zone 15N so that cell_size and bandwidth can be set
    # in metres
    memphis_robberies_utm <- st_transform(memphis_robberies_jan, 32615)

    # Automatically set grid-cell size, bandwidth and neighbour distance
    hotspot_gistar(memphis_robberies_utm)

    # Manually set grid-cell size in metres, since the `memphis_robberies_utm`
    # dataset uses a co-ordinate reference system (UTM zone 15 north) that is
    # specified in metres
    hotspot_gistar(memphis_robberies_utm, cell_size = 200)

    # Automatically set grid-cell size and bandwidth for lon/lat data, since it
    # is not intuitive to set these values manually in decimal degrees. To do
    # this it is necessary to not calculate KDEs due to a limitation in the
    # underlying function.
    hotspot_gistar(memphis_robberies, kde = FALSE)
Create either a rectangular or hexagonal two-dimensional grid
hotspot_grid(data, cell_size = NULL, grid_type = "rect", quiet = FALSE, ...)
data |
|
cell_size |
|
grid_type |
|
quiet |
if set to |
... |
Further arguments passed to |
A simple features tibble containing polygons representing grid cells.
The grid will be based on the convex hull of data, expanded by a buffer of cell_size / 2 to ensure all the points in data fall within the resulting grid.
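The idea can be sketched with functions from the sf package and some simulated points. This is an illustration of the approach described above under the assumption that the sf package is installed; it is not the package's own implementation, and the point locations and cell size are made up.

```r
library(sf)

# Simulated points in a projected co-ordinate reference system (UTM zone 15N)
set.seed(1)
points <- st_as_sf(
  data.frame(x = runif(50, 0, 1000), y = runif(50, 0, 1000)),
  coords = c("x", "y"),
  crs = 32615
)
cell_size <- 100

# Convex hull of the points, expanded by half a cell in every direction
hull <- st_buffer(st_convex_hull(st_union(points)), cell_size / 2)

# Rectangular grid over the bounding box of the hull, keeping only those
# cells that touch the hull
grid <- st_make_grid(hull, cellsize = cell_size)
grid <- grid[lengths(st_intersects(grid, hull)) > 0]

# A hexagonal grid can be made in the same way with `square = FALSE`
hex_grid <- st_make_grid(hull, cellsize = cell_size, square = FALSE)

# Every point falls inside at least one grid cell
all_covered <- all(lengths(st_intersects(points, grid)) > 0)
```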
Estimate two-dimensional kernel density of points
    hotspot_kde(
      data,
      cell_size = NULL,
      grid_type = "rect",
      bandwidth = NULL,
      bandwidth_adjust = 1,
      grid = NULL,
      weights = NULL,
      quiet = FALSE,
      ...
    )
data |
|
cell_size |
|
grid_type |
|
bandwidth |
|
bandwidth_adjust |
single positive |
grid |
|
weights |
|
quiet |
if set to |
... |
Further arguments passed to |
This function creates a regular two-dimensional grid of cells (unless a custom grid is specified with grid) and calculates the density of points in each cell on that grid using functions from the SpatialKDE package. The count of points in each cell is also returned.
The grid produced by this function covers the convex hull of the input data layer. This means the result may include KDE values for cells that are outside the area for which data were provided, which could be misleading. To handle this, consider cropping the output layer to the area for which data are available. For example, if you only have crime data for a particular district, crop the output dataset to the district boundary using st_intersection.
If no cell size is given then the cell size will be set so that there are 50 cells on the shorter side of the grid. If the data SF object is projected in metres or feet, the number of cells will be adjusted upwards so that the cell size is a multiple of 100.
An sf tibble of grid cells with corresponding point counts and kernel density estimates for each cell. This can be plotted using autoplot.
Yin, P. (2020). Kernels and Density Estimation. The Geographic Information Science & Technology Body of Knowledge (1st Quarter 2020 Edition), John P. Wilson (ed.). doi:10.22224/gistbok/2020.1.12
    library(sf)

    # Transform data to UTM zone 15N so that cell_size and bandwidth can be set
    # in metres
    memphis_robberies_utm <- st_transform(memphis_robberies_jan, 32615)

    # Automatically set grid-cell size and bandwidth
    hotspot_kde(memphis_robberies_utm)

    # Manually set grid-cell size and bandwidth in metres, since the
    # `memphis_robberies_utm` dataset uses a co-ordinate reference system (UTM
    # zone 15 north) that is specified in metres
    hotspot_kde(memphis_robberies_utm, cell_size = 200, bandwidth = 1000)
A dataset containing records of populations associated with the centroids of census blocks in Memphis, Tennessee, in 2020.
memphis_population
A simple-features tibble with 10,393 rows and three variables:
the census GEOID for each block
the number of people residing in each block
the co-ordinates of the centroid of each block, stored in simple-features point format
US Census Bureau. Census 2020, Redistricting Data summary file. https://www.census.gov/programs-surveys/decennial-census/about/rdo/summary-files.html
A dataset containing the boundaries of Memphis Police Department precincts.
memphis_precincts
A simple-features tibble with 9 rows and two variables:
the precinct name
the boundary of each precinct, stored in simple-features polygon format
Licence: Public domain https://data.memphistn.gov/d/tdws-78iq
City of Memphis https://data.memphistn.gov/d/rqqz-pj4u
A dataset containing records of personal robberies recorded by police in Memphis, Tennessee, in 2019.
memphis_robberies
A simple-features tibble with 2,245 rows and four variables:
a unique identifier for each robbery
the type of crime (always 'personal robbery')
the date and time at which the crime occurred
the co-ordinates at which the crime occurred, stored in simple-features point format
Crime Open Database, https://osf.io/zyaqn/
A dataset containing records of personal robberies recorded by police in Memphis, Tennessee, in January 2019. This dataset is too small for some types of analysis but is included for testing purposes.
memphis_robberies_jan
A simple-features tibble with 206 rows and four variables:
a unique identifier for each robbery
the type of crime (always 'personal robbery')
the date and time at which the crime occurred
the co-ordinates at which the crime occurred, stored in simple-features point format
Crime Open Database, https://osf.io/zyaqn/