Identify aggregated locations among uncertain observations using a three-step procedure:
Step 1: Select uncertain locations with at least a specified number (or proportion) of uncertain neighbors among their n_neighbors nearest neighbors.
Step 2: Expand the aggregation set by adding uncertain neighbors of the Step 1 aggregation locations.
Step 3: Build an igraph connectivity graph on the Step 2 aggregation locations and remove small connected components.
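Steps 1 and 2 can be sketched in base R as follows. This is an illustrative sketch, not the package's implementation: the toy data, the brute-force neighbor search with dist(), the exclusion of each point from its own neighbor set, and the intermediate names (nn, n_unc_nb, step1, step2) are all assumptions made for the example. Step 3 is not covered here.

```r
# Illustrative sketch only; not the package's implementation.
set.seed(1)
n <- 200
dat_loc <- cbind(runif(n), runif(n))   # toy 2D coordinates (assumed data)
is_uncertain <- runif(n) < 0.3         # toy uncertainty flags (assumed data)
n_neighbors <- 6
threshold_count <- 3

# Pairwise Euclidean distances; a point is assumed not to be its own neighbor.
d <- as.matrix(dist(dat_loc))
diag(d) <- Inf

# nn[i, ] holds the indices of the n_neighbors nearest neighbors of point i.
nn <- t(apply(d, 1, function(row) order(row)[seq_len(n_neighbors)]))

# Step 1: uncertain locations with at least threshold_count uncertain neighbors.
n_unc_nb <- rowSums(matrix(is_uncertain[nn], nrow = n))
step1 <- is_uncertain & (n_unc_nb >= threshold_count)

# Step 2: add the uncertain neighbors of the Step 1 locations.
nb_of_step1 <- unique(as.vector(nn[step1, , drop = FALSE]))
step2 <- step1
step2[nb_of_step1[is_uncertain[nb_of_step1]]] <- TRUE
```

A proportion threshold would presumably compare n_unc_nb / n_neighbors with threshold_prop instead of comparing n_unc_nb with threshold_count.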
Usage
detect_aggregation(
dat_loc,
is_uncertain,
n_neighbors,
threshold_count = NULL,
threshold_prop = NULL,
add_high_uncertain_step1 = FALSE,
high_uncertain = NULL,
do_step2 = TRUE,
distance_threshold = NULL,
min_component_size = NULL
)
Arguments
- dat_loc
A numeric matrix of dimension \(n \times \rho\), where \(n\) is the number of observations and \(\rho\) is the spatial coordinate dimension (for example, \(\rho = 2\) for 2D coordinates). Each row corresponds to one observation.
- is_uncertain
A logical vector of length \(n\) indicating whether each observation is uncertain.
- n_neighbors
An integer specifying the number of nearest neighbors.
- threshold_count
An integer specifying the minimum number of uncertain neighbors required in Step 1. If provided, this takes precedence over threshold_prop.
- threshold_prop
A numeric value between 0 and 1 specifying the minimum proportion of uncertain neighbors required in Step 1. Used only when threshold_count = NULL.
- add_high_uncertain_step1
A logical value indicating whether to forcibly add highly uncertain locations in Step 1.
- high_uncertain
Either a logical vector of length \(n\) or an integer vector of indices specifying highly uncertain locations to be added in Step 1 when add_high_uncertain_step1 = TRUE.
- do_step2
A logical value indicating whether to expand aggregation locations by adding uncertain neighbors of the Step 1 aggregation locations.
- distance_threshold
A positive numeric value specifying the distance threshold used to define graph connectivity in Step 3. If NULL, it is automatically set to 1.1 times the median nearest-neighbor distance.
- min_component_size
An integer specifying the minimum connected component size to keep in Step 3. If NULL, it is automatically set to floor(0.003 * nrow(dat_loc)).
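The two automatic defaults for Step 3 can be reproduced by hand, for example to inspect them before overriding. The sketch below is an assumption-laden illustration: it takes the median nearest-neighbor distance over all rows of dat_loc and uses a brute-force dist() computation, and the function itself may compute the median over a different set of points or by a different method.

```r
# Illustrative sketch only; not the package's implementation.
set.seed(1)
dat_loc <- cbind(runif(500), runif(500))   # toy 2D coordinates (assumed data)

# Distance from each point to its nearest other point.
d <- as.matrix(dist(dat_loc))
diag(d) <- Inf
nn_dist <- apply(d, 1, min)

# Defaults as described in the argument documentation.
distance_threshold <- 1.1 * median(nn_dist)
min_component_size <- floor(0.003 * nrow(dat_loc))   # 500 rows gives 1
```

With few observations the default min_component_size is small (0 for fewer than 334 rows), so setting it explicitly may be preferable for small data sets.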
Value
A list containing:
loc_aggre: Indices of final aggregation locations.
is_aggre: A logical vector of length \(n\) indicating whether each observation is a final aggregation location.
aggre_cluster: An integer vector of length \(n\) giving reordered igraph component labels for final aggregation locations, and NA otherwise.
loc_uncertain: Indices of uncertain locations.
loc_certain: Indices of certain locations.
n_aggre: Number of final aggregation locations.
prop_aggre_in_uncertain: Proportion of final aggregation locations among uncertain locations.
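The relationship between the returned summary elements can be illustrated with a small hand-made example. The two input vectors below are invented for illustration, and the code only mirrors the descriptions above; it is not taken from the function's source. aggre_cluster is omitted because it depends on the Step 3 component labels.

```r
# Illustrative sketch only; toy vectors, not output of detect_aggregation().
is_uncertain <- c(TRUE, TRUE, FALSE, TRUE, FALSE, TRUE)
is_aggre     <- c(TRUE, TRUE, FALSE, FALSE, FALSE, FALSE)

loc_aggre     <- which(is_aggre)        # indices of final aggregation locations
loc_uncertain <- which(is_uncertain)    # indices of uncertain locations
loc_certain   <- which(!is_uncertain)   # indices of certain locations
n_aggre       <- length(loc_aggre)

# Share of uncertain locations that ended up in the final aggregation set.
prop_aggre_in_uncertain <- n_aggre / length(loc_uncertain)
```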