Identify aggregated locations among uncertain observations using a three-step procedure:

  • Step 1: Select uncertain locations with at least a specified number (or proportion) of uncertain neighbors among their n_neighbors nearest neighbors.

  • Step 2: Expand the aggregation set by adding uncertain neighbors of the Step 1 aggregation locations.

  • Step 3: Build an igraph connectivity graph on the Step 2 aggregation locations and remove small connected components.
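
Steps 1 and 2 can be sketched in base R on a toy data set. This is only an illustration, not the package's implementation: it assumes Euclidean distance, a brute-force neighbor search over all observations, and that Step 2 reuses the same n_neighbors nearest neighbors as Step 1. The variable names knn_idx, n_unc_nb, step1 and step2 are made up for this sketch.

```r
# Toy data: 7 points on a line; the first four are uncertain.
dat_loc <- cbind(x = c(0, 1, 2, 3.5, 5, 6, 7), y = 0)
is_uncertain <- c(TRUE, TRUE, TRUE, TRUE, FALSE, FALSE, FALSE)
n_neighbors <- 2
threshold_count <- 2

# Brute-force nearest neighbors from the full distance matrix (fine for small n).
d <- as.matrix(dist(dat_loc))
diag(d) <- Inf  # a point is not its own neighbor
knn_idx <- matrix(
  unlist(lapply(seq_len(nrow(d)), function(i) order(d[i, ])[seq_len(n_neighbors)])),
  ncol = n_neighbors, byrow = TRUE
)

# Step 1: uncertain locations with enough uncertain neighbors.
n_unc_nb <- rowSums(matrix(is_uncertain[knn_idx], nrow = nrow(knn_idx)))
step1 <- which(is_uncertain & n_unc_nb >= threshold_count)

# Step 2: add the uncertain neighbors of the Step 1 locations.
nb_of_step1 <- unique(as.vector(knn_idx[step1, , drop = FALSE]))
step2 <- sort(union(step1, nb_of_step1[is_uncertain[nb_of_step1]]))

step1  # 1 2 3
step2  # 1 2 3 4
```

Location 4 has only one uncertain neighbor, so it fails Step 1, but it is an uncertain neighbor of location 3 and is therefore picked up in Step 2.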

Usage

detect_aggregation(
  dat_loc,
  is_uncertain,
  n_neighbors,
  threshold_count = NULL,
  threshold_prop = NULL,
  add_high_uncertain_step1 = FALSE,
  high_uncertain = NULL,
  do_step2 = TRUE,
  distance_threshold = NULL,
  min_component_size = NULL
)
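
A call might look like the following sketch. The simulated coordinates and the rule used to flag uncertain observations are invented for illustration, and the call is guarded so the snippet still runs when the package providing detect_aggregation() is not attached.

```r
set.seed(1)
n <- 500
dat_loc <- cbind(x = runif(n), y = runif(n))

# Hypothetical uncertainty flag: points in the lower-left corner.
is_uncertain <- dat_loc[, "x"] < 0.3 & dat_loc[, "y"] < 0.3

if (exists("detect_aggregation")) {
  res <- detect_aggregation(
    dat_loc,
    is_uncertain,
    n_neighbors = 10,
    threshold_prop = 0.5
  )
  str(res$loc_aggre)
}
```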

Arguments

dat_loc

A numeric matrix of dimension \(n \times \rho\), where \(n\) is the number of observations and \(\rho\) is the spatial coordinate dimension (for example, \(\rho = 2\) for 2D coordinates). Each row corresponds to one observation.

is_uncertain

A logical vector of length \(n\) indicating whether each observation is uncertain.

n_neighbors

An integer specifying the number of nearest neighbors considered for each location.

threshold_count

An integer specifying the minimum number of uncertain neighbors required in Step 1. If provided, this takes precedence over threshold_prop.

threshold_prop

A numeric value between 0 and 1 specifying the minimum proportion of uncertain neighbors required in Step 1. Used only when threshold_count = NULL.
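
The precedence between the two thresholds can be illustrated with a small helper. passes_step1() is hypothetical and not part of the package; in particular, the error raised when both thresholds are NULL and the direct comparison of the proportion against threshold_prop are assumptions of this sketch.

```r
# Hypothetical helper: which locations pass the Step 1 threshold, given each
# location's count of uncertain neighbors.
passes_step1 <- function(n_unc_nb, n_neighbors,
                         threshold_count = NULL, threshold_prop = NULL) {
  if (!is.null(threshold_count)) {
    # threshold_count takes precedence when supplied.
    return(n_unc_nb >= threshold_count)
  }
  if (is.null(threshold_prop)) {
    stop("Supply either threshold_count or threshold_prop.")
  }
  stopifnot(threshold_prop >= 0, threshold_prop <= 1)
  n_unc_nb / n_neighbors >= threshold_prop
}

counts <- c(2, 5, 8)
passes_step1(counts, n_neighbors = 10, threshold_prop = 0.5)
# FALSE TRUE TRUE
passes_step1(counts, n_neighbors = 10, threshold_count = 8, threshold_prop = 0.5)
# FALSE FALSE TRUE  (threshold_prop is ignored)
```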

add_high_uncertain_step1

A logical value indicating whether to add the locations given in high_uncertain to the Step 1 set, regardless of the neighbor threshold.

high_uncertain

Either a logical vector of length \(n\) or an integer vector of indices specifying highly uncertain locations to be added in Step 1 when add_high_uncertain_step1 = TRUE.
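
Both accepted forms identify the same set of locations. The sketch below uses a hypothetical helper, as_index(), to convert either form to indices and merge them into a Step 1 set; how the package validates or merges the input internally is not documented here.

```r
# Hypothetical helper: convert either accepted form of high_uncertain to indices.
as_index <- function(high_uncertain, n) {
  if (is.logical(high_uncertain)) {
    stopifnot(length(high_uncertain) == n)
    return(which(high_uncertain))
  }
  idx <- as.integer(high_uncertain)
  stopifnot(all(idx >= 1L), all(idx <= n))
  sort(unique(idx))
}

step1 <- c(2L, 3L)                                # hypothetical Step 1 indices
high_lgl <- c(FALSE, FALSE, TRUE, FALSE, TRUE)    # logical form
high_idx <- c(5, 3)                               # index form, same locations

step1_forced <- sort(union(step1, as_index(high_lgl, n = 5)))
step1_forced  # 2 3 5
```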

do_step2

A logical value indicating whether to expand aggregation locations by adding uncertain neighbors of the Step 1 aggregation locations.

distance_threshold

A positive numeric value specifying the distance threshold used to define graph connectivity in Step 3. If NULL, it is automatically set to 1.1 times the median nearest-neighbor distance.
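
To see what the default implies, consider locations on a regular unit grid. The sketch below assumes Euclidean distance and that the nearest-neighbor distances are taken over all rows of dat_loc; whether the function uses all observations or only a subset is not stated here.

```r
# 5 x 5 unit grid of locations.
dat_loc <- as.matrix(expand.grid(x = 1:5, y = 1:5))

d <- as.matrix(dist(dat_loc))
diag(d) <- Inf                    # ignore self-distances
nn_dist <- apply(d, 1, min)       # nearest-neighbor distance of each location

distance_threshold <- 1.1 * median(nn_dist)
distance_threshold                # 1.1 on a unit grid

# Locations within the threshold are connected.
adj <- d <= distance_threshold
sum(adj[13, ])                    # the centre point (3, 3) has 4 neighbors
```

On such a grid the default connects each location to its horizontal and vertical neighbors (distance 1) but not to its diagonal neighbors (distance about 1.41).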

min_component_size

An integer specifying the minimum connected component size to keep in Step 3. If NULL, it is automatically set to floor(0.003 * nrow(dat_loc)).
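
The filtering in Step 3 can be sketched as follows. The function builds its graph with igraph; the component_labels() helper below is a dependency-free stand-in written for this illustration, and the indices, coordinates and the use of >= for the size comparison are assumptions.

```r
# Base R stand-in for connected-component labelling on a logical adjacency matrix.
component_labels <- function(adj) {
  n <- nrow(adj)
  label <- rep(NA_integer_, n)
  current <- 0L
  for (start in seq_len(n)) {
    if (!is.na(label[start])) next
    current <- current + 1L
    frontier <- start
    while (length(frontier) > 0) {
      label[frontier] <- current
      reach <- which(colSums(adj[frontier, , drop = FALSE]) > 0)
      frontier <- reach[is.na(label[reach])]
    }
  }
  label
}

loc_step2 <- c(3L, 4L, 5L, 6L, 20L)            # hypothetical Step 2 indices
coords <- cbind(x = c(0, 1, 2, 3, 10), y = 0)  # their coordinates
distance_threshold <- 1.1
min_component_size <- 2

adj <- as.matrix(dist(coords)) <= distance_threshold
diag(adj) <- FALSE
label <- component_labels(adj)                 # 1 1 1 1 2
size <- tabulate(label)                        # 4 1
loc_aggre <- loc_step2[size[label] >= min_component_size]
loc_aggre                                      # 3 4 5 6

# Default size for a data set with 2500 observations.
floor(0.003 * 2500)                            # 7
```

Note that the default evaluates to 0 when dat_loc has fewer than about 334 rows, so under the comparison assumed here no component would be removed for small data sets unless min_component_size is set explicitly.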

Value

A list containing:

  • loc_aggre: Indices of final aggregation locations.

  • is_aggre: A logical vector of length \(n\) indicating whether each observation is a final aggregation location.

  • aggre_cluster: An integer vector of length \(n\) giving reordered igraph component labels for final aggregation locations, and NA otherwise.

  • loc_uncertain: Indices of uncertain locations.

  • loc_certain: Indices of certain locations.

  • n_aggre: Number of final aggregation locations.

  • prop_aggre_in_uncertain: Proportion of final aggregation locations among uncertain locations.
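
The relationships among these elements can be illustrated with a toy example. The input values and component labels below are hypothetical, and the relabelling that the function applies to component labels is not reproduced.

```r
n <- 8
is_uncertain <- c(TRUE, TRUE, TRUE, TRUE, TRUE, FALSE, FALSE, FALSE)
loc_aggre <- c(1L, 2L, 3L)        # hypothetical final aggregation indices
labels <- c(1L, 1L, 2L)           # hypothetical component label per index

loc_uncertain <- which(is_uncertain)
loc_certain <- which(!is_uncertain)
is_aggre <- seq_len(n) %in% loc_aggre

aggre_cluster <- rep(NA_integer_, n)
aggre_cluster[loc_aggre] <- labels

n_aggre <- length(loc_aggre)
prop_aggre_in_uncertain <- n_aggre / length(loc_uncertain)
prop_aggre_in_uncertain           # 0.6
```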

Details

Optionally, highly uncertain locations (for example, locations with entropy greater than mean + 1.5 sd) can be forcibly added in Step 1 by setting add_high_uncertain_step1 = TRUE and supplying them through high_uncertain.
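
One way to build such a vector is sketched below. It assumes that uncertainty is measured as the Shannon entropy of per-observation class probabilities; the prob matrix is invented for illustration, and the entropy definition used in any particular analysis may differ.

```r
# Hypothetical class-probability matrix: nine confident rows and one ambiguous row.
prob <- rbind(
  matrix(rep(c(0.98, 0.01, 0.01), 9), ncol = 3, byrow = TRUE),
  rep(1 / 3, 3)
)

# Shannon entropy per observation (assumes strictly positive probabilities).
entropy <- -rowSums(prob * log(prob))

cutoff <- mean(entropy) + 1.5 * sd(entropy)
high_uncertain <- entropy > cutoff
which(high_uncertain)  # 10

# high_uncertain could then be passed to detect_aggregation() together with
# add_high_uncertain_step1 = TRUE.
```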