Outlier detection using a Median Absolute Deviation (MAD) "Hampel" filter. This function applies a rolling Hampel filter to flag points that lie far out in the tails of the distribution of values within each window.
The `thresholdMin` level is similar to a sigma value for normally distributed data. The default setting, `thresholdMin = 8`, identifies points that are extremely unlikely to be part of a normal distribution and are therefore very likely to be outliers. Choosing a relatively large value for `thresholdMin` makes false positives less likely.
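To make the role of `thresholdMin` concrete, here is a minimal base-R sketch of a rolling Hampel score. It assumes a centered window of odd length and the conventional MAD scaled by 1.4826 (the default of `stats::mad()`), so that a score of 8 means "8 robust standard deviations from the window median". The helpers `hampel_scores()` and `flag_outliers()` are hypothetical illustrations, not part of AirSensor; `pat_outliers()` relies on `seismicRoll::findOutliers()`, whose implementation may differ in detail.

```r
# Hypothetical helper: rolling Hampel score |x - median| / MAD for each point.
# Assumes an odd windowSize; edge points without a full window get NA.
hampel_scores <- function(x, windowSize = 15) {
  half <- windowSize %/% 2
  n <- length(x)
  scores <- rep(NA_real_, n)
  if (n >= windowSize) {
    for (i in (half + 1):(n - half)) {
      window <- x[(i - half):(i + half)]
      med <- median(window, na.rm = TRUE)
      # mad() multiplies the median absolute deviation by 1.4826 by default,
      # making it comparable to a standard deviation for normal data
      s <- mad(window, center = med, na.rm = TRUE)
      if (!is.na(s) && s > 0) scores[i] <- abs(x[i] - med) / s
    }
  }
  scores
}

# Hypothetical helper: indices whose Hampel score exceeds thresholdMin
flag_outliers <- function(x, windowSize = 15, thresholdMin = 8) {
  scores <- hampel_scores(x, windowSize)
  which(!is.na(scores) & scores > thresholdMin)
}

# Synthetic, deterministic "PM2.5" series with one injected spike
pm25 <- 10 + rep(c(-1, 0, 1), 20)
pm25[30] <- 80
flag_outliers(pm25)
#> [1] 30
```

In this synthetic series every ordinary point scores about 0.67, while the spike scores about 47, so a large threshold such as 8 isolates the spike without touching normal variation.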
The default window size, `windowSize = 15`, means that 15 samples from a single channel are used to determine the distribution of values for which a median is calculated. Each PurpleAir channel makes a measurement approximately every 120 seconds, so the temporal window is 15 * 120 s = 1800 s, or approximately 30 minutes. This seems like a reasonable period over which to evaluate PM2.5 measurements.
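The window arithmetic can be checked directly. This sketch assumes a fixed sampling interval using the nominal 120 s figure quoted above; real sampling intervals may vary. Both helper names are hypothetical and only illustrate converting between `windowSize` and elapsed time.

```r
# Hypothetical helper: minutes spanned by windowSize samples
window_span_minutes <- function(windowSize, interval_sec = 120) {
  windowSize * interval_sec / 60
}

# Hypothetical helper: samples needed to span a target number of minutes
window_samples_for <- function(minutes, interval_sec = 120) {
  ceiling(minutes * 60 / interval_sec)
}

window_span_minutes(15)   # default pat_outliers() window
#> [1] 30
window_samples_for(60)    # roughly one hour at the same rate
#> [1] 30
```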
Specifying `replace = TRUE` allows you to perform smoothing by replacing outliers with the window median value. Using this technique, you can create a highly smoothed, artificial dataset by setting `thresholdMin = 1` or lower (but always above zero).
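The replacement behavior can be sketched in base R as follows. This is an illustrative assumption, not the AirSensor code: `hampel_replace()` is a hypothetical helper that computes each window from the original (unsmoothed) values, assumes an odd `windowSize`, and leaves edge points without a full window unchanged.

```r
# Hypothetical helper: replace Hampel outliers with the window median
hampel_replace <- function(x, windowSize = 15, thresholdMin = 8) {
  half <- windowSize %/% 2
  n <- length(x)
  out <- x
  if (n < windowSize) return(out)
  for (i in (half + 1):(n - half)) {
    window <- x[(i - half):(i + half)]   # original values, not 'out'
    med <- median(window, na.rm = TRUE)
    s <- mad(window, center = med, na.rm = TRUE)  # scaled by 1.4826
    if (!is.na(x[i]) && !is.na(s) && s > 0 &&
        abs(x[i] - med) / s > thresholdMin) {
      out[i] <- med
    }
  }
  out
}

pm25 <- 10 + rep(c(-1, 0, 1), 20)
pm25[30] <- 80

hampel_replace(pm25)[30]                       # spike replaced
#> [1] 10
smooth <- hampel_replace(pm25, thresholdMin = 0.5)
unique(smooth[8:53])                           # interior flattened
#> [1] 10
```

With the default threshold only the spike changes, while a threshold below 1 replaces ordinary variation too, which is the artificial smoothing described above.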
```r
pat_outliers(
  pat = NULL,
  windowSize = 15,
  thresholdMin = 8,
  replace = FALSE,
  showPlot = TRUE,
  data_shape = 18,
  data_size = 1,
  data_color = "black",
  data_alpha = 0.5,
  outlier_shape = 8,
  outlier_size = 1,
  outlier_color = "red",
  outlier_alpha = 1
)
```
Argument | Description |
---|---|
pat | PurpleAir Timeseries pat object. |
windowSize | Integer window size for outlier detection. |
thresholdMin | Threshold value for outlier detection. |
replace | Logical specifying whether to replace outliers with the window median value. |
showPlot | Logical specifying whether to generate outlier detection plots. |
data_shape | Symbol to use for data points. |
data_size | Size of data points. |
data_color | Color of data points. |
data_alpha | Opacity of data points. |
outlier_shape | Symbol to use for outlier points. |
outlier_size | Size of outlier points. |
outlier_color | Color of outlier points. |
outlier_alpha | Opacity of outlier points. |
Returns a pat object. When `replace = TRUE`, detected outliers are replaced by the window median values.
Additional documentation on the algorithm is available in `seismicRoll::findOutliers()`.
```r
library(AirSensor)

example_pat %>%
  pat_filterDate(20180801, 20180815) %>%
  pat_outliers(replace = TRUE, showPlot = TRUE)
```