vignettes/pas_introduction.Rmd
pas_introduction.Rmd
Synoptic data provides a synopsis - a comprehensive and nearly instantaneous picture of the state of the atmosphere. This vignette demonstrates an example workflow for exploring air quality synoptic data using the AirSensor R package and data captured by PurpleAir air quality sensors.
PurpleAir sensor readings are uploaded to the cloud every 80 seconds or so where they are stored for download and display on the PurpleAir website. In order to capture and download the latest synoptic dataset, begin by importing the required Mazama Science packages, setting the location of pre-generated archive files and loading the most recent PurpleAir Synoptic (“pas
”) data.
library(AirSensor)
library(dplyr)
library(ggplot2)
setArchiveBaseUrl("http://data.mazamascience.com/PurpleAir/v1")
pas <- pas_load()
The pas_load()
function is a wrapper function that will:
Download a raw dataset of the entire PurpleAir network that includes both metadata and recent PM2.5 averages for each deployed sensor across the globe. See downloadParseSynopticData()
for more info.
Enhance the raw dataset by replacing variables with more consistent and human readable names, adds spatial metadata for each sensor, and lists the nearest official air quality monitor. For a more in depth explanation, see enhanceSynopticData()
.
The pas
dataset contains 35 columns, and each row corresponds to different PurpleAir sensors. For the scope of this example we will focus on the columns labeled stateCode
, pm25_*
, humidity
, pressure
, temperature
, and pwfsl_closestDistance
.
The pm25_*
column is short-hand for all the columns containing PM2.5 data. The extended names are fairly obvious; the pm25_current
column contains data for the most recent air data, pm25_10min
contains a 10-minute average of PM2.5 data, and so on.
## # A tibble: 5 x 7
## pm25_current pm25_10min pm25_30min pm25_1hr pm25_6hr pm25_1day pm25_1week
## <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 1.22 1.19 2.52 3.51 4.38 4.84 6.18
## 2 0.07 0.2 0.82 1.27 1.79 2.16 2.9
## 3 14.5 16.1 16.2 16.0 16.1 18.4 23.8
## 4 15.3 16.7 17.2 16.9 16.4 18.5 23.5
## 5 14.3 15.1 15.0 15.0 13.8 16.9 23.0
pas
PM2.5 dataTo visually explore a region, we can use our pas
data with the pas_leaflet()
function to plot an interactive leaflet map. By default, pas_leaflet()
will map the coordinates of each PurpleAir sensor and the hourly PM2.5 data. Clicking on a sensor will show sensor metadata.
If we want to narrow our selection, for example to California, and look at which locations have a moderate to unhealthy 6-hour average air quality rating we can create a script like:
=======If we want to narrow our selection, for example to California, and look at which locations have a moderate to unhealthy 6-hour average air quality rating we can create a script like:
>>>>>>> masterpas %>%
filter(stateCode == 'CA') %>%
filter(pm25_6hr >= 25.0) %>%
pas_leaflet(parameter = "pm25_6hr")
This code pipes our pas
data into dplyr::filter()
where we can set our selection criteria. A stateCode
is the ISO 3166-2 state name abbreviation, hence we limit our filter to only accept stateCode == 'CA'
. The pm25_6hr
selection, as mentioned above, selects the 6-hour average and restricts the sensors to those whose PM2.5 6-hour average is above 25.0. Be sure to include the "pm25_6hr"
parameter in pas_leaflet()
.
This code pipes our pas
data into dplyr::filter()
where we can set our selection criteria. A stateCode
is the ISO 3166-2 state name abbreviation, hence we limit our filter to only accept stateCode == 'CA'
. The pm25_6hr
selection, as mentioned above, selects the 6-hour average and restricts the sensors to those whose PM2.5 6-hour average is above 25.0. Be sure to include the "pm25_6hr"
parameter in pas_leaflet()
.
pas
auxillary dataWe can also explore and utilize other PurpleAir sensor data. Check the pas_leaflet()
documentation for all supported parameters.
Here is an example of humidity data captured from PurpleAir sensors across the state of California.
<<<<<<< HEAD ======= >>>>>>> masterBecause the pas
object is a dataframe, we can use functionality from the dplyr package to manipulate the data.
Below, we show how to find out which states have the highest percentage of PurpleAir sensors that are reporting a weekly average of >20 ug/m3.
sensorCount <-
pas %>%
count(stateCode, name="sensorCount", sort = TRUE)
gt20Count <-
pas %>%
select(stateCode, pm25_1week) %>%
filter(pm25_1week > 20.0) %>%
count(stateCode, sort = TRUE, name = "gt20Count")
df <-
left_join(gt20Count, sensorCount, by = "stateCode") %>%
mutate(gt20Fraction = gt20Count / sensorCount) %>%
arrange(desc(gt20Fraction)) %>%
top_n(15)
## Selecting by gt20Fraction
From our df
data frame, we can use ggplot2
to visualize our results.
df %>%
ggplot(aes(x = reorder(stateCode, gt20Fraction), y = gt20Fraction)) +
coord_flip() +
geom_col(aes(fill = gt20Fraction)) +
scale_y_continuous(labels = scales::percent) +
labs(x = "State", y = "Percent of PA sensors above 20 \u00B5g/m3") +
theme(legend.position = "none")
As you can see, the AirSensor package combined with data provided by PurpleAir offers a powerful system for regional air-quality assessment.
Mazama Science