diff --git a/docs/.nojekyll b/docs/.nojekyll
new file mode 100644
index 0000000..8b13789
--- /dev/null
+++ b/docs/.nojekyll
@@ -0,0 +1 @@
+
diff --git a/docs/404.html b/docs/404.html
new file mode 100644
index 0000000..b749587
--- /dev/null
+++ b/docs/404.html
@@ -0,0 +1,80 @@

Creative Commons Legal Code
+CC0 1.0 Universal
+CREATIVE COMMONS CORPORATION IS NOT A LAW FIRM AND DOES NOT PROVIDE
+LEGAL SERVICES. DISTRIBUTION OF THIS DOCUMENT DOES NOT CREATE AN
+ATTORNEY-CLIENT RELATIONSHIP. CREATIVE COMMONS PROVIDES THIS
+INFORMATION ON AN "AS-IS" BASIS. CREATIVE COMMONS MAKES NO WARRANTIES
+REGARDING THE USE OF THIS DOCUMENT OR THE INFORMATION OR WORKS
+PROVIDED HEREUNDER, AND DISCLAIMS LIABILITY FOR DAMAGES RESULTING FROM
+THE USE OF THIS DOCUMENT OR THE INFORMATION OR WORKS PROVIDED
+HEREUNDER.

Statement of Purpose
+The laws of most jurisdictions throughout the world automatically confer exclusive Copyright and Related Rights (defined below) upon the creator and subsequent owner(s) (each and all, an “owner”) of an original work of authorship and/or a database (each, a “Work”).
+Certain owners wish to permanently relinquish those rights to a Work for the purpose of contributing to a commons of creative, cultural and scientific works (“Commons”) that the public can reliably and without fear of later claims of infringement build upon, modify, incorporate in other works, reuse and redistribute as freely as possible in any form whatsoever and for any purposes, including without limitation commercial purposes. These owners may contribute to the Commons to promote the ideal of a free culture and the further production of creative, cultural and scientific works, or to gain reputation or greater distribution for their Work in part through the use and efforts of others.
+For these and/or other purposes and motivations, and without any expectation of additional consideration or compensation, the person associating CC0 with a Work (the “Affirmer”), to the extent that he or she is an owner of Copyright and Related Rights in the Work, voluntarily elects to apply CC0 to the Work and publicly distribute the Work under its terms, with knowledge of his or her Copyright and Related Rights in the Work and the meaning and intended legal effect of CC0 on those rights.
+Waiver. To the greatest extent permitted by, but not in contravention of, applicable law, Affirmer hereby overtly, fully, permanently, irrevocably and unconditionally waives, abandons, and surrenders all of Affirmer’s Copyright and Related Rights and associated claims and causes of action, whether now known or unknown (including existing as well as future claims and causes of action), in the Work (i) in all territories worldwide, (ii) for the maximum duration provided by applicable law or treaty (including future time extensions), (iii) in any current or future medium and for any number of copies, and (iv) for any purpose whatsoever, including without limitation commercial, advertising or promotional purposes (the “Waiver”). Affirmer makes the Waiver for the benefit of each member of the public at large and to the detriment of Affirmer’s heirs and successors, fully intending that such Waiver shall not be subject to revocation, rescission, cancellation, termination, or any other legal or equitable action to disrupt the quiet enjoyment of the Work by the public as contemplated by Affirmer’s express Statement of Purpose.
Public License Fallback. Should any part of the Waiver for any reason be judged legally invalid or ineffective under applicable law, then the Waiver shall be preserved to the maximum extent permitted taking into account Affirmer’s express Statement of Purpose. In addition, to the extent the Waiver is so judged Affirmer hereby grants to each affected person a royalty-free, non transferable, non sublicensable, non exclusive, irrevocable and unconditional license to exercise Affirmer’s Copyright and Related Rights in the Work (i) in all territories worldwide, (ii) for the maximum duration provided by applicable law or treaty (including future time extensions), (iii) in any current or future medium and for any number of copies, and (iv) for any purpose whatsoever, including without limitation commercial, advertising or promotional purposes (the “License”). The License shall be deemed effective as of the date CC0 was applied by Affirmer to the Work. Should any part of the License for any reason be judged legally invalid or ineffective under applicable law, such partial invalidity or ineffectiveness shall not invalidate the remainder of the License, and in such case Affirmer hereby affirms that he or she will not (i) exercise any of his or her remaining Copyright and Related Rights in the Work or (ii) assert any associated claims and causes of action with respect to the Work, in either case contrary to Affirmer’s express Statement of Purpose.
Limitations and Disclaimers.
CybertownTraining.Rmd

Expert Query empowers users to access surface water quality data from the Assessment and TMDL Tracking and Implementation System (ATTAINS), encompassing Assessment decisions (under Clean Water Act Sections 303(d), 305(b), and 106) and Action data like Total Maximum Daily Loads (TMDLs), Advance Restoration Plans (ARPs), and Protection Approaches.
+The tool supports querying and downloading data within and across +organizations (states or tribes), such as querying all the data within +an EPA region or all nutrient TMDLs nationally. It also allows users to +download national ATTAINS data extracts.
+rExpertQuery takes this a step further, as its functions facilitate querying the Expert Query web services and importing the resulting data sets directly into R for more detailed review and analysis.
+It is crucial to be aware of certain complexities inherent in the +data and querying process. Users are cautioned against direct +comparisons between states due to differences in water quality standards +and assessment methodologies. The summary and count examples +provided here are intended to demonstrate potential uses of rExpertQuery +and complementary R packages, but do not imply EPA’s endorsement of any +specific summary or count method.
+While Expert Query facilitates data querying, it lacks built-in +summary capabilities. Users are encouraged to download and summarize +data, but use caution when relying on row counts, as duplicate entries +may occur. This complexity arises from multiple tables that have +many-to-many relationships. Several examples of summarizing Expert Query +data are included in this vignette. These examples use additional +packages for data wrangling, mapping, visualizing and other tasks. +Future rExpertQuery vignettes and functions may provide additional +examples of how to summarize Expert Query data.
+You must first have R and RStudio installed to use rExpertQuery (see instructions below if needed). rExpertQuery is in active development; therefore, we highly recommend that you update it and all of its dependency libraries each time you use the package.
+You can install and/or update the rExpertQuery Package +and all dependencies by running:
+
+# install and load rExpertQuery
+if (!"remotes" %in% installed.packages()) {
+ install.packages("remotes")
+}
+
+remotes::install_github("USEPA/rExpertQuery", ref = "develop", dependencies = TRUE, force = TRUE)## Downloading GitHub repo USEPA/rExpertQuery@develop
+##
+## ── R CMD build ─────────────────────────────────────────────────────────────────
+## checking for file 'C:\Users\hmarler\AppData\Local\Temp\RtmpywqNsa\remotes2b947e5159e5\USEPA-rExpertQuery-ce1ba15/DESCRIPTION' ... checking for file 'C:\Users\hmarler\AppData\Local\Temp\RtmpywqNsa\remotes2b947e5159e5\USEPA-rExpertQuery-ce1ba15/DESCRIPTION' ... ✔ checking for file 'C:\Users\hmarler\AppData\Local\Temp\RtmpywqNsa\remotes2b947e5159e5\USEPA-rExpertQuery-ce1ba15/DESCRIPTION' (601ms)
+## ─ preparing 'rExpertQuery': (4.7s)
+## checking DESCRIPTION meta-information ... checking DESCRIPTION meta-information ... ✔ checking DESCRIPTION meta-information
+## ─ checking for LF line-endings in source and make files and shell scripts (430ms)
+## ─ checking for empty or unneeded directories
+## Omitted 'LazyData' and 'LazyDataCompression' from DESCRIPTION
+## ─ building 'rExpertQuery_0.1.0.tar.gz'
+##
+##
+
+This vignette also requires additional packages which can be +installed using the code chunk below:
+
+# # list of additional required packages
+# demo.pkgs <- c("datasets", "data.table", "dplyr", "DT", "ggplot2", "httr", "leaflet", "maps", "plotly", "sf", "usdata", "usmap", "stringi")
+#
+# # install additional packages not yet installed
+# installed_packages <- demo.pkgs %in% rownames(installed.packages())
+#
+# if (any(installed_packages == FALSE)) {
+# install.packages(demo.pkgs[!installed_packages])
+# }
+
+# load additional packages for this vignette
+library(datasets)
+library(data.table)
## Warning: package 'data.table' was built under R version 4.4.3
+library(dplyr)
## Warning: package 'dplyr' was built under R version 4.4.3
+library(DT)
## Warning: package 'DT' was built under R version 4.4.3
+library(httr)
## Warning: package 'httr' was built under R version 4.4.3
+library(leaflet)
## Warning: package 'leaflet' was built under R version 4.4.3
+library(maps)
## Warning: package 'maps' was built under R version 4.4.3
+library(plotly)
## Warning: package 'plotly' was built under R version 4.4.3
+library(usdata)
## Warning: package 'usdata' was built under R version 4.4.3
+library(usmap)
## Warning: package 'usmap' was built under R version 4.4.3
+library(stringi)
## Warning: package 'stringi' was built under R version 4.4.3
+As a final setup step, we will turn off scientific notation so that +any catchment id numbers from our queries are displayed correctly.
+
+# turn off scientific notation (for catchment ids)
+options(scipen = 999)

Most of the rExpertQuery functions require an API key. Users can obtain their own by using Expert Query's API Key Signup Form.
+
To run this vignette, we can use the following test key. Users should +obtain their own API key if they will be using rExpertQuery to develop +their own workflows.
+
+# load test api key
+testkey <- "53BVce47MQ3KXKibjx35g4ojaDQGh8qWfbdO8cE0"

There are ten exported functions in rExpertQuery. The first eight allow users to query ATTAINS data via Expert Query web services. These are: EQ_Assessments(), EQ_AssessmentUnits(), EQ_AUsMLs(), EQ_Actions(), EQ_ActionsDocuments(), EQ_TMDLs(), EQ_Sources(), and EQ_CatchCorr().
+These first eight functions are analogous to the data profiles that +can be queried through the Expert Query User +Interface.
+
An additional function allows users to download the Expert Query National Extracts of ATTAINS data, which are updated weekly. This function should be used when national data are desired or when a query designed in one of the previous eight functions would exceed the Expert Query web services limit of one million rows. This function is EQ_NationalExtract().
+The tenth function relies on ATTAINS web services to provide allowable domain values for some rExpertQuery parameters: EQ_DomainValues().
+In this session, we will use rExpertQuery functions, along with some +common R packages for data wrangling and visualization to answer some +commonly asked questions with Expert Query data. Members of the ATTAINS +team will be monitoring the chat for questions. If there are many +questions regarding a particular function, we may pause at the end of +that function’s material to review. Otherwise, questions are welcome at +the end of the demo as time permits or via e-mailing the ATTAINS team at +attains@epa.gov.
+EQ_NationalExtract() +provides an efficient method for importing the Expert +Query National Extracts. This function requires one argument, +“extract” to specify which of the National Extracts to download. The +following examples show how to download the Assessments and Assessments +with Monitoring Locations extracts. The extracts are all large files and +some may not open in R if there is not enough memory available.
+
What is the most commonly reported impairment +nationally?
+After downloading the national assessments data, we’ll need to filter +to include only the latest cycle. Next, we can filter for rows +containing parameter causes, and summarize how many times each parameter +was listed as a cause of impairment. The additional packages used to +answer this question are dplyr (data manipulation) and DT (interface to DataTables +library).
+
+# import national assessments profile
+assessments.nat <- rExpertQuery::EQ_NationalExtract("assessments")
## [1] "EQ_NationalExtract: downloading Assessments Profile (Expert Query National Extract). It was last updated on November 07, 2025 at 08:50 PM EST."
+
+# filter for latest assessments
+assessments.nat <- assessments.nat |>
+ dplyr::group_by(organizationId) |>
+ dplyr::slice_max(reportingCycle) |>
+ dplyr::select(-objectId) |>
+ dplyr::distinct() |>
+ dplyr::ungroup()
+
+# count impairments
+count.impair <- assessments.nat |>
+ # filter to retain only parameter causes
+ dplyr::filter(parameterStatus == "Cause") |>
+ # subset data to find unique combinations
+ dplyr::select(organizationId, assessmentUnitId, parameterName) |>
+ # remove duplicates
+ dplyr::distinct() |>
+ # group by parameterName
+ dplyr::group_by(parameterName) |>
+ # count listings for each parameterName
+ dplyr::summarise(count = length(assessmentUnitId)) |>
+ # arrange in descending order
+ dplyr::arrange(desc(count))
+
+# create data table of impairment count results
+DT::datatable(count.impair, options = list(pageLength = 10, scrollX = TRUE))

Which waters are impaired for SELENIUM nationally? Which states contain waters impaired for SELENIUM?
+Again, we’ll use the national assessments data filtered to the latest +cycle. Then we will filter for rows where the parameter is both selenium +and listed as a cause. The additional packages used in this step are dplyr and DT.
+
+# filter assessments data set for selenium causes
+param.nat <- assessments.nat |>
+ # filter for selenium and cause
+ dplyr::filter(parameterName == "SELENIUM" &
+parameterStatus == "Cause")
+
+# data table of selenium impairments
+DT::datatable(param.nat, options = list(pageLength = 5, scrollX = TRUE))

Then we can create a list of states that have at least one selenium impairment listed. We can also visualize this information on a map, with states containing selenium impairments filled in blue. To build this map, we first create a data frame with all 50 states and a "SeleniumImpairment" column populated with "yes" or "no" to indicate whether it has any listed selenium impairments.
+In addition to dplyr, this +code chunk also uses usdata (US +demographic data), datasets +(collection of data sets for use in R), usmap (US choropleth plotting) and ggplot2 (creating +graphics).
+
+# filter to create df of states that have at least one selenium impairment
+state.names.sel <- param.nat |>
+ # select state column from filtered df
+ dplyr::select(state) |>
+ # retain only distinct values for state
+ dplyr::distinct() |>
+ # change state abbreviation to full state name
+ dplyr::mutate(state = usdata::abbr2state(state)) |>
+ # create a list from the df
+ dplyr::pull()
+
+# read in a list of all 50 state names
+state.names <- datasets::state.name
+
+# create df with of all 50 state names
+sel.map.data <- as.data.frame(state.names) |>
+ # rename state.name column to state
+ dplyr::rename(state = state.names) |>
+ # create additional column to indicate if state has recorded selenium impairments ("yes") or not ("no")
+ dplyr::mutate(SeleniumImpairment = factor(ifelse(state %in% state.names.sel,
+"yes", "no"
+ )))
+# create map, specifying data and values
+usmap::plot_usmap(
+ regions = "state", data = sel.map.data,
+ values = "SeleniumImpairment", color = "black"
+) +
+ # fill states with selenium impairments with blue
+ ggplot2::scale_fill_manual(
+values = c(`no` = "white", `yes` = "blue")
+ ) +
+ # remove legend
+ ggplot2::theme(legend.position = "none")
+Fig 1. States with selenium listed as a cause of impairment for any waters are shown in blue.
+Which waters have a TMDL and are fully supporting +nationally?
+Starting with the filtered data frame of latest cycle assessments, we can filter for Fully Supporting waters with an associatedActionType of "TMDL". Again, we can use a combination of dplyr and DT, for data manipulation and creating a data table, respectively.
+
+# filter national assessments to include only Fully Supporting w/ TMDL
+tmdl.fs <- assessments.nat |>
+ dplyr::filter(
+overallStatus == "Fully Supporting",
+associatedActionType == "TMDL"
+ ) |>
+ # select columns to review
+ dplyr::select(
+organizationId, state, assessmentUnitId, assessmentUnitName,
+associatedActionId
+ ) |>
+ # group by state and assessmentUnit
+ dplyr::group_by(state, assessmentUnitId, assessmentUnitName) |>
+ # create column containing all actionIds associated with the assessmentUnit
+ dplyr::mutate(associatedActionIds = paste(unique(associatedActionId),
+collapse = ", "
+ )) |>
+ # remove original associatedActionId column
+ dplyr::select(-associatedActionId) |>
+ # retain only distinct rows
+ dplyr::distinct()
+
+# fully supporting waters with tmdls data table
+DT::datatable(tmdl.fs, options = list(pageLength = 5, scrollX = TRUE))

What are the impaired waters (category 4 and 5) nationally?

The workflow here is similar to the previous question, relying on dplyr and DT. This time we are retaining a few more columns for inclusion in the data table and are filtering by epaIrCategory for category 4 and 5 waters.
+
+# select subset of columns to include
+imp.waters <- assessments.nat |>
+ dplyr::select(
+organizationId, organizationName, organizationType,
+assessmentUnitId, assessmentUnitName, assessmentDate,
+epaIrCategory, waterType, waterSize, waterSizeUnits
+ ) |>
+ # retain only distinct rows
+ dplyr::distinct() |>
+ # filter for category 4 and 5 waters
+ dplyr::filter(epaIrCategory %in% c("4", "5"))

The resulting data frame is large, with 128,919 unique impaired waters nationally. Due to its size, we'll include a random subset of 250 results in a data table to review during the demo. You can view the full results by viewing the imp.waters df.
+
+# data table with random 250 impaired waters
+DT::datatable(
+ imp.waters |>
+dplyr::slice_sample(n = 250),
+ options = list(pageLength = 5, scrollX = TRUE)
+)

Which states have included any information about monitoringLocations in ATTAINS? Which states have included any monitoringLocationDataLinks?
+To answer this question, we’ll need to import a new national extract. +As we are curious about monitoringLocations and +monitoringLocationDataLinks, we will want to start with the Assessment +Units with Monitoring Locations data set. The param value for this in +EQ_NationalExtract is “au_mls”. Additional packages needed are dplyr and DT. We’ll use some additional +options in DT to highlight +useful rows in the final data table.
+
+# import national extract for assessment units/monitoring locations
+nat.ausmls <- rExpertQuery::EQ_NationalExtract("au_mls")

The next step is to filter the data frame and retain only rows that contain a value for monitoringLocationId and are relevant to states. The resulting data frame, called "filtered.mls", will allow us to determine (1) which organizations have included monitoringLocationIdentifiers and (2) which organizations have included monitoringLocationDataLinks.
+To find the organizations with monitoringLocationIdentifiers, we will +select only the rows organizationId, organizationName, and +organizationType and retain the unique rows.
+To find the organizations with monitoringLocationIdentifiers, we will select only the columns organizationId, organizationName, and organizationType and retain the unique rows.

To find the organizations with monitoringLocationDataLinks, we will apply an additional filter and keep only rows that have a value in the monitoringLocationDataLink column. Then we select only the columns organizationId, organizationName, and organizationType and retain the unique rows.
+# filter for rows that contain monitoringLocationIds and are from State orgs
+filtered.mls <- nat.ausmls |>
+ dplyr::filter(
+monitoringLocationId != "",
+!is.na(monitoringLocationId),
+organizationType == "State"
+ )
+
+# start with data set filtered to include only rows with MonitoringLocationIds
+orgs.mls <- filtered.mls |>
+ # select columns to describe orgs
+ dplyr::select(organizationId, organizationName, organizationType) |>
+ dplyr::distinct()
+
+
+# start with data set filtered to include only rows with MonitoringLocationIds
+links.mls <- filtered.mls |>
+ # filter for orgs with monitoringLocationDataLinks
+ dplyr::filter(
+monitoringLocationDataLink != "",
+!is.na(monitoringLocationDataLink)
+ ) |>
+ # select columns to describe orgs
+ dplyr::select(organizationId, organizationName, organizationType) |>
+ # retain distinct rows
+ dplyr::distinct()

There are 21 states with monitoringLocationId data in ATTAINS. Of these states, 3 also contain monitoringLocationDataLinks. We can create a data table to display this information and apply some conditional formatting to highlight the states with monitoringLocationIdentifiers in blue and those with monitoringLocationIdentifiers and monitoringLocationDataLinks in orange.
+
+# create data frame summarizing which states have monitoringLocationIds and monitoringLocationDataLinks in ATTAINS
+ml.table <- nat.ausmls |>
+ # create df of all orgs
+ dplyr::select(organizationId, organizationName, organizationType) |>
+ # retain only distinct rows
+ dplyr::distinct() |>
+ # add columns indicating whether the org has any data in ATTAINS for monitoringLocationId and monitoringLocationDataLink
+ dplyr::mutate(
+monitoringLocationId = ifelse(organizationId %in%
+ orgs.mls$organizationId,
+"yes", "no"
+),
+monitoringLocationDataLink = ifelse(organizationId %in%
+ links.mls$organizationId,
+"yes", "no"
+)
+ )
+# create data table
+DT::datatable(ml.table, options = list(pageLength = 10, scrollX = TRUE)) |>
+ # highlight rows for orgs that have monitoringLocationId in blue
+ DT::formatStyle(
+"monitoringLocationId",
+target = "row",
+backgroundColor = styleEqual(c("yes"), c("#66c3c5"))
+ ) |>
+ # highlight rows for orgs that also have data for data links in orange
+ DT::formatStyle(
+"monitoringLocationDataLink",
+target = "row",
+backgroundColor = styleEqual(c("yes"), c("#f9b97d"))
+ )

EQ_Assessments() queries ATTAINS Assessments data. For many use cases, querying Assessments for a single organization or state is desirable. The next example shows how to query by a single state for the latest cycle. The default for the function is to query for the latest cycle. If you would like to search for an older cycle, you can set the "report_cycle" parameter equal to the desired cycle year in YYYY format.
+How many stream miles in Kansas are impaired for primary +contact recreation in the latest cycle?
+First, we can construct our query to return only Primary Contact +Recreation Assessments from the latest cycle from Kansas for streams. +The remaining steps can be accomplished with dplyr functions to retain unique +combinations of assessmentUnitId, assessmentUnitName, waterSize and +waterSizeUnits, then summing the waterSize column to find the number of +stream miles.
+
+# set up query to search assessments for KS, primary contact recreation use, and stream, default behavior is to query latest report cycle
+ks.imp.streams <- rExpertQuery::EQ_Assessments(
+ statecode = "KS",
+ use_name = "Primary Contact Recreation",
+ api_key = testkey,
+ water_type = "STREAM"
+)
+
+# calculate total water size
+ks.miles <- ks.imp.streams |>
+ # select required columns
+ dplyr::select(
+assessmentUnitId, assessmentUnitName,
+waterSize, waterSizeUnits
+ ) |>
+ # retain distinct rows
+ dplyr::distinct() |>
+ # sum water size
+ dplyr::summarize(totalWaterSize = sum(waterSize)) |>
+ dplyr::pull()

There are 11,129.78 stream miles in Kansas impaired for primary contact recreation use.
+What waters in Montana are impaired due to Zinc? How many +different uses are zinc impairments associated with in +Montana?
+We start by querying for Montana assessments with zinc as a cause and +can use dplyr and plotly (graphing library) to summarize +the results by useName and create a bar plot of the results.
+
+# query MT's latest assessments for zinc as a cause
+MT.impair <- rExpertQuery::EQ_Assessments(
+ statecode = "MT",
+ api_key = testkey,
+ param_name = "ZINC",
+ param_status = "Cause"
+)## [1] "EQ_Assessments: The current query will return 123 rows."
+Next we summarize the uses associated with zinc impairments and count +the number of unique assessmentUnitIdentifiers for each use to +facilitate creating a bar plot.
+
+# summarize by useName
+MT.impair.use <- MT.impair |>
+ dplyr::group_by(useName) |>
+ # count AUIDs by use
+ dplyr::summarize(count = length(unique(assessmentUnitId)))

There are 3 uses with zinc as a cause in Montana: Agricultural, Aquatic Life, and Drinking Water. As our last step, we'll create the bar plot.
+
+plotly::plot_ly(
+ data = MT.impair.use,
+ x = ~useName,
+ y = ~count,
+ type = "bar"
+)

Fig 2. Zinc impairments by useName for Montana's latest assessment cycle.
+Which waters are new to category 5 in the 2024 cycle for +Illinois? Which categories did they move from?
+This is another question that is well suited to using dplyr and plotly in addition to EQ_Assessments. +The query is for category 5 waters from Illinois from the latest +reporting cycle. In order to compare reporting cycles, we’ll also need +to provide a value for the previous cycle and run the query again, this +time for the previous reporting cycle and without specifying the +category.
+
+# get cat 5 waters from 2024 assessment cycle
+IL.latest <- rExpertQuery::EQ_Assessments(
+ statecode = "IL",
+ epa_ir_cat = "5",
+ api_key = testkey
+)
+
+# select required columns from latest cycle data
+IL.latest <- IL.latest |>
+ dplyr::select(assessmentUnitId, epaIrCategory) |>
+ # retain unique rows
+ dplyr::distinct()
+
+
+# get all assessments from previous cycle
+IL.previous <- rExpertQuery::EQ_Assessments(
+ statecode = "IL",
+ report_cycle = 2022,
+ api_key = testkey
+)Next we’ll need to filter the previous cycle results to retain only +the assessment units that are category 5 waters in the latest cycle. +This allows us to combine the latest and previous cycle data frames and +filter them to create a data frame which contains only the assessment +units that changed from some other category to category 5 from the +previous cycle to the latest cycle.
+
+# filter previous cycle to include only cat 5 waters from 2024
+IL.previous <- IL.previous |>
+ # retain only cat 5 waters from latest cycle
+ dplyr::filter(assessmentUnitId %in% IL.latest$assessmentUnitId) |>
+ # select auid and category
+ dplyr::select(assessmentUnitId, epaIrCategory) |>
+ # rename to indicate previous cycle
+ dplyr::rename(epaIrCategory.previous = epaIrCategory)
+
+# combine latest and previous cycle dfs
+IL.combine <- IL.latest |>
+ dplyr::left_join(IL.previous, dplyr::join_by(assessmentUnitId))
+
+# create df of the auids that changed to cat 5 in 2024
+IL.changes <- IL.combine |>
+ dplyr::filter(epaIrCategory != epaIrCategory.previous) |>
+ dplyr::distinct()Finally, we can summarize by counting the categories the new category +5s were previously in and create a pie chart with a default color scheme +to visualize.
+
+# summarize what cats the new cat 5s were in previously
+IL.change.sum <- IL.changes |>
+ dplyr::group_by(epaIrCategory.previous) |>
+ dplyr::summarise(count = dplyr::n_distinct(assessmentUnitId))
+# create plotly plot
+plotly::plot_ly() |>
+ # add pie based on IL.change.sum
+ plotly::add_pie(
+data = IL.change.sum, ~count, labels = ~epaIrCategory.previous,
+marker = list(
+ line = list(color = "white", width = 2)
+), type = "pie",
+hoverinfo = "value", outsidetextfont = list(color = "black"),
+sort = FALSE
+ ) |>
+ # add legend
+ plotly::layout(legend = list(
+orientation = "h",
+xanchor = "center",
+x = 0.5
+ ))

Fig 3. New category 5 waters in Illinois in the 2024 cycle, summarized by their 2022 categories.
+EQ_AssessmentUnits() +queries ATTAINS Assessment Units data. The default is to only return +“Active” assessment units, but input params can be adjusted to query for +“Retired” or “Historical” assessment units in addition to or instead of +“Active”.
+What are the retired assessment units for +Florida?
+This can be answered with a simple EQ_AssessmentUnits query specifying Florida with an assessment unit status of "Retired" without using functions from any additional packages. For easier review, we'll create a datatable with DT.
+
+# search for retired assessment units in Florida
+FL.ret.auids <- rExpertQuery::EQ_AssessmentUnits(
+ statecode = "FL",
+ au_status = "Retired",
+ api_key = testkey
+)## [1] "EQ_AssessmentUnits: The current query will return 35 rows."
+
+
+
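The rendered table itself is not reproduced in this text; a minimal sketch of the call, following the pattern used elsewhere in this vignette and assuming the FL.ret.auids data frame returned by the query above, would be:

# data table of Florida's retired assessment units
DT::datatable(FL.ret.auids, options = list(pageLength = 5, scrollX = TRUE))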
+EQ_TMDLs() +queries ATTAINS TMDL data and can utilize many search parameters.
+What are all the TMDLs in HI? How many are there (counted +as unique combinations of actionID/pollutant/assessmentUnitId)? How many +are associated with each pollutantGroup?
+To answer this question we’ll use dplyr and plotly. The EQ_TMDLs query is simple, +specifying only the state of Hawaii since we want to return all of +Hawaii’s TMDLs.
+
+# query for Hawaii TMDLs
+HI.tmdls <- rExpertQuery::EQ_TMDLs(
+ statecode = "HI",
+ api_key = testkey
+)

The 536 rows can be reviewed in the data frame. Keep in mind that if we want to count one TMDL as one unique combination of assessmentUnitIdentifier, pollutant and actionId, we can't simply count the number of rows, as each assessmentUnitIdentifier/pollutant/actionId may be associated with multiple parameters, meaning it will be listed more than once. The code chunk below demonstrates one way to determine the number of TMDLs by unique assessmentUnitIdentifier/pollutant/actionId. We can then visualize TMDLs related to each pollutantGroup with a simple bar graph.
+
+# determine the number of tmdls
+HI.count <- HI.tmdls |>
+ # select required columns
+ dplyr::select(assessmentUnitId, assessmentUnitName, actionId, actionName, pollutant, pollutantGroup, fiscalYearEstablished, actionAgency, planSummaryLink) |>
+ # retain unique rows
+ dplyr::distinct()
+
+# determine number per group
+HI.sum <- HI.count |>
+ # group by pollutantGroup
+ dplyr::group_by(pollutantGroup) |>
+ # count unique auids per pollutantGroup
+ dplyr::summarize(count = dplyr::n_distinct(assessmentUnitId)) |>
+ # if pollutantGroup is missing, assign to group named "NULL"
+ dplyr::mutate(pollutantGroup = ifelse(is.na(pollutantGroup), "NULL", pollutantGroup))
+# create bar plot
+plotly::plot_ly(
+ data = HI.sum,
+ x = ~pollutantGroup,
+ y = ~count,
+ type = "bar"
+)

Fig 4. Number of TMDLs per pollutantGroup in Hawaii.
+There are 168 TMDLs in Hawaii.
+How many TMDL projects (by actionId) were approved so far +in FY 2025 in R10?
+This question can be answered simply with a well-crafted query and a single dplyr function. The 2025 fiscal year began on October 1, 2024, so that will be set as the tmdl_date_start parameter along with specifying 10 for the region.
+
+# r10 query for FY 2025 TMDLs
+tmdl.proj <- rExpertQuery::EQ_TMDLs(
+ region = 10,
+ tmdl_date_start = "2024-10-01",
+ api_key = testkey
+)
+
+# count distinct actionIds
+tmdl.proj.count <- dplyr::n_distinct(tmdl.proj$actionId)

There are TMDL projects (by actionId) approved so far in FY 2025. In order to create a user-friendly DT datatable, we will have to take a few more steps. The columns assessmentUnitId, pollutant, and addressedParameter are likely to have more than one value per actionId. In order to create a readable table with only one row per actionId, we will use dplyr functions to manipulate the data frame so that, for example, multiple pollutants will be listed together in the same row.
+
+tmdl.sum <- tmdl.proj |>
+ # select columns
+ dplyr::select(
+region, state, organizationId, organizationName,
+actionId, actionName, assessmentUnitId, pollutant,
+addressedParameter, tmdlDate, planSummaryLink
+ ) |>
+ # retain unique rows
+ dplyr::distinct() |>
+ # group by actionId
+ dplyr::group_by(actionId) |>
+ # create new columns to address issues with multiple rows
+ dplyr::mutate(
+assessmentUnitIds = paste(sort(unique(assessmentUnitId)), collapse = ", "),
+pollutants = paste(sort(unique(pollutant)), collapse = ", "),
+addressedParameters = paste(sort(unique(addressedParameter)), collapse = ", ")
+ ) |> dplyr::select(-assessmentUnitId, -pollutant, -addressedParameter) |>
+ # retain unique rows
+ dplyr::distinct()
+
+# create data table
+DT::datatable(
+ tmdl.sum |>
+rExpertQuery::EQ_FormatPlanLinks(),
+ escape = FALSE,
+ options = list(pageLength = 2, scrollX = TRUE)
+)

EQ_Actions() queries ATTAINS Actions data by a variety of user-supplied parameters.
+Which actions in Delaware should be included in measures? +How many actions (by actionIds) are in this group?
+We can use dplyr and DT with the query results to find the answer and display the relevant actionIds. For the query, we use EQ_Actions to search for Delaware actions with the "in_meas" parameter set to "Yes". Then we count the distinct actionIds.
+
+# query Actions from Delaware that are included in Measures
+DE.meas <- rExpertQuery::EQ_Actions(
+ statecode = "DE",
+ in_meas = "Yes",
+ api_key = testkey
+)## [1] "EQ_Actions: The current query will return 649 rows."
+
+# count unique actionIds
+DE.count <- dplyr::n_distinct(DE.meas$actionId)

There are 480 actionIds that should be included when calculating measures for Delaware. In a similar process to the final TMDL example, we can create a data table with one row per actionId.
+
+# determine unique actionIds
+DE.actids <- DE.meas |>
+ # select columns
+ dplyr::select(
+actionType, actionId, actionName, actionAgency, parameter,
+completionDate, assessmentUnitId, fiscalYearEstablished,
+planSummaryLink
+ ) |>
+ # retain unique rows
+ dplyr::distinct() |>
+ # group by actionId
+ dplyr::group_by(actionId) |>
+ # create new columns to address issues with multiple rows
+ dplyr::mutate(
+parameters = paste(unique(parameter), collapse = ", "),
+assessmentUnitIds = paste(unique(assessmentUnitId), collapse = ", ")
+ ) |>
+ dplyr::select(-parameter, -assessmentUnitId) |>
+ # retain unique rows
+ dplyr::distinct()
+DT::datatable(
+ DE.actids |>
+rExpertQuery::EQ_FormatPlanLinks(),
+ escape = FALSE,
+ options = list(pageLength = 5, scrollX = TRUE)
+)

How many actions (by actionId) were established in R3 in FY 2024? What was the most common type of action established in R3 in FY 2024?
+For this query, we will set region equal to 3 and both +fisc_year_start and fisc_year_end to 2024. Then use dplyr and DT to summarize and display +results as in the first EQ_Actions example.
+
+# query for R3 FY2024 actions
+act.fy.24 <- rExpertQuery::EQ_Actions(
+ region = 3,
+ fisc_year_start = 2024,
+ fisc_year_end = 2024,
+ api_key = testkey
+)
+
+# count unique actionIds
+n_act.fy.24 <- dplyr::n_distinct(act.fy.24$actionId)

There were 7 unique actions by actionId established in R3 in FY 2024. We can create a data table with counts for each actionType.
+
+act.fy.24.counts <- act.fy.24 |>
+ # select columns
+ dplyr::select(
+state, organizationId, organizationName,
+actionId, actionName, actionType
+ ) |>
+ # retain unique rows
+ dplyr::distinct() |>
+ # group by action type
+ dplyr::group_by(actionType) |>
+ # count actionId per each action type
+ dplyr::summarize(count = length(unique(actionId)))
+
+DT::datatable(act.fy.24.counts)

EQ_ActionsDocuments() queries ATTAINS Actions Documents data with user-supplied parameters. One of its unique features is the ability to search the documents themselves by keyword or phrase. The results are ranked by percentage match, with higher percentages indicating a better match with the keyword.
+What are all the actions documents that contain the +keyword “nutrient”?
+The only query parameter we will provide is the keyword +“nutrient”.
+
+# Actions Documents containing "nutrient"
+nutrient.doc <- rExpertQuery::EQ_ActionsDocuments(
+ doc_query = "nutrient",
+ api_key = testkey
+)

This query yields 17,989 results. As well as providing the actionId and documentName for all results, EQ_ActionsDocuments() also returns the column "actionDocumentUrl" containing the URL to link to the document. Because the query yields so many results, we'll take a look at the highest ranking 50 matches in a DT data table to understand the structure of the results.
+
+# create data table
+DT::datatable(
+ nutrient.doc |>
+dplyr::slice_max(rankPercent, n = 50) |>
+rExpertQuery::EQ_FormatPlanLinks(url.col = "actionDocumentUrl"),
+ options = list(pageLength = 10, scrollX = TRUE),
+ escape = FALSE
+)

How many actions documents that contain the keyword "nutrient" are from each region?
+To answer this question, we can count the number of unique actionIds +per region from the initial “nutrients” keyword query. We can display +the results in a plotly bar +plot.
+
+nutrient.doc.regions <- nutrient.doc |>
+ # group by region
+ dplyr::group_by(region) |>
+ # count action ids by region
+ dplyr::summarize(count = length(unique(actionId)))
+# create bar plot
+plotly::plot_ly(
+ data = nutrient.doc.regions,
+ x = ~region,
+ y = ~count,
+ type = "bar"
+)

Fig 5. Number of documents containing the keyword 'nutrient' by EPA Region.
+Which monitoringLocations correspond to assessmentUnits +in Alaska? How many monitoringLocations are associated with +assessmentUnits? Which organizations are the monitoringLocations +from?
+We start by querying for all monitoringLocations in Alaska and then +use dplyr functions to retain +only rows that have a value for monitoringLocation. We can create a DT data table to review the +results. We can also use the base R functions sort and unique to make a +list of the organizations represented in the monitoringLocationOrgId +column and organize them alphabetically.
+
+# query for assessment units and monitoring locations from Alaska
+ak.mls <- rExpertQuery::EQ_AUsMLs(
+ statecode = "AK",
+ api_key = testkey
+)
+
+# filter for assessment units with monitoring locations from Alaska
+filt.ak.mls <- ak.mls |>
+ # filter for rows with a valule for monitoring location id
+ dplyr::filter(
+!is.na(monitoringLocationId),
+monitoringLocationId != ""
+ ) |>
+ # select rows for review
+ dplyr::select(
+assessmentUnitId, assessmentUnitName, monitoringLocationId,
+monitoringLocationOrgId
+ ) |>
+ # retain unique rows
+ dplyr::distinct()
+
+# list of orgs inlcuded in monitoringLocationOrgId
+ak.orgs <- sort(unique(filt.ak.mls$monitoringLocationOrgId))
+
+DT::datatable(filt.ak.mls, options = list(pageLength = 10, scrollX = TRUE))

There are 208 monitoringLocationIds associated with assessmentUnits in Alaska. The monitoringLocationIds come from 5 different organizations: ADOT&PF, AKDECWQ, Kenai Watershed Forum, National Park Service, and USFWS_ALASKA.
+Are there active assessmentUnits without any listed +monitoringLocations in Alaska?
+This question can be answered by filtering the initial Alaska query +results for assessmentUnitIds without any corresponding +monitoringLocationIdentifiers.
+
+# filter Alaska data set for assessmentUnitIds without listed monitoringLocationIds
+no.ml.ak <- ak.mls |>
+ dplyr::filter(!assessmentUnitId %in% filt.ak.mls$assessmentUnitId)
+
+# count number of unique asessment units without listed monitoringLocationIds
+n_no.ml.ak <- dplyr::n_distinct(no.ml.ak$assessmentUnitId)

There are 636 active assessmentUnits in Alaska without monitoringLocations listed in ATTAINS.
+Which states in Region 6 have listed probable Sources in +their latest report cycle?
+Start by querying EQ_Sources for Region 6. Then use dplyr and usdata to manipulate the data and create a list of the Region 6 states that list probable sources for any assessments in their latest report cycle.
+
+# query for probable sources in the latest assessment cycle in R6
+r6.sources <- rExpertQuery::EQ_Sources(
+ region = 6,
+ api_key = testkey
+)
+
+# filter for states with listed probable sources
+r6.sources.filter <- r6.sources |>
+ # remove all rows without a listed source
+ dplyr::filter(!is.na(sourceName)) |>
+ # select state
+ dplyr::select(state) |>
+ # retain distinct state
+ dplyr::distinct() |>
+ # create list
+ dplyr::pull() |>
+ # convert abbreviation to state name
+ usdata::abbr2state() |>
+ # sort alphabetically
+ sort()

Arkansas, Louisiana, New Mexico, Oklahoma and Texas all listed sources in their latest report cycles.
+What are the most commonly listed sources in the latest +report cycle in each Region 6 state?
+Using the same initial query results from the previous question, we can use dplyr functions to count the number of times a probable source is listed by each state, select the most common one from each state, and make a DT data table of the results.
+
+# count the number of times a probable source is listed for each state
+r6.sources.summarize <- r6.sources |>
+ dplyr::select(state, organizationId, assessmentUnitId, sourceName) |>
+ dplyr::distinct() |>
+ dplyr::group_by(state, organizationId, sourceName) |>
+ dplyr::summarize(count = length(assessmentUnitId)) |>
+ dplyr::ungroup() |>
+ dplyr::group_by(state) |>
+ dplyr::slice_max(count)
+
+# data table for max count
+DT::datatable(r6.sources.summarize)

EQ_CatchCorr() queries ATTAINS Catchment Correspondence data. These queries return large files, particularly if a large geospatial area is defined, so best practice is to make your query as specific as possible. Many machines may not have enough memory to load an entire state's worth of catchment correspondence data in an R session. It is also possible to focus on a specific assessment unit in catchment correspondence queries.
+Which catchments correspond to assessmentUnitId “IL_N-99” +?
+We can start with a very focused query to retrieve information for the Illinois assessment unit "IL_N-99". Then we'll use dplyr functions to retain the unique rows and make a DT data table.
+
+# query for IL_N-99 catchment data
+il.n99.catch <- rExpertQuery::EQ_CatchCorr(
+ statecode = "IL",
+ auid = "IL_N-99",
+ api_key = testkey
+)
+
+il.n99.catch <- il.n99.catch |>
+ # remove objectId
+ dplyr::select(-objectId) |>
+ # retain unique rows
+ dplyr::distinct()
+
+# create data table
+DT::datatable(il.n99.catch, options = list(pageLength = 10, scrollX = TRUE))

How large of a catchment area corresponds to "IL_N-99"?
+We will need some additional information in order to answer this question. We could get catchment areas from the nhdplusTools package or from ATTAINS geospatial web services. For this example, we will use ATTAINS geospatial web services as this will also allow us to create a map of the catchments and assessment unit. This example utilizes a few new R packages: httr (wrapper for the curl package), leaflet (interactive maps), and sf (spatial vector data).
+The first step is to set up query parameters for ATTAINS geospatial +webservices. Then query the assessment lines and catchment associations +layers to retrieve the geospatial data for the assessment unit and its +associated catchments. The data from the catchment association layer +allows us to sum the total area for all the catchments associated with +IL_N-99.
+
+# set up query params for ATTAINS geospatial web services
+query.params <- list(
+ where = "assessmentunitidentifier IN ('IL_N-99')",
+ outFields = "*",
+ f = "geojson"
+)
+
+# query ATTAINS geospatial assessments lines layer
+lines.response <- httr::GET("https://gispub.epa.gov/arcgis/rest/services/OW/ATTAINS_Assessment/MapServer/1/query?",
+ query = query.params
+)
+
+# read content from ATTAINS geospatial assessments lines layer
+lines.geojson <- httr::content(lines.response, as = "text", encoding = "UTF-8")
+
+# get sf from ATTAINS geospatial assessments lines layer
+lines.sf <- sf::st_read(lines.geojson, quiet = TRUE)
+
+# query ATTAINS geospatial catchment association layer
+catch.response <- httr::GET("https://gispub.epa.gov/arcgis/rest/services/OW/ATTAINS_Assessment/MapServer/3/query?",
+ query = query.params
+)
+
+# read content from ATTAINS geospatial catchment association layer
+catch.geojson <- httr::content(catch.response, as = "text", encoding = "UTF-8")
+
+# # get sf from ATTAINS geospatial assessments lines layer
+catch.sf <- sf::st_read(catch.geojson, quiet = TRUE)
+
+# sum the catchment areas that correspond to IL_N-99
+sum.catch <- catch.sf |>
+ dplyr::summarise(total = sum(areasqkm)) |>
+ sf::st_drop_geometry() |>
+ dplyr::pull()

The total catchment area associated with IL_N-99 is 35.6019 square kilometers.
+To create a map of IL_N-99 and its associated catchments, we can create a leaflet map with the default OpenStreetMap tiles, then add the catchment polygons with a popup containing the nhdplusid and area in sq km. The final step is to add the assessment unit line.
+
+# create leaflet map
+leaflet::leaflet() |>
+ # add open street map layer
+ leaflet::addTiles() |>
+ # add catchments, include popups with nhdplus id and area in sq km
+ leaflet::addPolygons(
+data = catch.sf, color = "darkorange",
+popup = paste0(
+ catch.sf$nhdplusid, "<br>",
+ catch.sf$areasqkm, " sq km"
+)
+ ) |>
+ # add assessment unit line
+ leaflet::addPolylines(data = lines.sf, color = "blue")

Fig. 6 Map of assessment unit IL_N-99 (blue) and its associated catchments (orange).
+EQ_DomainValues is unique in the rExpertQuery package as it relies on ATTAINS web services, rather than Expert Query web services, in order to provide allowable domain values. It was created to provide guidance for users on allowable values for queries. The function returns all information available from ATTAINS web services about the selected domain and prints a message informing the user which column contains the value that should be used in queries.
+Which parameters can EQ_DomainValues provide values +for?
+Running the EQ_DomainValues function with no value provided for the +“domain” argument will return a list of the parameter names that it can +return domain values for.
+
+# run EQ_DomainValues with no value provided for "domain" to return list of params
+domain.vals <- rExpertQuery::EQ_DomainValues()
## [1] "EQ_DomainValues: getting list of available domain names."
+EQ_DomainValues can provide values for the following query +parameters: act_agency, act_status, act_type, ad_param, assess_basis, +assess_methods, assess_types, au_status, cause, delist_reason, doc_type, +file_type, loc_type, org_id, org_name, param_attain, param_group, +param_name, param_state_ir_cat, param_status, source_scale, source_type, +statecode, use_name, use_support and water_type .
+How many different values for “water_type” can be used in +rExpertQuery functions?
+To find the allowable values for “water_type” in rExpertQuery +function, we can run EQ_DomainValues again, this time specifying +“water_type”.
+
+water.types <- rExpertQuery::EQ_DomainValues("water_type")## [1] "EQ_DomainValues: For water_type the values in the name column of the function output are the allowable values for rExpert Query functions."
+The printed message informs us that the “name” column contains the +values for use in water_type queries for rExpertQuery functions. For +water_type there are 66 unique values. They are: BAY; BAYOU; BEACH; +BLACKWATER SYSTEM; CHANNEL; CIRQUE LAKE; COASTAL; COASTAL & BAY +SHORELINE; CONNECTING CHANNEL; CREEK; CREEK, INTERMITTENT; DITCH OR +CANAL; DRAIN; ESTUARY; ESTUARY, FRESHWATER; FLOWAGE; GREAT LAKES BAYS +AND HARBORS; GREAT LAKES BEACH; GREAT LAKES CONNECTING CHANNEL; GREAT +LAKES OPEN WATER; GREAT LAKES SHORELINE; GULCH; GULF; HARBOR; +IMPOUNDMENT; INLAND LAKE SHORELINE; INLET; ISLAND COASTAL WATERS; +LAGOON; LAKE; LAKE, FRESHWATER; LAKE, NATURAL; LAKE, PLAYA; LAKE, +SALINE; LAKE, SPRINGS; LAKE, WILD RICE; LAKE/RESERVOIR/POND; MARSH; +OCEAN; OCEAN/NEAR COASTAL; POND; RESERVOIR; RESERVOIR EMBAYMENT; RIVER; +RIVER, TIDAL; RIVER, WILD RICE; RIVERINE BACKWATER; SINK HOLE; SOUND; +SPRING; SPRINGSHED; STREAM; STREAM, COASTAL; STREAM, EPHEMERAL; STREAM, +INTERMITTENT; STREAM, PERENNIAL; STREAM, TIDAL; STREAM/CREEK/RIVER; +WASH; WATERSHED; WETLAND; WETLANDS, DEPRESSIONAL; WETLANDS, FRESHWATER; +WETLANDS, RIVERINE; WETLANDS, SLOPE; and WETLANDS, TIDAL.
GettingStartedWithrExpertQuery.Rmd

Expert Query empowers users to access surface water quality data from the Assessment and TMDL Tracking and Implementation System (ATTAINS), encompassing Assessment decisions (under Clean Water Act Sections 303(d), 305(b), and 106) and Action data like Total Maximum Daily Loads (TMDLs), Advance Restoration Plans (ARPs), and Protection Approaches.
+The tool supports querying and downloading data within and across +organizations (states or tribes), such as querying all the data within +an EPA region or all nutrient TMDLs nationally. It also allows users to +download national ATTAINS data extracts.
+rExpertQuery takes this a step further, as its functions facilitate querying the Expert Query web services and importing the resulting data sets directly into R for more detailed review and analysis.
+It is crucial to be aware of certain complexities inherent in the +data and querying process. Users are cautioned against direct +comparisons between states due to differences in water quality standards +and assessment methodologies.
+While Expert Query facilitates data querying, it lacks built-in +summary capabilities. Users are encouraged to download and summarize +data, but use caution when relying on row counts, as duplicate entries +may occur. This complexity arises from multiple tables that have +many-to-many relationships. Future vignettes and functions may provide +examples of how to summarize Expert Query data.
+You must first have R and RStudio installed to use rExpertQuery (see instructions below if needed). rExpertQuery is in active development; therefore, we highly recommend that you update it and all of its dependency libraries each time you use the package.
+You can install and/or update the rExpertQuery Package +and all dependencies by running:
+
+if (!"remotes" %in% installed.packages()) {
+ install.packages("remotes")
+}
+
+remotes::install_github("USEPA/rExpertQuery", ref = "develop", dependencies = TRUE, force = TRUE)
+
+library(dplyr)
+library(data.table)

EQ_Actions() queries ATTAINS Actions data by a variety of user-supplied parameters. For example, you might like to run a relatively simple query to return data for all Region 4 Actions that should not be included in Measures.
+
+# query Actions from Region 4 that are not included in Measures
+R4_actions_not_meas <- rExpertQuery::EQ_Actions(api_key = testkey, region = 4, in_meas = "No")

This query returns 203 results, so we'll review only a small random subset of them in a data table just to get an idea of what those results look like:
+
+# create random subset of R4 query results
+R4_subset <- R4_actions_not_meas |>
+ dplyr::slice_sample(n = 20)
+
+# create data table
+DT::datatable(R4_subset, options = list(pageLength = 2, scrollX = TRUE))

You might also want to create a more specific query. For example, you might want to find all Actions from Missouri in the parameter group "PATHOGENS" that were issued by EPA with completion dates between 2000-01-01 and 2020-12-31. As this more detailed query returns a smaller data set, the data table shows all results of the query.
+
+# query Actions for Missouri Actions with action agency "EPA" and parameter group "PATHOGENS"
+MO_epa <- rExpertQuery::EQ_Actions(
+ api_key = testkey, statecode = "MO",
+ act_agency = "EPA",
+ param_group = "PATHOGENS",
+ comp_date_start = "2000-01-01",
+ comp_date_end = "2020-12-31"
+)## [1] "EQ_Actions: The current query will return 4 rows."
+
+
+
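The rendered table is not shown in this text; a minimal sketch for displaying all four results, assuming the MO_epa data frame returned above, is:

# data table of the Missouri EPA pathogen actions
DT::datatable(MO_epa, options = list(pageLength = 5, scrollX = TRUE))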
EQ_ActionsDocuments() queries ATTAINS Actions Documents data with user-supplied parameters. One of its unique features is that you can search the documents themselves by keyword or phrase. For example, you might want to find all Actions Documents which contained the keyword "nutrients".
+
+# Actions Documents containing "nutrient"
+Nutrient_docs <- rExpertQuery::EQ_ActionsDocuments(
+ doc_query = "nutrient",
+ api_key = testkey
+)## [1] "EQ_ActionDocuments: The current query will return 17,989 rows."
+This query yields 17,989 results. As well as providing the Action and Document Name for all results, EQ_ActionsDocuments() also returns the column "actionDocumentUrl" containing the URL to link to the document. Because the query yields so many results, we'll take a look at a smaller subset of them in a data table to understand the structure of the results.
+
+Nutrient_docs_subset <- Nutrient_docs |>
+ dplyr::slice_sample(n = 20)
+
+DT::datatable(Nutrient_docs_subset, options = list(pageLength = 5, scrollX = TRUE), escape = FALSE)

You can also query by more typical parameters like organization name (org_name), EPA region (region), or action type (act_type). The example below demonstrates using the function to search for 4B Restoration Approach documents from Pennsylvania.
+
+PA_4B <- rExpertQuery::EQ_ActionsDocuments(
+ statecode = "PA", act_type = "4B Restoration Approach",
+ api_key = testkey
+)## [1] "EQ_ActionDocuments: The current query will return 14 rows."
+
+PA_4B_subset <- PA_4B |>
+ dplyr::slice_sample(n = 20)
+
+DT::datatable(PA_4B_subset, options = list(pageLength = 5, scrollX = TRUE), escape = FALSE)

As there are only 14 4B Restoration Approach documents from Pennsylvania, we can review them all in the table above. As in the other Actions Documents example, the actionDocumentUrl column contains a link to the document.
+EQ_Assessments() queries ATTAINS Assessments data. For many use cases, querying Assessments for a single organization or state is desirable. The next example shows how to query by a single state for the latest cycle.
+
+KY_assessments <- rExpertQuery::EQ_Assessments(
+ statecode = "KY",
+ api_key = testkey
+)## [1] "EQ_Assessments: The current query will return 20,254 rows."
+There are 20,254 results for the basic latest cycle Kentucky +query.
+
+KY_subset <- KY_assessments |>
+ dplyr::slice_sample(n = 20)
+
+DT::datatable(KY_subset, options = list(pageLength = 5, scrollX = TRUE), escape = FALSE)

It is also possible to design more specific assessment queries. For example, you might want to narrow down the results to show causes for a use class in one region. The code below demonstrates querying for EPA Region 3 Assessments in the ECOLOGICAL_USE group with at least one parameter with the status "Cause". The data table below shows a random sample of the results from this Region 3 query.
+
+R3_ecological <- rExpertQuery::EQ_Assessments(
+ region = 3,
+ use_group = "ECOLOGICAL_USE",
+ param_status = "Cause",
+ api_key = testkey
+)## [1] "EQ_Assessments: The current query will return 144,650 rows."
+
+R3_ecological_subset <- R3_ecological |>
+ dplyr::slice_sample(n = 20)
+
+DT::datatable(R3_ecological_subset, options = list(pageLength = 5, scrollX = TRUE), escape = FALSE)

EQ_AssessmentUnits() queries ATTAINS Assessment Units data. The default is to only return "Active" assessment units, but input params can be adjusted to query for "Retired" or "Historical" assessment units in addition to or instead of "Active". The example below shows a simple example for querying for Active assessment units for a single state.
+
+RI_aus <- rExpertQuery::EQ_AssessmentUnits(
+ statecode = "RI",
+ api_key = testkey
+)## [1] "EQ_AssessmentUnits: The current query will return 1,772 rows."
+We can take a look at a random selection of 20 of the active +assessment units from Rhode Island.
+
+RI_aus_subset <- RI_aus |>
+ dplyr::slice_sample(n = 20)
+
+DT::datatable(RI_aus_subset, options = list(pageLength = 5, scrollX = TRUE), escape = FALSE)

It is also possible to search for retired assessment units, as shown in the example from Florida below.
+
+FL_retired <- rExpertQuery::EQ_AssessmentUnits(
+ statecode = "FL",
+ au_status = "Retired",
+ api_key = testkey
+)## [1] "EQ_AssessmentUnits: The current query will return 35 rows."
+
+FL_ret_subset <- FL_retired |>
+ dplyr::slice_sample(n = 20)
+
+DT::datatable(FL_ret_subset, options = list(pageLength = 5, scrollX = TRUE), escape = FALSE)

EQ_AUsMLs() returns information on the monitoring locations used to make assessment determinations at specific assessment units. Not all organizations provide monitoring location data in ATTAINS. The first example uses a state that does supply some monitoring location data.
+
+AK_ausmls <- rExpertQuery::EQ_AUsMLs(
+ statecode = "AK",
+ api_key = testkey
+)## [1] "EQ_AUsMLs: The current query will return 844 rows."
+
+While the resulting data set from the query has 844 results, +filtering the data set to retain only records where a monitoring +location id has been recorded leads to a data set with 208 results.
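The filtering step itself is not shown in this vignette; a minimal sketch of one way to do it, mirroring the approach in the training vignette above, assuming the AK_ausmls data frame from the query and using a hypothetical AK_with_mls name for the result, is:

# keep only rows where a monitoring location id was recorded (assumed filter)
AK_with_mls <- AK_ausmls |>
  dplyr::filter(
    !is.na(monitoringLocationId),
    monitoringLocationId != ""
  )

# count the remaining rows
nrow(AK_with_mls)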
+EQ_CatchCorr() queries ATTAINS Catchment Correspondence data. These queries return large files, particularly if a large geospatial area is defined, so best practice is to make your query as specific as possible. Many machines may not have enough memory to load an entire state's worth of catchment correspondence data in an R session.
+
+DC_catch <- rExpertQuery::EQ_CatchCorr(
+ statecode = "DC",
+ api_key = testkey
+)## [1] "EQ_CatchCorr: The current query will return 2,229 rows."
+## Rows: 2229 Columns: 11
+## ── Column specification ────────────────────────────────────────────────────────
+## Delimiter: ","
+## chr (7): region, state, organizationType, organizationId, organizationName, ...
+## dbl (4): objectId, catchmentNhdPlusId, reportingCycle, cycleId
+##
+## ℹ Use `spec()` to retrieve the full column specification for this data.
+## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
+The table below shows 20 random results from the DC catchment +correspondence query.
+
+DC_subset <- DC_catch |>
+ dplyr::slice_sample(n = 20)
+
+DT::datatable(DC_subset, options = list(pageLength = 5, scrollX = TRUE), escape = FALSE)It is also possible to focus on a specific assessment unit in +catchment correspondence queries. The next example queries for a single +Illinois assessment unit by its id.
+
+IL_N99_catch <- rExpertQuery::EQ_CatchCorr(
+ statecode = "IL",
+ auid = "IL_N-99",
+ api_key = testkey
+)## [1] "EQ_CatchCorr: The current query will return 249 rows."
+The resulting table shows all results from this much smaller +query.
+
+IL_N99_subset <- IL_N99_catch |>
+ dplyr::slice_sample(n = 20)
+
+DT::datatable(IL_N99_catch, options = list(pageLength = 5, scrollX = TRUE), escape = FALSE)

EQ_NationalExtract() provides an efficient method for importing the Expert Query National Extracts. This function requires one argument, "extract", to specify which of the National Extracts to download. The following examples show how to download the TMDL and Actions extracts. The extracts are all large files and some may not open in R if there is not enough memory available.
+
+Nat_tmdls <- rExpertQuery::EQ_NationalExtract("tmdl")## [1] "EQ_NationalExtract: downloading Total Maximum Daily Load Profile (Expert Query National Extract). It was last updated on November 07, 2025 at 09:02 PM EST."
+
+Nat_actions <- rExpertQuery::EQ_NationalExtract("actions")## [1] "EQ_NationalExtract: downloading Actions Profile (Expert Query National Extract). It was last updated on November 07, 2025 at 09:02 PM EST."
+EQ_Sources() queries ATTAINS Sources data. Not all organizations report sources in their assessments,
+but for those that do, querying by source can be another useful +option for querying ATTAINS data. The next code chunk shows how to query +sources for one state by a parameter group, for example searching for +sources related to the parameter group “HABITAT ALTERATIONS” in +Wisconsin.
+
+WI_habalt_sources <- rExpertQuery::EQ_Sources(
+ statecode = "WI",
+ param_group = "HABITAT ALTERATIONS",
+ api_key = testkey
+)## [1] "EQ_Sources: The current query will return 584 rows."
+There were 584 records returned by this query, which can be reviewed in the data table below.
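A minimal sketch of that table call, assuming the WI_habalt_sources data frame returned above, is:

# data table of Wisconsin habitat alteration sources
DT::datatable(WI_habalt_sources, options = list(pageLength = 5, scrollX = TRUE))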
It is also possible to query directly by the name of the source. The example below queries all organizations for the source "LEGACY/HISTORICAL POLLUTANTS".
+
+legacy_sources <- rExpertQuery::EQ_Sources(
+ source = "LEGACY/HISTORICAL POLLUTANTS",
+ api_key = testkey
+)## [1] "EQ_Sources: The current query will return 187 rows."
+There were 187 records for the source "LEGACY/HISTORICAL POLLUTANTS". These results can be reviewed in the table below to see which states, regions, etc. are associated with this source.
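The table can be produced with the same pattern as above; a sketch assuming the legacy_sources data frame returned by the query is:

# data table of LEGACY/HISTORICAL POLLUTANTS source records
DT::datatable(legacy_sources, options = list(pageLength = 5, scrollX = TRUE))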
+
+HI_both_tmdls <- rExpertQuery::EQ_TMDLs(
+ statecode = "HI",
+ source_type = "Both",
+ api_key = testkey
+)## [1] "EQ_TMDL: The current query will return 474 rows."
+These 474 records can be reviewed in the data table below.
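One way to build that table, assuming the HI_both_tmdls data frame returned above, is:

# data table of Hawaii TMDLs with both point and nonpoint sources
DT::datatable(HI_both_tmdls, options = list(pageLength = 5, scrollX = TRUE))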
This example demonstrates using another combination of arguments in the query, searching for records from EPA Region 10 that pertain to the addressed parameter group "CAUSE UNKNOWN".
+
+R10_unknown <- rExpertQuery::EQ_TMDLs(
+ region = 10,
+ ad_param_group = "CAUSE UNKNOWN",
+ api_key = testkey
+)## [1] "EQ_TMDL: The current query will return 588 rows."
+These 588 records are displayed in the table below.
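A sketch of the display call, assuming the R10_unknown data frame returned above, is:

# data table of Region 10 "CAUSE UNKNOWN" TMDL records
DT::datatable(R10_unknown, options = list(pageLength = 5, scrollX = TRUE))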
EQ_DomainValues() is a function to help users understand the available domain values that can be used in the other rExpertQuery functions. If run with no argument supplied, it will return a list of all available domains that can be used as an argument in EQ_DomainValues().
+
+rExpertQuery::EQ_DomainValues()## [1] "EQ_DomainValues: getting list of available domain names."
+## domain
+## 1 param_attain
+## 2 delist_reason
+## 3 doc_type
+## 4 param_status
+## 5 param_group
+## 6 assess_methods
+## 7 assess_basis
+## 8 param_state_ir_cat
+## 9 act_status
+## 10 org_id
+## 11 org_name
+## 12 au_status
+## 13 act_type
+## 14 act_agency
+## 15 act_agency
+## 16 assess_types
+## 17 water_type
+## 18 statecode
+## 19 source_type
+## 20 use_name
+## 21 use_support
+## 22 file_type
+## 23 ad_param
+## 24 cause
+## 25 param_name
+## 26 loc_type
+## 27 source_scale
+The example below shows how to return allowable values for +“org_id”.
+
+Org_vals <- rExpertQuery::EQ_DomainValues(domain = "org_id")## [1] "EQ_DomainValues: For org_id the values in the code column of the function output are the allowable values for rExpert Query functions."
+
+
+
+