Overview
This Shiny app calculates population estimates in the same way as the National Aquatic Resource Surveys (NARS) and plots the results. Estimates can be based on categorical or continuous variables. The app does not include every possible option, but it supports the settings NARS typically uses to create population estimates.
Instructions for Use
Prepare Data for Analysis
- Select a data file and upload it. If the data are to be loaded from a URL, check the box to do so and paste or type the URL of the file.
- The variables in the file will then populate the dropdown lists on this tab.
- Select variables to serve as site IDs, weights, response variables, and subpopulations (if desired). If only overall or 'national' estimates are desired, check the box for overall analysis.
- If data are to be used for change analysis, select the variable that distinguishes between design cycles (we usually assume this variable represents year).
- Select the type of variance to use. Local neighborhood variance uses each site's nearest neighbors to estimate variance and tends to produce smaller variance values than variance based on a simple random sample. This approach is recommended and is the one used in NARS estimates. It requires site coordinates.
- For local neighborhood variance, select the coordinate variables (in an Albers or other projection).
- For simple random sample (SRS) variance, selecting a stratum variable to better estimate variance is advised but not required. Coordinates are not used with this type of variance.
- You may subset the data for analysis by up to one categorical variable. To do this, select the check box to subset, then select the variable to subset by. Finally, select one or more categories by which to subset data.
- If necessary, click the left-hand button to view the full dataset.
- Click the right-hand button above the data to subset the data before proceeding to the Run Population Estimates tab.
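The variable-selection rules above can be illustrated with a minimal Python sketch. It assumes the data have already been read into a list of records. The helper names `check_selection` and `subset_rows`, the variance labels "Local" and "SRS", and the column names in the usage example are hypothetical and are not taken from the app or from spsurvey. The sketch only encodes the stated rules:
- local neighborhood variance needs coordinate variables;
- a stratum is optional for SRS variance;
- subsetting uses at most one categorical variable.

```python
# Hypothetical helpers (not part of the app or spsurvey) that encode the
# variable-selection rules described in the instructions.


def check_selection(columns, site_id, weight, responses, vartype,
                    xcoord=None, ycoord=None, stratum=None):
    """Return a list of problems with the chosen variables (empty if OK)."""
    problems = [f"missing column: {name}"
                for name in [site_id, weight, *responses] if name not in columns]
    if vartype == "Local":
        # Local neighborhood variance requires both coordinate variables.
        if xcoord is None or ycoord is None:
            problems.append("local neighborhood variance requires coordinates")
        else:
            problems += [f"missing column: {name}"
                         for name in (xcoord, ycoord) if name not in columns]
    elif vartype == "SRS":
        # Coordinates are not used; a stratum is advised but optional.
        if stratum is not None and stratum not in columns:
            problems.append(f"missing column: {stratum}")
    else:
        problems.append(f"unknown variance type: {vartype}")
    return problems


def subset_rows(rows, subset_var, keep):
    """Keep rows whose value of a single categorical variable is in `keep`."""
    keep = set(keep)
    return [row for row in rows if row[subset_var] in keep]
```

For example, with hypothetical columns `["SITE_ID", "WGT", "PTL_COND", "XCOORD", "YCOORD"]`, calling `check_selection(cols, "SITE_ID", "WGT", ["PTL_COND"], "Local")` without coordinate variables reports that coordinates are required.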
Requirements
- The R package spsurvey, version 5.0 or later, is required. Be sure to update this package if an older version is already installed.
- All variables must be contained in one file and include site IDs, weights, response variables, subpopulations (if any), and optionally, coordinates and/or design stratum (depending on type of variance desired).
- All sites included in the dataset should have weight > 0. Any records with a missing weight or a weight of 0 will be dropped before analysis.
- Input data should include only one row per site and year/survey cycle (based on the selected site ID and year/survey cycle variables). Within-year revisits should not be included, and all variables used in the analysis should be separate columns in the dataset (i.e., wide format).
- Only delimited files, such as comma- and tab-delimited, are accepted for upload.
- If local neighborhood variance is desired, coordinates must be provided in a projected coordinate system, such as Albers.
- If variance based on a simple random sample is desired (or if coordinates are not available), the design stratum should be provided to better estimate variance.
- If change analysis is intended, all desired years of data must be contained in one file, with a single variable that identifies the individual years or survey cycles included.
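The weight and one-row-per-site rules can be checked before upload with a minimal sketch like the one below. It assumes the input is comma- or tab-delimited text with a header row. `load_rows`, `drop_bad_weights`, and `duplicate_keys` are hypothetical helpers written with the Python standard library. They mirror the requirements listed above rather than the app's own code.

```python
import csv
import io
from collections import Counter


def load_rows(text, delimiter=","):
    """Parse delimited text (e.g., comma- or tab-delimited) into a list of dicts."""
    return list(csv.DictReader(io.StringIO(text), delimiter=delimiter))


def drop_bad_weights(rows, weight):
    """Drop records whose weight is missing, non-numeric, or not greater than 0."""
    kept = []
    for row in rows:
        try:
            w = float(row[weight])
        except (TypeError, ValueError):
            continue  # missing or non-numeric weight
        if w > 0:  # also rejects NaN, since NaN > 0 is False
            kept.append(row)
    return kept


def duplicate_keys(rows, site_id, cycle=None):
    """Return (site, cycle) keys that appear more than once (not wide format)."""
    counts = Counter((row[site_id], row[cycle] if cycle else None) for row in rows)
    return sorted(key for key, n in counts.items() if n > 1)
```

Any key returned by `duplicate_keys` points to a within-cycle revisit or a long-format layout that should be resolved before the file is uploaded.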
Run Population Estimates
- Select the type of analysis (categorical or continuous).
- Select the confidence level for estimating confidence intervals (90% or 95%). The default is 95%.
- If a year or design cycle variable was selected on the Prepare Data for Analysis tab, select the year or cycle of interest.
- For continuous analysis, select either CDFs (cumulative distribution functions), percentiles, means, or totals.
- Note that sites with missing values for continuous variables are ignored in the analysis.
- Click the Run/Refresh Estimates button. Depending on the number of responses, subpopulations, and type of analysis, the run may take anywhere from a few seconds to several minutes.
- If desired, download results to a comma-delimited file by clicking the Save Results button.
- Outputs for categorical analysis:
- Outputs for continuous analysis:
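The sketch below shows what the continuous options estimate. It computes simple design-weighted point estimates (total, mean, and the percent of resource at or below a value, i.e., one point on a CDF) and skips sites with missing values. This is an illustration under stated assumptions, not spsurvey's implementation. It produces point estimates only, with no standard errors or confidence bounds, it does not estimate percentiles, and the function names are hypothetical.

```python
import math


def _pairs(values, weights):
    """Pair each value with its weight, skipping sites with missing values."""
    return [(y, w) for y, w in zip(values, weights)
            if y is not None and not math.isnan(y)]


def weighted_total(values, weights):
    """Design-weighted total: sum of w_i * y_i over non-missing sites."""
    return sum(w * y for y, w in _pairs(values, weights))


def weighted_mean(values, weights):
    """Weighted mean: sum(w_i * y_i) / sum(w_i) over non-missing sites."""
    pairs = _pairs(values, weights)
    return sum(w * y for y, w in pairs) / sum(w for _, w in pairs)


def weighted_cdf(values, weights, x):
    """Estimated percent of the resource with a value less than or equal to x."""
    pairs = _pairs(values, weights)
    below = sum(w for y, w in pairs if y <= x)
    return 100 * below / sum(w for _, w in pairs)
```

In this sketch the site with a missing value contributes to neither the numerator nor the denominator. That matches the note above that such sites are ignored.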
Change Analysis
- First, select the two years (or sets of years) to compare.
- Select the type of data to analyze (categorical or continuous).
- Select the confidence level for estimating confidence intervals (90% or 95%). The default is 95%.
- If continuous data are selected, select the parameter on which to test for differences (mean or median).
- Click the Run/Refresh Estimates button. Depending on the number of responses, subpopulations, and type of analysis, the run may take anywhere from a few seconds to several minutes.
- If any data are changed in the Prepare Data for Analysis tab, years must be re-selected before running analysis.
- Outputs for Categorical Analysis:
  - Survey_1 = Year or design cycle of first survey
  - Survey_2 = Year or design cycle of second survey
  - Type = Subpopulation group
  - Subpopulation = Subpopulation name
  - Indicator = Name of indicator
  - Category = Category of indicator or Total
  - DiffEst.P = Estimated difference in percentage (Survey_2 - Survey_1)
  - StdError.P = Estimated standard error of the change in percent
  - MarginofError.P = Margin of error of the change in percent, representing the difference between the estimate and the confidence bounds
  - LCBXXPct.P = Lower confidence bound for the change in percent, where XX represents the confidence level
  - UCBXXPct.P = Upper confidence bound for the change in percent, where XX represents the confidence level
  - DiffEst.U = Estimated change in the amount of resource in the category, in the same units as the weights
  - StdError.U = Estimated standard error of the change in amount of resource
  - MarginofError.U = Margin of error of the change in amount of resource, representing the difference between the estimate and the confidence bounds
  - LCBXXPct.U = Lower confidence bound for the change in amount of resource, where XX represents the confidence level
  - UCBXXPct.U = Upper confidence bound for the change in amount of resource, where XX represents the confidence level
  - nResp_1, nResp_2 = Number of responses in the category in survey 1 and survey 2, respectively
  - Estimate.P_1, Estimate.P_2 = Estimated percent of resource in the category in survey 1 and survey 2, respectively
  - StdError.P_1, StdError.P_2 = Estimated standard error of the percent estimate in survey 1 and survey 2, respectively
  - MarginofError.P_1, MarginofError.P_2 = Margin of error of the percent estimate (the difference between the estimate and the confidence bounds) in survey 1 and survey 2, respectively
  - LCBXXPct.P_1, LCBXXPct.P_2 = Lower confidence bound for percent in survey 1 and survey 2, respectively, where XX represents the confidence level
  - UCBXXPct.P_1, UCBXXPct.P_2 = Upper confidence bound for percent in survey 1 and survey 2, respectively, where XX represents the confidence level
  - Estimate.U_1, Estimate.U_2 = Estimated amount of resource in the category, in the same units as the weights, in survey 1 and survey 2, respectively
  - StdError.U_1, StdError.U_2 = Estimated standard error of the amount of resource estimate in survey 1 and survey 2, respectively
  - MarginofError.U_1, MarginofError.U_2 = Margin of error of the amount of resource estimate (the difference between the estimate and the confidence bounds) in survey 1 and survey 2, respectively
  - LCBXXPct.U_1, LCBXXPct.U_2 = Lower confidence bound for amount of resource in survey 1 and survey 2, respectively, where XX represents the confidence level
  - UCBXXPct.U_1, UCBXXPct.U_2 = Upper confidence bound for amount of resource in survey 1 and survey 2, respectively, where XX represents the confidence level
- Outputs for Continuous Analysis (Means):
  - Survey_1 = Year or design cycle of first survey
  - Survey_2 = Year or design cycle of second survey
  - Type = Subpopulation group
  - Subpopulation = Subpopulation name
  - Indicator = Name of indicator
  - DiffEst = Estimated difference in the mean of the indicator (Survey_2 - Survey_1)
  - StdError = Estimated standard error of the change estimate
  - MarginofError = Margin of error of the change estimate, representing the difference between the estimate and the confidence bounds
  - LCBXXPct = Lower confidence bound for the change in mean, where XX represents the confidence level
  - UCBXXPct = Upper confidence bound for the change in mean, where XX represents the confidence level
  - nResp_1, nResp_2 = Number of responses for the indicator in survey 1 and survey 2, respectively
  - Estimate_1, Estimate_2 = Estimated mean of the indicator in survey 1 and survey 2, respectively
  - StdError_1, StdError_2 = Estimated standard error of the estimated mean in survey 1 and survey 2, respectively
  - MarginofError_1, MarginofError_2 = Margin of error of the estimated mean (the difference between the estimate and the confidence bounds) in survey 1 and survey 2, respectively
  - LCBXXPct_1, LCBXXPct_2 = Lower confidence bound for the indicator mean in survey 1 and survey 2, respectively, where XX represents the confidence level
  - UCBXXPct_1, UCBXXPct_2 = Upper confidence bound for the indicator mean in survey 1 and survey 2, respectively, where XX represents the confidence level
- Outputs for Continuous Analysis (Medians):
  - All of the output variable names match those for the categorical change analysis (see above).
  - The two categories in this output are Greater_Than_Median and Less_Than_Median; otherwise, the variables are interpreted the same way.
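The following sketch shows how the change outputs relate to each other, under two stated assumptions. First, the two survey estimates are treated as independent. Second, the confidence bounds use a normal approximation (estimate ± z × standard error). When the same sites are sampled in both surveys, the estimates are correlated and the variance of the difference should include a covariance term, which this sketch omits. `change_estimate` and `is_significant` are hypothetical names. The output keys mimic the unsuffixed columns of the means output, with XX replaced by the confidence level.

```python
import math
from statistics import NormalDist


def change_estimate(est_1, se_1, est_2, se_2, conf=95):
    """Difference (Survey_2 - Survey_1) with a normal-approximation interval.

    Assumes the two survey estimates are independent (no covariance term).
    """
    diff = est_2 - est_1
    se = math.sqrt(se_1 ** 2 + se_2 ** 2)
    z = NormalDist().inv_cdf(0.5 + conf / 200)  # about 1.645 (90%) or 1.960 (95%)
    moe = z * se  # margin of error: distance from estimate to either bound
    return {
        "DiffEst": diff,
        "StdError": se,
        "MarginofError": moe,
        f"LCB{conf}Pct": diff - moe,
        f"UCB{conf}Pct": diff + moe,
    }


def is_significant(result, conf=95):
    """True if the confidence interval for the change excludes zero."""
    return result[f"LCB{conf}Pct"] > 0 or result[f"UCB{conf}Pct"] < 0
```

A change whose confidence interval excludes zero is commonly read as a statistically significant difference at the chosen confidence level.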
Plot Categorical Estimates
- Either run population estimates on categorical data within the app or import results into the app.
- Variables in the dataset must match those in the output of the spsurvey::cat_analysis function: Type, Subpopulation, Indicator, Category, Estimate.P, StdError.P, LCBXXPct.P, UCBXXPct.P, Estimate.U, StdError.U, LCBXXPct.U, UCBXXPct.U, where XX represents the confidence level.
- Select either proportion or unit estimates to plot from Estimate Type.
- Select the Category values that represent the Good, Fair, Poor, Not Assessed, and Other condition classes. More than one value in the dataset may be selected for each condition class. For example, if one response uses Good/Fair/Poor and another uses At or Below Benchmark/Above Benchmark, both Good and At or Below Benchmark can be selected for the Good class.
- Optional: Add a plot title and define the resource type/unit (e.g., stream length, number of lakes, coastal or wetland area).
- Click the Plot/Refresh Button to create plots.
- From the menus on the right-hand side of the page, select the Indicator of interest, then the Subpopulation Group, and then the Subpopulation of interest. The upper plot shows the selected subpopulation, and the lower plot shows a particular condition class across all subpopulations.
- To show confidence bound values, check the box above the main and/or subpopulation plots.
- By default, subpopulations in the lower plot are ordered alphabetically. To sort them by the estimate for the Good class, check the box for Sort Subpopulations by Good Condition.
- Select Download Estimate Plot or Download Subpopulation Plot to save a .png file of the output.
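Below is a sketch of the condition-class grouping and the "Sort Subpopulations by Good Condition" ordering. It assumes imported rows are dictionaries keyed by the cat_analysis column names listed above. `CLASS_MAP` is a hypothetical example; mapping Above Benchmark to Poor, for instance, is an assumption. Unmapped values are assumed to fall into Other, and Total rows are skipped. Sorting from highest to lowest Good estimate, with ties broken alphabetically, is also an assumption rather than the app's documented behavior.

```python
# Hypothetical mapping from Category values to condition classes. In the app
# the user chooses these; the entries below are only an example.
CLASS_MAP = {
    "Good": "Good",
    "At or Below Benchmark": "Good",
    "Fair": "Fair",
    "Poor": "Poor",
    "Above Benchmark": "Poor",
    "Not Assessed": "Not Assessed",
}


def condition_class(category, class_map=CLASS_MAP):
    """Return the condition class for a Category value.

    The Total row is not a condition class, so it maps to None; any other
    value not in the mapping is assumed to fall into Other.
    """
    if category == "Total":
        return None
    return class_map.get(category, "Other")


def sort_by_good(rows, indicator):
    """Order subpopulations by their Good-class Estimate.P, highest first.

    Ties are broken alphabetically; subpopulations with no Good row get 0.
    """
    good = {}
    for row in rows:
        if row["Indicator"] != indicator:
            continue
        sub = row["Subpopulation"]
        good.setdefault(sub, 0.0)
        if condition_class(row["Category"]) == "Good":
            good[sub] += float(row["Estimate.P"])
    return sorted(good, key=lambda sub: (-good[sub], sub))
```

Summing the estimates within a class reflects the idea that several Category values can be selected for one condition class.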
Plot Continuous Estimates
- Either run population estimates on continuous data within the app to obtain CDF estimates, or import results into the app.
- Variables in the dataset must match those in the output of the spsurvey::cont_analysis function: Type, Subpopulation, Indicator, Value, Estimate.P, StdError.P, LCBXXPct.P, UCBXXPct.P, Estimate.U, StdError.U, LCBXXPct.U, UCBXXPct.U, where XX represents the confidence level.
- Select either proportion or unit estimates to plot from Estimate Type.
- Optional: Add plot title, indicator units, and define resource type/unit.
- Click the Plot Continuous Estimates button.
- Select indicator from dropdown, then select a subpopulation group. Add or remove subpopulations to the plot from the Add/Remove Subpopulations dropdown.
- Optional: Add an Indicator Threshold, add confidence bands, and/or change the x-axis to a log10 scale. Be aware that if the Log Scale X-Axis option is selected, points with values less than or equal to zero will be excluded from the plot.
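Two of the checks above can be sketched in Python:
- confirming that an imported file has the cont_analysis columns listed above, with XX filled in from the confidence level;
- dropping non-positive values before plotting on a log10 x-axis, since the base-10 logarithm is undefined for zero and negative numbers.

The function names and the `CDF_COLUMNS` list are hypothetical helpers, not the app's code.

```python
import math

# Expected CDF columns from the list above; "{XX}" is filled with the
# confidence level (e.g., 95 -> "LCB95Pct.P"). Hypothetical helper constant.
CDF_COLUMNS = ["Type", "Subpopulation", "Indicator", "Value",
               "Estimate.P", "StdError.P", "LCB{XX}Pct.P", "UCB{XX}Pct.P",
               "Estimate.U", "StdError.U", "LCB{XX}Pct.U", "UCB{XX}Pct.U"]


def missing_columns(header, conf=95):
    """List expected columns (with XX = confidence level) absent from header."""
    expected = [name.format(XX=conf) for name in CDF_COLUMNS]
    return [name for name in expected if name not in header]


def log10_points(rows, value_col="Value", estimate_col="Estimate.P"):
    """Return (log10(Value), estimate) pairs, dropping Values <= 0.

    Zero and negative values have no base-10 logarithm, so they cannot be
    placed on a log-scale x-axis.
    """
    points = []
    for row in rows:
        value = float(row[value_col])
        if value > 0:
            points.append((math.log10(value), float(row[estimate_col])))
    return points
```

For example, a file produced at the 95% level passes `missing_columns(header, 95)` but reports the four LCB/UCB columns as missing if checked against 90%.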
Contact Karen Blocksom at blocksom.karen@epa.gov with questions or feedback.
Disclaimer
The United States Environmental Protection Agency (EPA) GitHub project code is provided on an "as is" basis and the user assumes responsibility for its use. EPA has relinquished control of the information and no longer has responsibility to protect the integrity, confidentiality, or availability of the information. Any reference to specific commercial products, processes, or services by service mark, trademark, manufacturer, or otherwise, does not constitute or imply their endorsement, recommendation or favoring by EPA. The EPA seal and logo shall not be used in any manner to imply endorsement of any commercial product or activity by EPA or the United States Government.
If ANY changes have been made to your choices, you MUST click the button to prepare data for analysis again!
If the Run/Refresh Estimates button is grayed out, return to the Prepare Data for Analysis tab and click the button labeled "Click HERE to prepare data for analysis".
Note that if all values are very small, the results may be displayed as zeroes. Save the results and view the output file to see them with full precision.
Results can be saved as a .csv file, and the version information and R code used for the analysis can also be saved.
If the output is not as expected, make sure you chose the correct Type of Analysis (Categorical or Continuous) for your data.
If you want to use a different set of response variables from those used in the population estimates, return to the Prepare Data for Analysis tab to re-select variables, and then click the button to prepare data for analysis again.
Change results can be saved as a .csv file, and the version information and R code used for the analysis can also be saved.
NOTE: Plotting and downloading may take a while if there are multiple subpopulations. PLEASE BE PATIENT.