Population Estimate Tool

Overview

This Shiny app allows for calculation of population estimates as performed for the National Aquatic Resource Surveys (NARS) and the plotting of results. Estimates based on categorical and continuous variables are possible. This app does not include all possible options but does allow typical settings used by NARS for creating population estimates.

Instructions for Use

Prepare Data for Analysis

Select data file and upload. If the data are to be loaded from a URL, check the box to do so and paste or enter the URL for the file.
The variables in that file will populate dropdown lists on that tab.
Select variables to serve as site IDs, weights, response variables, and subpopulations (if desired). If only overall or 'national' estimates are desired, check the box for overall analysis.
If data are to be used for change analysis, select the variable that distinguishes between design cycles (we usually assume this variable represents year).
Select the type of variance you want to use. Local neighborhood variance uses a site's nearest neighbors to estimate variance, tending to result in smaller variance values than variance based on a simple random sample. This approach is recommended and is the approach used in NARS estimates. It requires site coordinates to be provided.
- For local neighborhood variance, select coordinate variables (Albers projection, or some other projection).
- For simple random sample (SRS) variance, selecting a stratum variable to better estimate variance is advised but not required. Coordinates are not used with this type of variance.

You may subset the data for analysis by up to one categorical variable. To do this, select the check box to subset, then select the variable to subset by. Finally, select one or more categories by which to subset data.
Click on the left hand button to view the full dataset if necessary.
Click on the right hand button above the data to subset the data before proceeding to the Run Population Estimates tab.

Minimum Requirements for Analysis

The R package spsurvey, v.5.0 or later, is required. Be sure to update this package if an older version is already installed.
All variables must be contained in one file and include site IDs, weights, response variables, subpopulations (if any), and optionally, coordinates and/or design stratum (depending on type of variance desired).
All sites included in the dataset should have weight > 0. Any records with a missing weight or a weight of 0 will be dropped before analysis.
Input data should include only one row per site and year/survey cycle (based on the variables selected for site ID and year/survey cycle). No within-year revisits should be included, and all variables used in analysis should be separate columns in the dataset (i.e., wide format).
Only delimited files, such as comma- and tab-delimited, are accepted for upload.
If local neighborhood variance is desired, coordinates must be provided in some type of projection, such as Albers.
If variance based on a simple random sample is desired (or if coordinates are not available), the design stratum should be provided to better estimate variance.
If change analysis is intended, all desired years of data must be contained in one file, with a single variable that identifies the individual years or survey cycles included.
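The weight and one-row-per-site requirements above can be checked before upload. The following is a minimal sketch in Python, run outside the app and not part of it. The column names `SITE_ID`, `YEAR`, and `WGT` and the sample data are hypothetical placeholders for whatever variables your file uses.

```python
import csv
import io
from collections import Counter

# Hypothetical column names; replace with the variables you will select in the app.
SITE_ID, YEAR, WEIGHT = "SITE_ID", "YEAR", "WGT"

def precheck(rows):
    """Drop rows with a missing or non-positive weight, then list repeated site/year pairs."""
    kept = []
    for row in rows:
        raw = (row.get(WEIGHT) or "").strip()
        try:
            weight = float(raw)
        except ValueError:
            continue  # missing or non-numeric weight
        if weight > 0:  # also drops NaN, since NaN > 0 is False
            kept.append(row)
    counts = Counter((r[SITE_ID], r[YEAR]) for r in kept)
    duplicates = sorted(key for key, n in counts.items() if n > 1)
    return kept, duplicates

# Small made-up example: one zero weight, one missing weight, one within-year revisit.
sample = io.StringIO(
    "SITE_ID,YEAR,WGT,COND\n"
    "A,2018,12.5,Good\n"
    "B,2018,0,Poor\n"
    "C,2018,,Fair\n"
    "A,2018,12.5,Good\n"
    "A,2019,10.0,Fair\n"
)
kept, duplicates = precheck(csv.DictReader(sample))
print(len(kept), duplicates)
```

Any site/year pair reported in `duplicates` indicates a within-year revisit that should be removed before analysis.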

Run Population Estimates

Select the type of analysis (categorical or continuous).
Select the confidence level for estimating confidence intervals (90% or 95%). The default is 95%.
If a year or design cycle variable was selected on the Prepare Data for Analysis tab, select the year or cycle of interest.
For continuous analysis, select either CDFs (cumulative distribution functions), percentiles, means, or totals.
Note that if data are missing for continuous variables, those sites are ignored in analysis.
Click on the Run/Refresh Estimates button. Depending on the number of responses, subpopulations, and type of analysis, it may take a few seconds to several minutes.
If desired, download results to a comma-delimited file by clicking the Save Results button.
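To make the categorical results easier to interpret, the sketch below shows how weighted point estimates of this kind are commonly formed: the amount of resource in a category is the sum of site weights in that category, and the percent is that amount divided by the total weight. This is an illustration in Python under that assumption, not the spsurvey code the app runs, and it does not compute variances. The function name and sample data are hypothetical.

```python
from collections import defaultdict

def category_estimates(records):
    """records: iterable of (category, weight) pairs, one per site."""
    totals = defaultdict(float)
    for category, weight in records:
        if category is None:
            continue  # assumption: sites with a missing category are skipped
        totals[category] += weight
    grand_total = sum(totals.values())
    return {
        cat: {"Estimate.U": amount, "Estimate.P": 100.0 * amount / grand_total}
        for cat, amount in totals.items()
    }

# Made-up sites: weights are in the units of the resource (e.g., km of stream).
sites = [("Good", 10.0), ("Good", 30.0), ("Fair", 20.0), ("Poor", 40.0)]
est = category_estimates(sites)
for cat, values in est.items():
    print(cat, values)
```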

Outputs for categorical analysis:

Type = Subpopulation group

Subpopulation = Subpopulation name

Indicator = Name of indicator

Category = Category of indicator or Total

nResp = Number of responses in category

Estimate.P = Estimated percent of resource in category

StdError.P = Estimated standard error of percent estimate

MarginofError.P = Margin of error of percent estimate, representing difference between estimate and confidence bounds

LCBXXPct.P = Lower confidence bound for percent, where XX represents the confidence level

UCBXXPct.P = Upper confidence bound for percent, where XX represents the confidence level

Estimate.U = Estimated amount of resource in category in same units as weights used

StdError.U = Estimated standard error of amount of resource estimate

MarginofError.U = Margin of error of amount of resource estimate, representing difference between estimate and confidence bounds

LCBXXPct.U = Lower confidence bound for amount of resource, where XX represents the confidence level

UCBXXPct.U = Upper confidence bound for amount of resource, where XX represents the confidence level
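The relationship among the estimate, standard error, margin of error, and confidence bounds can be sketched as follows. This Python example assumes a normal-approximation interval with percent bounds limited to the range 0 to 100. That is an assumption made for illustration, and the function name is hypothetical.

```python
from statistics import NormalDist

def percent_bounds(estimate_p, std_error_p, conf=95):
    """Return (margin of error, lower bound, upper bound) for a percent estimate."""
    # Two-sided multiplier, e.g. about 1.96 for conf=95 and about 1.645 for conf=90.
    z = NormalDist().inv_cdf(0.5 + conf / 200.0)
    margin = z * std_error_p
    lower = max(0.0, estimate_p - margin)    # assumed: bounds are clamped to 0
    upper = min(100.0, estimate_p + margin)  # assumed: bounds are clamped to 100
    return margin, lower, upper

margin, lower, upper = percent_bounds(40.0, 5.0, conf=95)
print(round(margin, 2), round(lower, 2), round(upper, 2))
```

Under this assumption, when a bound is clamped at 0 or 100, the distance from the estimate to that bound is smaller than the margin of error.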

Outputs for continuous analysis:

Type = Subpopulation group

Subpopulation = Subpopulation name

Indicator = Name of indicator

Value = Value of indicator (CDF only)

Statistic = Percentile being estimated (Percentiles only)

nResp = Number of responses in category

Estimate.P = Estimated percent of resource at or below value (CDF only)

Estimate = Estimated value for given percentile (Percentiles only)

StdError.P = Estimated standard error of percent estimate (CDF only)

StdError = Estimated standard error of mean, variance, or standard deviation estimate (Percentiles only)

LCBXXPct.P = Lower confidence bound for percent, where XX represents the confidence level (CDF only)

UCBXXPct.P = Upper confidence bound for percent, where XX represents the confidence level (CDF only)

LCBXXPct = Lower confidence bound for percentile estimate, where XX represents the confidence level (Percentiles only)

UCBXXPct = Upper confidence bound for percentile estimate, where XX represents the confidence level (Percentiles only)

Estimate.U = Estimated amount of resource at or below value in same units as weights used (CDF only)

StdError.U = Estimated standard error of amount of resource at or below estimate (CDF only)

LCBXXPct.U = Lower confidence bound for amount of resource, where XX represents the confidence level (CDF only)

UCBXXPct.U = Upper confidence bound for amount of resource, where XX represents the confidence level (CDF only)
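As an illustration of what the CDF columns represent, the sketch below computes a weighted empirical CDF: for each observed value, the amount of resource at or below that value is the sum of the corresponding weights, and the percent is that amount over the total weight. Sites with missing values are ignored, as noted above. This is a simplified Python sketch with made-up data, not the spsurvey implementation, and it omits standard errors and bounds.

```python
def weighted_cdf(values, weights):
    """Return a list of (Value, Estimate.P, Estimate.U) tuples for each distinct value."""
    # Sites with a missing value (None) are ignored.
    pairs = [(v, w) for v, w in zip(values, weights) if v is not None]
    total = sum(w for _, w in pairs)
    cdf = []
    for value in sorted({v for v, _ in pairs}):
        amount = sum(w for v, w in pairs if v <= value)
        cdf.append((value, 100.0 * amount / total, amount))
    return cdf

# Made-up indicator values and weights; the third site has a missing value.
cdf = weighted_cdf([2.0, 5.0, None, 5.0, 9.0], [10.0, 20.0, 15.0, 30.0, 40.0])
for row in cdf:
    print(row)
```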

Run Change Analysis

First select the two years (or sets of years) to compare.
Select type of data to analyze (categorical or continuous).
Select the confidence level for estimating confidence intervals (90% or 95%). The default is 95%.
If continuous data are selected, select parameter on which to test for differences (mean or median).
Click on the Run/Refresh Estimates button. Depending on the number of responses, subpopulations, and type of analysis, it may take a few seconds to several minutes.
If any data are changed in the Prepare Data for Analysis tab, years must be re-selected before running analysis.

Outputs for Categorical Analysis:

Survey_1 = Year or design cycle of first survey
Survey_2 = Year or design cycle of second survey
Type = Subpopulation group
Subpopulation = Subpopulation name
Indicator = Name of indicator
Category = Category of indicator or Total
DiffEst.P = Estimate of difference in percentage (Survey_2 - Survey_1)
StdError.P = Estimated standard error of change percent estimate
MarginofError.P = Margin of error of change percent estimate, representing difference between estimate and confidence bounds
LCBXXPct.P = Lower confidence bound for change percent, where XX represents the confidence level
UCBXXPct.P = Upper confidence bound for change percent, where XX represents the confidence level
DiffEst.U = Estimated amount of change in resource in category in same units as weights used
StdError.U = Estimated standard error of amount of change in resource estimate
MarginofError.U = Margin of error of amount of change in resource estimate, representing difference between estimate and confidence bounds
LCBXXPct.U = Lower confidence bound for amount of change in resource, where XX represents the confidence level
UCBXXPct.U = Upper confidence bound for amount of change in resource, where XX represents the confidence level
nResp_1, nResp_2 = Number of responses in category in survey 1 and survey 2, respectively
Estimate.P_1, Estimate.P_2 = Estimated percent of resource in category in survey 1 and survey 2, respectively
StdError.P_1, StdError.P_2 = Estimated standard error of percent estimate in survey 1 and survey 2, respectively
MarginofError.P_1, MarginofError.P_2 = Margin of error of percent estimate, representing difference between estimate and confidence bounds in survey 1 and survey 2, respectively
LCBXXPct.P_1, LCBXXPct.P_2 = Lower confidence bound for percent, where XX represents the confidence level in survey 1 and survey 2, respectively
UCBXXPct.P_1, UCBXXPct.P_2 = Upper confidence bound for percent, where XX represents the confidence level in survey 1 and survey 2, respectively
Estimate.U_1, Estimate.U_2 = Estimated amount of resource in category in same units as weights used, in survey 1 and survey 2, respectively
StdError.U_1, StdError.U_2 = Estimated standard error of amount of resource estimate in survey 1 and survey 2, respectively
MarginofError.U_1, MarginofError.U_2 = Margin of error of amount of resource estimate, representing difference between estimate and confidence bounds in survey 1 and survey 2, respectively
LCBXXPct.U_1, LCBXXPct.U_2 = Lower confidence bound for amount of resource, where XX represents the confidence level in survey 1 and survey 2, respectively
UCBXXPct.U_1, UCBXXPct.U_2 = Upper confidence bound for amount of resource, where XX represents the confidence level in survey 1 and survey 2, respectively
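The difference columns can be related to the per-survey columns with a short calculation. The Python sketch below assumes the two surveys are independent, so the standard error of the difference is the square root of the sum of squared standard errors. That is a simplification: when sites are revisited in both surveys, the two estimates are correlated and the app's results may differ from this calculation. The function name and input values are hypothetical.

```python
import math
from statistics import NormalDist

def change_in_percent(est_1, se_1, est_2, se_2, conf=95):
    """Difference in percent (Survey_2 - Survey_1) with normal-approximation bounds."""
    diff = est_2 - est_1
    se_diff = math.sqrt(se_1 ** 2 + se_2 ** 2)  # assumes independent surveys
    margin = NormalDist().inv_cdf(0.5 + conf / 200.0) * se_diff
    return {
        "DiffEst.P": diff,
        "StdError.P": se_diff,
        "MarginofError.P": margin,
        f"LCB{conf}Pct.P": diff - margin,
        f"UCB{conf}Pct.P": diff + margin,
    }

result = change_in_percent(est_1=40.0, se_1=3.0, est_2=52.0, se_2=4.0)
for name, value in result.items():
    print(name, round(value, 2))
```

A confidence interval that excludes zero, as in this made-up example, suggests a change between the two surveys at the chosen confidence level.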

Outputs for Continuous Analysis (Means):

Survey_1 = Year or design cycle of first survey
Survey_2 = Year or design cycle of second survey
Type = Subpopulation group
Subpopulation = Subpopulation name
Indicator = Name of indicator
DiffEst = Estimate of difference in mean for indicator (Survey_2 - Survey_1)
StdError = Estimated standard error of change estimate
MarginofError = Margin of error of estimated difference in means, representing difference between estimate and confidence bounds
LCBXXPct = Lower confidence bound for estimated difference in means, where XX represents the confidence level
UCBXXPct = Upper confidence bound for estimated difference in means, where XX represents the confidence level
nResp_1, nResp_2 = Number of responses for indicator in survey 1 and survey 2, respectively
Estimate_1, Estimate_2 = Estimated mean of resource for indicator in survey 1 and survey 2, respectively
StdError_1, StdError_2 = Estimated standard error of estimated mean in survey 1 and survey 2, respectively
MarginofError_1, MarginofError_2 = Margin of error of estimated mean, representing difference between estimate and confidence bounds in survey 1 and survey 2, respectively
LCBXXPct_1, LCBXXPct_2 = Lower confidence bound for indicator mean, where XX represents the confidence level in survey 1 and survey 2, respectively
UCBXXPct_1, UCBXXPct_2 = Upper confidence bound for indicator mean, where XX represents the confidence level in survey 1 and survey 2, respectively

Outputs for Continuous Analysis (Medians):

All of the output variable names match those for Categorical Change Analysis (see above)
The two categories for this output are Greater_Than_Median and Less_Than_Median, but the interpretations are the same

Plot Categorical Estimates

Either run population estimates on categorical data within the app or import results into the app.
Variables in dataset must match those expected as output from spsurvey::cat_analysis function: Type, Subpopulation, Indicator, Category, Estimate.P, StdError.P, LCBXXPct.P, UCBXXPct.P, Estimate.U, StdError.U, LCBXXPct.U, UCBXXPct.U, where XX represents the confidence level.
Select either proportion or unit estimates to plot from Estimate Type.
Select Category values that represent Good, Fair, Poor, Not Assessed, and Other condition classes. More than one value per condition class in the dataset may be selected. For example, if one response uses Good/Fair/Poor and another used At or Below Benchmark/Above Benchmark, both Good and At or Below Benchmark can be selected.
Optional: Add plot title and define resource type/unit (e.g., stream length, number of lakes, coastal or wetland area).
Click the Plot/Refresh Button to create plots.
From the menus on the right-hand side of the page, select the Indicator of interest, and then the Subpopulation Group. Then select the Subpopulation of interest. The upper plot shows the individual subpopulation and the lower plot shows a particular condition class across all subpopulations.
To show confidence bound values, click the box above the main and/or subpopulation plots.
The default order of the subpopulations in the lower plot is alphabetical, but to sort by the estimate of the Good class, click the box for Sort Subpopulations by Good Condition.
Select Download Estimate Plot or Download Subpopulation Plot to save a .png file of the output.
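If results are imported rather than produced within the app, the column names can be checked against the list of expected variables given above before upload. The following Python sketch is one hypothetical way to do that outside the app. The function name and the example header are illustrative only.

```python
import re

# Columns named in the instructions above, apart from the confidence bound columns.
REQUIRED = ["Type", "Subpopulation", "Indicator", "Category",
            "Estimate.P", "StdError.P", "Estimate.U", "StdError.U"]
# Matches LCBXXPct.P, UCBXXPct.P, LCBXXPct.U, UCBXXPct.U for any confidence level XX.
BOUND_PATTERN = re.compile(r"^(LCB|UCB)(\d+)Pct\.(P|U)$")

def check_columns(columns):
    """Return (missing required columns, confidence levels with all four bound columns)."""
    missing = [c for c in REQUIRED if c not in columns]
    bounds = {}
    for col in columns:
        m = BOUND_PATTERN.match(col)
        if m:
            bounds.setdefault(m.group(2), set()).add((m.group(1), m.group(3)))
    complete_levels = sorted(level for level, found in bounds.items() if len(found) == 4)
    return missing, complete_levels

header = ["Type", "Subpopulation", "Indicator", "Category", "nResp",
          "Estimate.P", "StdError.P", "LCB95Pct.P", "UCB95Pct.P",
          "Estimate.U", "LCB95Pct.U", "UCB95Pct.U"]
missing, levels = check_columns(header)
print(missing, levels)
```

In this made-up header, `StdError.U` is absent, so it is reported as missing, while the four 95% bound columns are all present.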

Plot Continuous Estimates

Either run population estimates on continuous data to obtain CDF estimates within the app, or import results into the app.
Variables in dataset must match those expected as output from spsurvey::cont_analysis function: Type, Subpopulation, Indicator, Value, Estimate.P, StdError.P, LCBXXPct.P, UCBXXPct.P, Estimate.U, StdError.U, LCBXXPct.U, UCBXXPct.U, where XX represents the confidence level.
Select either proportion or unit estimates to plot from Estimate Type.
Optional: Add plot title, indicator units, and define resource type/unit.
Click the Plot Continuous Estimates button.
Select indicator from dropdown, then select a subpopulation group. Add or remove subpopulations to the plot from the Add/Remove Subpopulations dropdown.
Optional: Add an Indicator Threshold, add confidence bands, and/or change the x-axis to log10 scale. Be aware that if you have values that are below or equal to zero, points will be excluded from the plot if the Log Scale X-Axis option is selected.
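The effect of the log scale option on zero and negative values can be seen in a small example. This Python sketch only illustrates why such points cannot be shown on a log10 axis. It is not the app's plotting code, and the function name and data are hypothetical.

```python
import math

def log10_points(values, estimates):
    """Keep points with a positive x value and count those that must be dropped."""
    kept = [(math.log10(v), e) for v, e in zip(values, estimates) if v > 0]
    dropped = sum(1 for v in values if v <= 0)  # log10 is undefined for these
    return kept, dropped

# Made-up indicator values and CDF percent estimates.
points, dropped = log10_points([-1.0, 0.0, 1.0, 100.0], [5.0, 12.0, 40.0, 100.0])
print(points, dropped)
```

Counting the values at or below zero before plotting shows how much of the CDF would be hidden by the log scale.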

Contact Karen Blocksom at blocksom.karen@epa.gov with questions or feedback.

Disclaimer

The United States Environmental Protection Agency (EPA) GitHub project code is provided on an "as is" basis and the user assumes responsibility for its use. EPA has relinquished control of the information and no longer has responsibility to protect the integrity, confidentiality, or availability of the information. Any reference to specific commercial products, processes, or services by service mark, trademark, manufacturer, or otherwise, does not constitute or imply their endorsement, recommendation or favoring by EPA. The EPA seal and logo shall not be used in any manner to imply endorsement of any commercial product or activity by EPA or the United States Government.

Type of Analysis (pick one)

Categorical (for character variables)

Continuous (for numeric variables)

Show CDF, percentile, mean, or total results

CDF

Percentiles

Means

Totals

Select the year for analysis

Confidence level

90%

95%

If the Run/Refresh Estimates button is grayed out, return to the Prepare Data for Analysis tab and click the button that says Click HERE to prepare data for analysis

Note that if all values are very small, the results may appear as zeroes. Save and view output file to see the results with full digits.

Save Results as .csv file

Save version info and the R code used for analysis

If output is not as expected, be sure you chose the correct Type of Analysis (Categorical or Continuous) for your data

Warnings

Analysis Output

If a different set of response variables from those used in the population estimates is desired, return to the Prepare Data for Analysis tab to re-select variables. Then click the button to prepare data for analysis again.

Select two years of data to compare in desired order

Type of variables to analyze

Categorical

Continuous

Base test on mean or median

Mean

Median

Confidence level

90%

95%

If the Run/Refresh Estimates button is grayed out, return to the Prepare Data for Analysis tab and click the button that says Click HERE to prepare data for analysis

Save Change Results as .csv file

Save version info and the R code used for analysis

Warnings

Change Analysis Output

Select Indicator

Select Subpopulation Group

Categorical Estimates by Population

Select Subpopulation

Add Confidence Limit Values

Download Estimate Plot

Subpopulation Comparison

Select Condition

Sort Subpopulations by 'Good' Condition

Add Confidence Limit Values

Download Subpopulation Plot

Select Indicator

Select Population

Add/Remove Subpopulations

Indicator Threshold (optional)

Add Confidence Limits

Log Scale X-Axis

CDF Estimates

NOTE: Plotting and downloading may take a while if there are multiple subpopulations. PLEASE BE PATIENT.

Download CDF Plot

Distribution of Estimates by Population

Download Distribution Plot

If ANY changes have been made to your choices, you MUST click the button to prepare data for analysis again!
