Survey Design Tool (v. 2.0.0)
This R Shiny app calculates spatially balanced survey designs for point, linear, or areal resources using the Generalized Random-Tessellation Stratified (GRTS) algorithm (Stevens and Olsen, 2004). The Survey Design Tool uses functions from the R package spsurvey (Spatial Sampling Design and Analysis) and provides an easy-to-use interface for many sampling design features, including stratification, unequal and proportional inclusion probabilities, replacement (oversample) sites, and legacy (historical) sites. The tool outputs sites selected and spatially balanced according to the user-specified inputs, and the user can export the sampling locations as a point shapefile or a flat file. The output also provides design weights, which can be used in categorical and continuous variable analyses (e.g., population estimates). The tool also lets the user adjust the initial survey design weights when implementation results in the use of replacement sites or when the final weights should sum to a known frame size.
This app does not include all design options and tools found in the spsurvey package; please review the package documentation and vignettes for more options and details. For further survey discussion and use cases, visit the website for EPA's National Aquatic Resource Surveys (NARS), which use GRTS survey designs to assess the quality of the nation's coastal waters, lakes and reservoirs, rivers and streams, and wetlands. We encourage users to consult a statistician about their design to prevent design issues and errors.
For Survey Design Tool questions, bugs, feedback, or tool modification suggestions, please contact Garrett Stillings at stillings.garrett@epa.gov. The application code and test datasets are offered at the Survey Design Tool GitHub page. For statistical survey design and analysis, and other technical questions, please contact Michael Dumelle at dumelle.michael@epa.gov.
Vignette
- The Survey Design Tool hosted on the EPA Shiny server can only run designs that use less than 2GB of memory. For larger designs and much faster processing, visit the Survey Design Tool GitHub site for the source code and run the app locally.
- The sample frame should use a projected coordinate reference system (CRS), such as an Albers equal-area projection or UTM, so that coordinates are in linear units and distances between sites are meaningful. Geographic (latitude/longitude) CRSs are not accepted.
- All design attribute variables, such as the Strata and Categories, must be contained in the user's sample frame file. You may run the design without these inputs as an unstratified equal probability design.
- When constructing the design, the user must decide how the survey should be structured and which random selection method to use:
- Equal Probability Sampling - equal inclusion probability. Every unit of the population has the same probability of being selected.
- Stratified Sampling - The sample frame is divided into non-overlapping strata, and an independent random sample is selected within each stratum.
- Unequal Probability Sampling - unequal inclusion probability. Each unit's chance of being included depends on the level of a categorical variable (Category) it belongs to and on how that variable is distributed across the population, so the realized sample size for each Category is not guaranteed to match the user-specified size. This type of sampling can give small subpopulations a greater chance of being selected.
- Proportional Probability Sampling - proportional inclusion probability. Each unit's chance of being included is proportional to the value of a positive auxiliary variable. For example, if the auxiliary variable is lake area, larger lakes are more likely to be selected.
Requirements
- Select the Sample Frame. The sample frame must be an ESRI shapefile. The user must select all parts of the shapefile, including the .shp, .dbf, .shx, and .prj files (tip: hold down Ctrl while selecting each file). The coordinate system for the sample frame must be one in which distances between coordinates are meaningful. The attributes in the file will populate as possible inputs for the design. The maximum file size is currently 10GB.
- Choose your desired Design Type:
- GRTS - Generalized Random-Tessellation Stratified. For survey designs requiring spatially balanced samples.
- IRS - Independent Random Sample. For survey designs that do not require spatially balanced samples.
- Select the Strata attribute. If your design is stratified, select the attribute which identifies the Strata. If Strata is set to 'None' (the default), the design is unstratified. Example Strata could be Stream Type (Perennial and Intermittent) or Size (Large and Small).
- Select the Category attribute. For an unequal inclusion probability design, select the categorical attribute on which the selection will be based. The output Category sample sizes will often be close to, but not exactly equal to, the sample sizes the user allocated to each Category, because the Category-level sample sizes are random variables. The default is 'None'. An example Category could be stream order or elevation (high/low).
Optional: Additional Design Attributes
- Additional design attributes, such as Auxiliary Variables, Reproducible Seed, DesignID, Minimum Distance, Maximum Attempts, Point Density, and Nearest Neighbor Replacement Sites, are also available. Descriptions of these inputs can be found in the grts section of the spsurvey manual and in the help buttons next to the inputs.
Legacy Sampling
- Legacy sites are sites that were selected in a previous probability sample and are automatically included in the current probability sample.
- Upload a POINT sample frame which contains the Legacy sites you would like included in the design. All sites in the legacy file will be considered legacy sites.
- If your Legacy sample frame uses different Strata, Category, or Auxiliary variable names than your design sample frame, select the corresponding attribute(s) from the legacy sample frame. These inputs will not appear if the names match your design sample frame.
- The number of legacy sites cannot be greater than the number of base sites in any stratum.
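The equal, unequal, and proportional inclusion probability options described in this section can be illustrated with a short calculation. The sketch below is a minimal Python illustration, not the spsurvey grts() implementation (which also enforces spatial balance); all function names are hypothetical, and it assumes each unit's design weight is the inverse of its inclusion probability.

```python
# Hypothetical helpers illustrating inclusion probabilities (pi) and design
# weights (1 / pi). These are NOT spsurvey functions.


def equal_probs(n_units: int, n_sample: int) -> list[float]:
    """Equal probability: every unit has pi = n / N."""
    return [n_sample / n_units] * n_units


def unequal_probs(categories: list[str], n_by_cat: dict[str, int]) -> list[float]:
    """Unequal probability: within a Category, pi = n_c / N_c, so units in a
    small Category can have a higher chance of selection."""
    counts: dict[str, int] = {}
    for cat in categories:
        counts[cat] = counts.get(cat, 0) + 1
    return [n_by_cat[cat] / counts[cat] for cat in categories]


def proportional_probs(aux: list[float], n_sample: int) -> list[float]:
    """Proportional probability: pi_i = n * x_i / sum(x) for a positive
    auxiliary variable x. Assumes no unit's value pushes pi above 1."""
    if any(x <= 0 for x in aux):
        raise ValueError("auxiliary variable must be positive")
    total = sum(aux)
    return [n_sample * x / total for x in aux]


def design_weights(probs: list[float]) -> list[float]:
    """Design weight is the inverse of the inclusion probability
    (assumes every probability is positive)."""
    return [1 / p for p in probs]


if __name__ == "__main__":
    # 10 units, sample of 2: each pi = 0.2, so each weight = 5.0
    print(design_weights(equal_probs(10, 2)))
    # 2 "small" units get pi = 0.5; 8 "large" units get pi = 0.25
    cats = ["small"] * 2 + ["large"] * 8
    print(unequal_probs(cats, {"small": 1, "large": 2}))
    # Auxiliary values 1..4 with n = 2 give pi = 0.2, 0.4, 0.6, 0.8
    print(proportional_probs([1, 2, 3, 4], 2))
```

In an actual unequal probability design the selection itself is random, so the realized number of sites per Category can differ slightly from the requested sizes even though the inclusion probabilities are fixed.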
Setting an appropriate sample size and deciding how sites should be allocated across a sample frame is a fundamental step in designing a successful survey. Many surveys, especially those used for environmental monitoring, are limited by budgetary and logistical constraints. The designer must choose a sample size that fits within these constraints while ensuring the survey estimates the parameter(s) of interest with a low margin of error. The designer can consider a few elements when determining a survey sample size:
- Select a spatially balanced design, using the spatial balance metrics provided to compare designs. Estimates from spatially balanced surveys are typically more precise (vary less) than estimates from non-spatially balanced surveys.
- Consider what will be measured in the survey. If you expect the parameter of interest to vary little across the population, a smaller sample size can still yield a low margin of error. Conversely, if you expect the parameter of interest to vary greatly, consider increasing the sample size to offset the higher margin of error.
- Allocate additional sampling time to survey extra sites if needed. When designing the survey, be sure to generate replacement sites to use for oversampling.
To aid the user, the Survey Design tab calculates simulated population estimates from the user-defined sample sizes using the local neighborhood variance estimator, which uses each site's nearest neighbors to estimate variance and tends to produce smaller variance values. This gives the user insight into the potential margin of error of the survey estimates if the chosen sample size(s) are used.
- For unstratified equal probability designs, set the desired Base site sample size.
- If you supplied a Strata attribute, a tab is populated for each Stratum of the design.
- Set the sample size of Base sites you desire for each stratum.
- If you supplied a Category attribute, the categories will populate automatically. Choose the sample size for each. Note: the Category sample sizes must sum to the Base site sample size.
- Choose the sample size of Replacement Sites you desire, if any. Replacement sites are an additional set of sites that can replace main sample sites found to be non-target or inaccessible. When replacing a site, the user must FOLLOW THE ORDER of the design output and select a replacement site from the same Stratum, if strata are used. Using replacement sites improperly may result in spatial imbalance. The tool attempts to distribute the replacement sites in proportion to the Category sample sizes. If the replacement allocation for one or more Categories is not a whole number, it is rounded up to the next integer. Choose a reasonable replacement sample size, as requesting too many unused sites can affect the spatial balance of your design.
- Once your design has been prepared, click the 'Calculate Survey Design' button to be taken to the Survey Design Results tab.
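The replacement allocation rule described above (replacement sites split in proportion to the Category sample sizes, with fractional shares rounded up) can be sketched as follows. This is a minimal illustration of our reading of that rule, assuming integer sample sizes; allocate_replacements is a hypothetical name, not a function from the tool or spsurvey.

```python
def allocate_replacements(n_over: int, n_by_cat: dict[str, int]) -> dict[str, int]:
    """Split n_over replacement sites across Categories in proportion to the
    Category base sample sizes, rounding any fractional share up.

    Hypothetical helper; integer ceiling division avoids floating-point error.
    """
    n_base = sum(n_by_cat.values())
    if n_base <= 0:
        raise ValueError("base sample sizes must sum to a positive number")
    return {
        # ceil(n_over * n / n_base) computed with integers only
        cat: -(-(n_over * n) // n_base)
        for cat, n in n_by_cat.items()
    }


if __name__ == "__main__":
    # Shares divide evenly: 5 replacements over 10/20/20 base sites -> 1/2/2
    print(allocate_replacements(5, {"A": 10, "B": 20, "C": 20}))
    # Shares of 4/3 each are rounded up to 2, so 6 sites are allocated, not 4
    print(allocate_replacements(4, {"A": 10, "B": 10, "C": 10}))
```

Because every fractional share is rounded up, the total allocated can exceed the requested replacement sample size, which is worth keeping in mind when choosing that size.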
Survey Design
- Calculating your Survey Design can take a while; the spinner will stop when it is complete. If there are errors in your Design inputs, an error message will be displayed under 'Design Errors'.
- If the design is successful, a table will be displayed with the totals of your sample sizes allocated across strata and categories, if used.
- The Population Estimate Simulation module gives the user insight into the potential margin of error of survey estimates if the input sample size(s) are used. A condition class is randomly assigned to each site using user-specified probability weights. Typically, the margin of error will decrease when the condition classes are unequally distributed (for example, when most sites fall into one class). The user can choose the number of condition classes, modify each class's probability of being selected, and refresh the simulation to view different condition scenarios. The user can also adjust the sample size and refresh the design to determine an appropriate margin of error for the survey.
- Choose a Spatial Balance Metric. All spatial balance metrics provided have a lower bound of zero, which indicates perfect spatial balance; as the metric value increases, spatial balance decreases. This is useful for comparing survey designs.
- Click the 'Download Survey Design' button to download a zip file which contains a POINT shapefile of your design's survey sample sites, your sample frame, and a README with information about your design.
- A table of your Probability Survey Site Results is presented for review. Note that the latitude/longitude values are transformed to the WGS84 coordinate system. The xcoord and ycoord columns are CONUS Albers coordinates (a projected, area-preserving CRS). These coordinates can be used for the local neighborhood variance estimator when calculating population estimates.
Survey Map
- The Survey Map tab provides an interactive map and a static map of the sample frame and the survey sample sites.
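The relationship between sample size, condition-class distribution, and margin of error described for the Population Estimate Simulation module can be sketched with a simple simulation. This minimal Python sketch is a stand-in under stated assumptions: it uses the simple random sampling (normal approximation) variance of a proportion, not the local neighborhood variance estimator the tool uses, which tends to give smaller variance values. All function names are hypothetical.

```python
import math
import random


def margin_of_error(p: float, n: int, z: float = 1.96) -> float:
    """Approximate 95% margin of error for an estimated proportion p from n
    sites, using the simple random sampling variance p * (1 - p) / n."""
    return z * math.sqrt(p * (1 - p) / n)


def simulate_condition(
    n: int, class_probs: dict[str, float], seed: int = 1
) -> dict[str, tuple[float, float]]:
    """Randomly assign a condition class to each of n sites using the given
    probability weights, then return (estimated proportion, margin of error)
    for each class."""
    rng = random.Random(seed)  # fixed seed so the scenario is reproducible
    classes = list(class_probs)
    draws = rng.choices(classes, weights=list(class_probs.values()), k=n)
    results = {}
    for cls in classes:
        p_hat = draws.count(cls) / n
        results[cls] = (p_hat, margin_of_error(p_hat, n))
    return results


if __name__ == "__main__":
    # An even 50/50 split has the largest margin of error for a given n ...
    print(round(margin_of_error(0.5, 100), 3))  # 0.098
    # ... a skewed 90/10 split has a smaller one ...
    print(round(margin_of_error(0.9, 100), 3))  # 0.059
    # ... and quadrupling n halves the margin of error.
    print(round(margin_of_error(0.5, 400), 3))  # 0.049
    scenario = {"Good": 0.6, "Fair": 0.3, "Poor": 0.1}
    for cls, (p, moe) in simulate_condition(50, scenario).items():
        print(cls, round(p, 2), round(moe, 3))
```

Because p(1 - p) is largest at p = 0.5, unequally distributed condition classes give smaller margins of error, and changing the seed plays the role of refreshing the simulation to view a different condition scenario.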
Adjusting the initial survey design weights is necessary when implementation results in the use of replacement sites or when the final weights should sum to the known frame size of the desired population. This includes samples that are smaller or larger than planned, instances where an oversample is used, and samples affected by frame error or nonresponse error. Adjusted weights equal initial weight * framesize / sum(initial weights), and the adjustment is done separately for each category specified in the Weighting Category input. The user can either manually enter a desired population Frame Size or let the tool calculate the frame size automatically by totaling the initial weights and adjusting the total using the user's Site Evaluation inputs. With the automated method, the output includes two adjusted weights:
- WGT_TP_EXTENT - Weights based on the evaluation of all target and non-target probability sites. These weights are used only to estimate extent for the target and non-target populations.
- WGT_TP_CORE - Weights based on the target probability sites that were sampled. These weights can be used to estimate condition for the target population. Current NARS population estimates use only WGT_TP_CORE for all condition-related estimates.
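Written out for a single weighting category $c$, the adjustment formula above takes the following form, where $w_i$ is a site's initial weight, $S_c$ is the set of sites in category $c$ included in the adjustment, and $N_c$ is the frame size for that category (this notation is ours; which sites belong to $S_c$ depends on the weight being computed):

$$
w_i^{\mathrm{adj}} = w_i \cdot \frac{N_c}{\sum_{j \in S_c} w_j}, \qquad i \in S_c
$$

Summing $w_i^{\mathrm{adj}}$ over $S_c$ gives exactly $N_c$. For example, with four sites of initial weight 2 and a hypothetical frame size of 10, each adjusted weight is $2 \times 10 / 8 = 2.5$, and the four adjusted weights sum to 10.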
Weights Adjustment Example (Non-response)
- Upload the file which contains the required weight adjustment inputs. See below for the descriptions of each input.
- Select the column which has the initial unadjusted weights for each site.
- Select the column which contains the Site Evaluation attribute, which classifies each site as target-sampled, non-response (not sampled), or non-target (not sampled).
- Select the attribute value(s) indicating that a site was a Target site (Base or Replacement site) and was sampled. If available, include any additional Replacement sites that were added to the design and sampled rather than used as replacements.
- Select the attribute value(s) indicating that a site was a Non-Response site and was not sampled (e.g., Landowner Denials, Inaccessible, Target-Not Sampled). Non-target sites should NOT be included in this input.
- Select the Weighting Category column. A weighting category identifies the Stratum and/or multi-density (unequal probability) Category used in the design as implemented. If the design was both stratified and unequal probability, this attribute should combine the stratum and category (e.g., Stratum-Category). By default, all sites are placed in the same category, which assumes an equal probability design is being adjusted.
- Input the initial sample frame size(s). If you selected a weighting category, a frame size input is generated for each weighting category.
- Press the 'Calculate Adjusted Survey Weights' button for the adjusted weight output.
Weight Adjustment File Setup Examples
Equal Probability Design
SiteID | Weight | Site Evaluation |
---|---|---|
Site_01 | 2 | Target-Sampled |
Site_02 | 2 | Non-Target |
Site_02_Replace | 2 | Target-Sampled |
Site_03 | 2 | Access_Denied |
Site_03_Replace | 2 | Target-Sampled |
Site_04 | 2 | Target-Sampled |
Unequal Probability Design
SiteID | Category | Weight | Site Evaluation |
---|---|---|---|
Site_01 | 1st Order | 2 | Target-Sampled |
Site_02 | 2nd Order | 3 | Non-Target |
Site_02_Replace | 2nd Order | 3 | Target-Sampled |
Site_03 | 3rd Order | 4 | Access_Denied |
Site_03_Replace | 3rd Order | 4 | Target-Sampled |
Site_04 | 1st Order | 2 | Target-Sampled |
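As a worked illustration, the sketch below applies the weight adjustment formula (initial weight * framesize / sum(initial weights)) to the Unequal Probability Design example, separately for each Category. It assumes hypothetical frame sizes of 10, 6, and 12 for the 1st, 2nd, and 3rd Order categories and adjusts only the Target-Sampled sites; the tool's automated WGT_TP_EXTENT and WGT_TP_CORE calculations may select sites and frame sizes differently. adjust_weights is a hypothetical helper, not part of the tool or spsurvey.

```python
from collections import defaultdict

# Rows from the Unequal Probability Design example:
# (SiteID, Category, initial Weight, Site Evaluation)
ROWS = [
    ("Site_01", "1st Order", 2, "Target-Sampled"),
    ("Site_02", "2nd Order", 3, "Non-Target"),
    ("Site_02_Replace", "2nd Order", 3, "Target-Sampled"),
    ("Site_03", "3rd Order", 4, "Access_Denied"),
    ("Site_03_Replace", "3rd Order", 4, "Target-Sampled"),
    ("Site_04", "1st Order", 2, "Target-Sampled"),
]

# Hypothetical frame sizes per weighting category (not from the source).
FRAME_SIZES = {"1st Order": 10, "2nd Order": 6, "3rd Order": 12}


def adjust_weights(rows, frame_sizes, keep=("Target-Sampled",)):
    """Return {SiteID: adjusted weight} for sites whose evaluation is in
    `keep`, using weight * framesize / sum(weights) within each category."""
    totals = defaultdict(float)
    for _, cat, wgt, status in rows:
        if status in keep:
            totals[cat] += wgt
    return {
        site: wgt * frame_sizes[cat] / totals[cat]
        for site, cat, wgt, status in rows
        if status in keep
    }


if __name__ == "__main__":
    for site, wgt in adjust_weights(ROWS, FRAME_SIZES).items():
        print(site, wgt)
```

Within each category the adjusted weights sum to the hypothetical frame size (5.0 + 5.0 = 10 for 1st Order), while the Non-Target and Access_Denied sites receive no adjusted weight in this simplified version.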
Weight Adjustment Inputs
Citation
If you have used the Survey Design Tool to generate a survey used in publication or reporting, please reference the tool URL (https://owshiny.epa.gov/survey-design-tool/) and cite the spsurvey package.
Disclaimer
The United States Environmental Protection Agency (EPA) Survey Design tool and code is provided on an "as is" basis and the user assumes responsibility for its use. EPA has relinquished control of the information and no longer has responsibility to protect the integrity, confidentiality, or availability of the information. Any reference to specific commercial products, processes, or services by service mark, trademark, manufacturer, or otherwise, does not constitute or imply their endorsement, recommendation or favoring by EPA. The EPA seal and logo shall not be used in any manner to imply endorsement of any commercial product or activity by EPA or the United States Government.