Horvitz-Thompson estimator
Data are currently being used, and reused, in ecological research at unprecedented rates. To ensure appropriate reuse, however, we need to ask the question: “Are aggregated databases currently providing the right information to enable effective and unbiased reuse?” We investigate this question with a focus on designs that purposefully bias the selection of sampling locations (upweighting the probability of selection of some locations). Such designs are common; examples include designs with unequal inclusion probabilities and stratified designs. We perform a simulation experiment by creating datasets with progressively more bias and examining the resulting statistical estimates. The effect of ignoring the survey design can be profound, with biases of up to 250% when naive analytical methods are used. The bias is not reduced by adding more data. Fortunately, the bias can be mitigated by using an appropriate estimator or an appropriate model. These are only applicable, however, when essential information about the survey design is available: the randomisation structure (e.g. inclusion probabilities or stratification) and/or the covariates used in the randomisation process. The results suggest that such information must be stored and served with the data to support inference and reuse.

Citation: S.D. Foster, J. Vanhatalo, V.M. Trenkel, T. Schulz, E. Lawrence, R. Przeslawski, and G.R. Hosack. 2021. Effects of ignoring survey design information for data reuse. Ecological Applications 31(6): e02360. doi:10.1002/eap.2360
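The contrast between a naive analysis and a design-aware estimator can be illustrated with a minimal sketch. This is not the authors' simulation: the population, the linear relationship between the covariate `x` and the response `y`, the inclusion probabilities, and the use of Poisson sampling (each site selected independently) are all assumptions made for illustration, and the function names are hypothetical. The sketch compares the plain sample mean with the Horvitz-Thompson estimate of the population mean, which weights each sampled value by the inverse of its inclusion probability.

```python
import random


def horvitz_thompson_mean(y, pi, population_size):
    """Horvitz-Thompson estimate of a population mean.

    y  : responses at the sampled sites
    pi : inclusion probabilities of those same sites
    """
    return sum(yi / p for yi, p in zip(y, pi)) / population_size


def simulate(n_reps=2000, population_size=500, seed=1):
    """Average the naive and HT estimates over repeated biased samples."""
    rng = random.Random(seed)

    # Assumed population: the covariate x drives both the response
    # and the chance that a site is selected.
    x = [rng.uniform(0.0, 1.0) for _ in range(population_size)]
    y = [2.0 + 3.0 * xi + rng.gauss(0.0, 0.5) for xi in x]
    true_mean = sum(y) / population_size

    # Unequal inclusion probabilities, increasing with x (0.05 to 0.45).
    pi = [0.05 + 0.4 * xi for xi in x]

    naive, ht = [], []
    for _ in range(n_reps):
        # Poisson sampling: each site is included independently.
        sample = [i for i in range(population_size) if rng.random() < pi[i]]
        if not sample:
            continue
        ys = [y[i] for i in sample]
        ps = [pi[i] for i in sample]
        naive.append(sum(ys) / len(ys))  # ignores the design
        ht.append(horvitz_thompson_mean(ys, ps, population_size))

    return true_mean, sum(naive) / len(naive), sum(ht) / len(ht)


if __name__ == "__main__":
    true_mean, naive_mean, ht_mean = simulate()
    print(f"true mean:  {true_mean:.3f}")
    print(f"naive mean: {naive_mean:.3f}")
    print(f"HT mean:    {ht_mean:.3f}")
```

Under these assumptions the naive mean settles above the true mean however many replicates are drawn, whereas the Horvitz-Thompson average settles close to it. The Horvitz-Thompson estimate cannot be computed unless the inclusion probabilities are stored alongside the data, which is the practical point the abstract makes.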