A RetroSearch Logo

Home - News ( United States | United Kingdom | Italy | Germany ) - Football scores

Search Query:

Showing content from http://philchalmers.github.io/SimDesign/html/11-Parametric-bootstrap.html below:

Using SimDesign for Parametric Bootstrapping

The parametric bootstrap is a type of Monte Carlo simulation method where data are repeatedly generated and analysed based on characteristics from the original data and a specified population of interest. The reason it may be used is to investigate the sampling distribution of a statistic when no known sampling for exists, but a population generating function is assumed to be known a priori. In this document, we demonstrate how SimDesign can be used for parametric bootstrapping as well, thereby becoming useful when used within R functions designed for analyses.

Example

Say that we were interested in an independent t-test type situation, however we wanted to test whether the medians divided by the inter-quartile range (IQR) would result in a significant effect. As the formula \(|M1 - M2| / IQR\) generally has no known sampling distribution we can attempt to simulate one instead if we can make some reasonable assumptions about the population. For instance, say that the null distribution implies that \(H_0: |M1 - M2| = 0\), and that the population distribution is normally distributed.

median_diff <- function(DV, group){
    meds <- tapply(DV, group, median) 
    IQR <- IQR(DV)
    unname(abs(meds[1] - meds[2]) / IQR)
}
dat <- data.frame(DV = rnorm(100), group = rep(c('Group 1', 'Group 2'), each = 50))
with(dat, median_diff(DV, group))

The question now is: given the observed statistic, is there evidence indicating that 0.013 is significantly greater than 0? Using the tools from SimDesign we could construct a parametric bootstrap method to tests this under the assumption that the population came from a standard normal distribution.

median_parametric_boot <- function(dat, fun, R = 1000, verbose = FALSE, ...){
    
    observed <- with(dat, fun(DV, group))
    
    require(SimDesign)
    Design <- createDesign(N=nrow(dat))
    fixed_objects <- list(fun = fun) #export fun to make it available across nodes
    
    Generate <- function(condition, fixed_objects = NULL) {
        dat <- data.frame(DV = rnorm(condition$N), 
                          group = rep(c('Group 1', 'Group 2'), each = condition$N/2))
        dat
    }

    Analyse <- function(condition, dat, fixed_objects = NULL) {
        ret <- with(dat, fixed_objects$fun(DV, group))
        ret
    }

    # no summarise function provided because design only contains 1 row. Returns a matrix
    res <- runSimulation(design=Design, replications=R, generate=Generate, analyse=Analyse, 
                         fixed_objects=fixed_objects, verbose=verbose, ...)
    
    # empirical p-value
    ret <- c(p = colMeans(observed < res))
    ret
}

Using this function definition we can pass the necessary elements to obtain a parametrically estimate \(p\)-value indicating how likely the observed statistic was to observe given that a suitable empirically generated null hypothesis distribution.

median_parametric_boot(dat, median_diff)
Loading required package: SimDesign

Furthermore, because additional arguments can be passed through the ... input to runSimulation all the extra components available in the package will be present. E.g., defining a parallel cluster and running the code in parallel:

cl <- parallel::makeCluster(2)
median_parametric_boot(dat, median_diff, cl=cl, parallel=TRUE)

The general benefits of SimDesign becomes immediately useful here as well. Consider situations where data must be redrawn during the parametric bootstrap simulation scheme, or when warning messages appear that may suggest issues with the simulation. R’s parallel packages by default will either error-out when stop() messages are thrown (which must be dealt with manually within the user’s code), and information such as warning()s will generally disappear in the output when distributed across many cores (potentially influencing the reliability of the results in unexpected ways). However, as with running general Monte Carlo simulations, runSimulation() will collect all this information in a safe and efficient way, thereby providing the user a safety-blanket when implementing their own parametric bootstrap experiments.


RetroSearch is an open source project built by @garambo | Open a GitHub Issue

Search and Browse the WWW like it's 1997 | Search results from DuckDuckGo

HTML: 3.2 | Encoding: UTF-8 | Version: 0.7.4