blocking

1. example

  • you want to measure the effect of fertilizer on growth. You have four fields, each at a different elevation. If you randomly select which fields to fertilize, your estimate of the effect will include variance that is due to elevation. So instead you should apply both treatments (fertilizer and no fertilizer) at each elevation and estimate the effect on growth within each elevation
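
To convince myself of this, here is a minimal simulation sketch of the fertilizer example. All numbers (`TRUE_EFFECT`, `ELEVATION_EFFECT`, `NOISE_SD`, the baseline of 10) are made up for illustration, and I assume each field can be split into two plots. The unblocked design fertilizes two randomly chosen whole fields; the blocked design fertilizes one plot in every field.

```python
import random
import statistics

# Hypothetical numbers, chosen only for illustration.
TRUE_EFFECT = 1.0                        # growth gained from fertilizer
ELEVATION_EFFECT = [0.0, 2.0, 4.0, 6.0]  # nuisance effect of each field's elevation
NOISE_SD = 0.5                           # plot-to-plot random error


def growth(field, fertilized, rng):
    """Simulated growth of one plot in the given field."""
    effect = TRUE_EFFECT if fertilized else 0.0
    return 10.0 + ELEVATION_EFFECT[field] + effect + rng.gauss(0.0, NOISE_SD)


def unblocked_estimate(rng):
    """Fertilize two randomly chosen whole fields (two plots per field)."""
    fertilized_fields = set(rng.sample(range(4), 2))
    treated, control = [], []
    for field in range(4):
        is_fertilized = field in fertilized_fields
        for _ in range(2):
            y = growth(field, is_fertilized, rng)
            (treated if is_fertilized else control).append(y)
    return statistics.mean(treated) - statistics.mean(control)


def blocked_estimate(rng):
    """Fertilize one plot and leave one plot untreated in every field."""
    diffs = [growth(f, True, rng) - growth(f, False, rng) for f in range(4)]
    return statistics.mean(diffs)


rng = random.Random(0)
unblocked = [unblocked_estimate(rng) for _ in range(2000)]
blocked = [blocked_estimate(rng) for _ in range(2000)]

print("variance of unblocked estimates:", statistics.variance(unblocked))
print("variance of blocked estimates:  ", statistics.variance(blocked))
```

Under these assumptions both designs are centered on the true effect, but the unblocked estimates are spread much more widely, because the elevation differences between the fertilized and unfertilized fields leak into each estimate.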

2. basic idea

  • you have a factor and you want to measure the response to that factor
  • but there could be other factors confounding that response (see confounds)
  • so, if you can identify the confound, you can group your experiments into blocks within which the variance should, hopefully, be due only to your primary factor
  • what if there are other confounds within the block – "lurking" confounds?
    • whenever we can't do blocking, we just need to randomize and hope for the best
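
A small sketch of what "block, then randomize" could look like in practice: every block gets every treatment, and the order within each block is shuffled so that any lurking confound inside a block is not systematically tied to one treatment. The function name `randomize_within_blocks` and the block and treatment labels are hypothetical.

```python
import random


def randomize_within_blocks(blocks, treatments, rng):
    """Return {block: treatments in random order}, each treatment once per block."""
    plan = {}
    for block in blocks:
        order = list(treatments)
        rng.shuffle(order)  # random assignment of treatments to units inside the block
        plan[block] = order
    return plan


plan = randomize_within_blocks(
    blocks=["low", "mid", "high", "peak"],   # hypothetical elevation blocks
    treatments=["fertilizer", "none"],
    rng=random.Random(1),
)
for block, order in plan.items():
    print(block, order)
```

Position `k` in each list stands for the `k`-th experimental unit (for example, plot) in that block.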

3. questions

  • I get the basic idea of blocking, but I'm not sure what the actual result of this analysis is supposed to be. Am I supposed to report the estimate per block? Or am I supposed to treat each block estimate as an experiment and then compute an overall estimate from the block estimates?
  • If it's the former: it seems like reporting would get complicated as the number of things you block on grows
  • If it's the latter:
    • Then aren't you re-including the variance due to the nuisance factor that you tried to block for?
    • Response to the above: No, because the experiments are evenly divided among the nuisance factor's categories. Aggregating over the block estimates allows you to get an estimate of the effect on the population as a whole.
    • But if the nuisance variable has a strong effect, then you're probably going to want to look at the non-aggregated results anyway at some point.
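
A tiny numeric check of the "response" above, using a made-up table of growth measurements (one observation per block and treatment). The helper names are hypothetical. The point is that each per-block estimate is a within-block difference, so shifting a whole block up or down (a stronger nuisance effect) changes neither the per-block estimates nor their average.

```python
import statistics

# Hypothetical growth measurements: block (elevation) -> treatment -> growth.
growth = {
    "low":  {"none": 10.0, "fertilizer": 11.5},
    "mid":  {"none": 12.0, "fertilizer": 13.0},
    "high": {"none": 14.0, "fertilizer": 15.5},
    "peak": {"none": 16.0, "fertilizer": 17.0},
}


def per_block_effects(data):
    """Within-block difference between the two treatments."""
    return {block: obs["fertilizer"] - obs["none"] for block, obs in data.items()}


def aggregate_effect(data):
    """Average of the per-block estimates."""
    return statistics.mean(list(per_block_effects(data).values()))


def shift_block(data, block, offset):
    """Copy of data with `offset` added to every observation in one block."""
    return {
        b: {t: y + (offset if b == block else 0.0) for t, y in obs.items()}
        for b, obs in data.items()
    }


shifted = shift_block(growth, "peak", 100.0)  # exaggerate one block's nuisance effect

print(per_block_effects(growth), aggregate_effect(growth))
print(per_block_effects(shifted), aggregate_effect(shifted))
```

Both lines print the same per-block effects and the same aggregate, which is the sense in which aggregating block estimates does not re-include the nuisance variance. The per-block values are still there to inspect if the nuisance factor turns out to matter.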

4. statistical model

On the question I asked above, Wikipedia gives the following statistical model for one primary factor and one nuisance factor:

\[ Y_{ij} = \mu + T_i + B_j + \epsilon_{ij} \]

where

  • \(\epsilon_{ij}\) is the random error of that observation
  • \(Y_{ij}\) is some observation with primary factor \(i\) and nuisance factor \(j\)
  • \(\mu\) is the overall average
  • \(T_i\) is the effect of \(i\)
  • \(B_j\) is the effect of the nuisance \(j\)

Then our estimates will be:

  • estimate of \(\mu\): \(\bar{Y}\) – average of all observations
  • estimate of \(T_i\): \(\bar{Y}_{i.} - \bar{Y}\), where \(\bar{Y}_{i.}\) is the average of \(Y_{ij}\) over all \(j\)
    • we want to know how much the average response under setting \(i\) differs from the overall average
    • Question: will the estimated \(T_i\) be a good estimate if the observations \(Y_{ij}\) are not evenly spread across all \(j\)? I'm pretty sure the answer is no; otherwise we would be misled by confounds. Ah, I see: Wikipedia says that this is the statistical model for a randomized block design. I think this means you assume that, for each primary factor setting \(i\), the observations are evenly spread across all \(j\), so every \(\bar{Y}_{i.}\) averages over the same mix of block effects \(B_j\).
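
To check my answer to the question above, here is a sketch that computes the two estimates from noise-free data generated by the model. The parameter values (`MU`, `T`, `B`) and the function name `estimate_effects` are made up for illustration. The balanced data has every treatment in every block; the unbalanced data keeps only "none" at the two lowest blocks and only "fertilizer" at the two highest.

```python
import statistics


def estimate_effects(observations):
    """observations: list of (treatment, block, y) tuples.

    Returns (mu_hat, t_hat): the grand mean, and for each treatment
    its mean response minus the grand mean.
    """
    mu_hat = statistics.mean([y for _, _, y in observations])
    t_hat = {}
    for treatment in sorted({t for t, _, _ in observations}):
        ys = [y for t, _, y in observations if t == treatment]
        t_hat[treatment] = statistics.mean(ys) - mu_hat
    return mu_hat, t_hat


# Hypothetical, noise-free model parameters: Y_ij = MU + T[i] + B[j].
MU = 10.0
T = {"none": -0.5, "fertilizer": 0.5}
B = {"low": -3.0, "mid": -1.0, "high": 1.0, "peak": 3.0}

# Balanced: every treatment observed once in every block.
balanced = [(t, b, MU + T[t] + B[b]) for t in T for b in B]

# Unbalanced: "none" only at low/mid, "fertilizer" only at high/peak.
unbalanced = [
    (t, b, y) for t, b, y in balanced
    if (t == "none") == (b in ("low", "mid"))
]

print("balanced:  ", estimate_effects(balanced))
print("unbalanced:", estimate_effects(unbalanced))
```

With the balanced data the block effects cancel inside each treatment mean, and the estimate for fertilizer comes out as the true 0.5. With the unbalanced data the fertilizer mean also absorbs the high-elevation block effects, so the estimate is inflated to 2.5, which matches the worry about being misled by confounds.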

5. sources

Created: 2024-07-15 Mon 01:28