This is the fourth exercise in the series.

Exact replication of the imputations and figures in this exercise depends on mice version > 3.2.1. The latest version of mice can be installed from GitHub with:

devtools::install_github("stefvanbuuren/mice")

If you have an earlier mice version, the imputations and resulting plots may differ slightly. The overall train of thought in this document is, however, the same for all mice versions.
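
If you are unsure which version you are running, a quick check (not part of the original exercise) is:

packageVersion("mice") # prints the installed mice version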

In this exercise we will walk you through the more advanced features of mice, such as post-processing of imputations and passive imputation.

All the best,

Gerko and Stef


1. Open R and load the packages mice and dplyr.

library(mice) # Data imputation
library(dplyr) # Data manipulation
library(lattice) # Plotting device
set.seed(123)

We choose seed value 123. This is an arbitrary choice; any other value would serve equally well. Fixing the random seed enables you (and others) to exactly replicate anything that involves random number generation. If you set the seed in your R session to 123 and follow the order and code of the exercises precisely, you will obtain exactly the same results and plots as presented in this document.
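
As a small aside (not part of the exercises), the effect of fixing the seed is easy to demonstrate: re-setting the seed reproduces the exact same random draws.

set.seed(123)
rnorm(3)      # three random numbers
set.seed(123)
rnorm(3)      # the exact same three numbers again
set.seed(123) # reset the seed so the remainder of this document still replicates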


Passive Imputation

There is often a need for transformed, combined or recoded versions of the data. In the case of incomplete data, one could impute the original and transform the completed original afterwards, or transform the incomplete original and impute the transformed version. If, however, both the original and the transformed version are needed within the imputation algorithm, neither of these approaches works: one cannot be sure that the transformation holds between the imputed values of the original and the transformed version. mice has a built-in approach, called passive imputation, to deal with such situations. The goal of passive imputation is to maintain consistency among different transformations of the same data. As an example, consider the deterministic relation in the boys data \[\text{BMI} = \frac{\text{Weight (kg)}}{\text{Height (m)}^2}\] or the compositional relation in the mammalsleep data: \[\text{ts} = \text{sws} + \text{ps}\]
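
As a sanity check (an aside, not part of the original exercises), both deterministic relations can be verified on the complete cases of the built-in datasets:

cc <- na.omit(boys[, c("hgt", "wgt", "bmi")])
summary(cc$bmi - cc$wgt / (cc$hgt / 100)^2) # essentially zero, up to rounding of the stored bmi

cs <- na.omit(mammalsleep[, c("sws", "ps", "ts")])
summary(cs$ts - (cs$sws + cs$ps))           # essentially zero, up to rounding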


2. Use passive imputation to impute the deterministic sleep relation in the mammalsleep data. Name the new multiply imputed dataset pas.imp.

First, we create a method vector:

meth <- make.method(mammalsleep)
meth
## species      bw     brw     sws      ps      ts     mls      gt      pi     sei 
##      ""      ""      ""   "pmm"   "pmm"   "pmm"   "pmm"   "pmm"      ""      "" 
##     odi 
##      ""

and a predictorMatrix:

pred <- make.predictorMatrix(mammalsleep)
pred
##         species bw brw sws ps ts mls gt pi sei odi
## species       0  1   1   1  1  1   1  1  1   1   1
## bw            1  0   1   1  1  1   1  1  1   1   1
## brw           1  1   0   1  1  1   1  1  1   1   1
## sws           1  1   1   0  1  1   1  1  1   1   1
## ps            1  1   1   1  0  1   1  1  1   1   1
## ts            1  1   1   1  1  0   1  1  1   1   1
## mls           1  1   1   1  1  1   0  1  1   1   1
## gt            1  1   1   1  1  1   1  0  1   1   1
## pi            1  1   1   1  1  1   1  1  0   1   1
## sei           1  1   1   1  1  1   1  1  1   0   1
## odi           1  1   1   1  1  1   1  1  1   1   0

We add the passive imputation rule for ts to the meth object:

meth["ts"]<- "~ I(sws + ps)"
meth
##         species              bw             brw             sws              ps 
##              ""              ""              ""           "pmm"           "pmm" 
##              ts             mls              gt              pi             sei 
## "~ I(sws + ps)"           "pmm"           "pmm"              ""              "" 
##             odi 
##              ""

and remove ts as a predictor for the imputation of sws and ps by setting the corresponding entries in the predictor matrix to 0. We also exclude species as a predictor:

pred[c("sws", "ps"), "ts"] <- 0
pred[, "species"] <- 0
pred
##         species bw brw sws ps ts mls gt pi sei odi
## species       0  1   1   1  1  1   1  1  1   1   1
## bw            0  0   1   1  1  1   1  1  1   1   1
## brw           0  1   0   1  1  1   1  1  1   1   1
## sws           0  1   1   0  1  0   1  1  1   1   1
## ps            0  1   1   1  0  0   1  1  1   1   1
## ts            0  1   1   1  1  0   1  1  1   1   1
## mls           0  1   1   1  1  1   0  1  1   1   1
## gt            0  1   1   1  1  1   1  0  1   1   1
## pi            0  1   1   1  1  1   1  1  0   1   1
## sei           0  1   1   1  1  1   1  1  1   0   1
## odi           0  1   1   1  1  1   1  1  1   1   0

This avoids circularity problems, where ts would feed back into sws and ps, from which it is calculated.

We can then run the imputations as

pas.imp <- mice(mammalsleep, 
                meth = meth, 
                pred = pred, 
                maxit = 10, 
                seed = 123, 
                print = FALSE)

We used a custom predictor matrix and method vector to tailor our imputation approach to the passive imputation problem. We made sure to exclude ts as a predictor for the imputation of sws and ps to avoid circularity.
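
A quick check (not part of the original exercise) confirms that the compositional relation now holds exactly in the rows where ts was imputed:

comp <- complete(pas.imp)
miss.ts <- is.na(mammalsleep$ts)
summary((comp$ts - (comp$sws + comp$ps))[miss.ts]) # should be (essentially) zero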

We also gave the imputation algorithm 10 iterations to converge and fixed the seed to 123 for this mice run. This means that, even when people do not fix the overall R seed for a session, exact replication of the results can be obtained by simply fixing the seed of the random number generator within mice. Naturally, the same input (data) is required each time to yield the same output (mids object).
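
To illustrate this (an extra check, not part of the original exercise), rerunning the exact same call with the same seed should reproduce the imputations:

pas.imp2 <- mice(mammalsleep, 
                 meth = meth, 
                 pred = pred, 
                 maxit = 10, 
                 seed = 123, 
                 print = FALSE)
identical(complete(pas.imp), complete(pas.imp2)) # should return TRUE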


3. Inspect the trace lines for pas.imp.

plot(pas.imp)

We can see that the pathological nonconvergence we experienced before has been properly dealt with. The trace lines for the sleep variables now look fine, and convergence can be inferred from them.
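
If you want to zoom in on the sleep variables only, the trace plot can be restricted to those columns (an aside, not part of the original exercise):

plot(pas.imp, c("sws", "ps", "ts"))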


Post-processing of the imputations

Remember that we previously imputed the boys data with pmm and with norm, and saw that pmm preserves the range of the observed data, whereas norm allows imputations outside that range. If we were to impute, e.g., tv with norm, the problem arises that negative values may appear among the imputations. Somehow we should be able to

  1. Use a method that extrapolates beyond the range of the observed data, while
  2. Lay a constraint on the imputed values of tv.

The mice() function has an argument called post that takes a vector of strings of R commands. These commands are parsed and evaluated right after the univariate imputation function returns, and thus provide a way of post-processing the imputed values while the processed version is used within the imputation algorithm. In other words, post-processing allows us to manipulate the imputations for a particular variable that are generated within each iteration. Such manipulations directly affect the imputed values of that variable and, through the algorithm, the imputations of the other variables. Naturally, such a procedure should be handled with care.


4. Post-process the imputed values to constrain them between 1 and 25, using norm as the imputation method for tv.

meth <- make.method(boys)
meth["tv"] <- "norm"
post <- make.post(boys)
post["tv"] <- "imp[[j]][, i] <- squeeze(imp[[j]][, i], c(1, 25))"
imp <- mice(boys, 
            meth = meth, 
            post = post, 
            print = FALSE)

In this way the imputed values of tv are constrained (squeezed by the function squeeze()) to lie between 1 and 25.
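
A quick check (not part of the original exercise) confirms that all values of tv in the completed data indeed fall within the requested bounds:

range(complete(imp, "long")$tv) # should lie within [1, 25]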


5. Compare the imputed values and histograms of tv for the solution obtained by pmm to the constrained solution (created with norm, constrained between 1 and 25).

First, we recreate the default pmm solution

imp.pmm <- mice(boys, print=FALSE)

and we inspect the imputed values for the norm solution

table(complete(imp)$tv)
## 
##                1 1.01856398578212  1.0305265041997 1.05371730709378 
##              274                1                1                1 
## 1.07965881459222 1.17266016197262 1.18331918874243 1.19850340482835 
##                1                1                1                1 
## 1.20751363430595 1.22270349091234 1.24393898381574 1.27416237305213 
##                1                1                1                1 
## 1.37264069156637 1.37687168658768  1.4038468461377  1.4888922915118 
##                1                1                1                1 
##   1.502948063712 1.61094129141498 1.63561475531001 1.69154151039902 
##                1                1                1                1 
## 1.69734085647902 1.75962760046502 1.83703239510597 1.85090932372616 
##                1                1                1                1 
## 1.85196792529081 1.88497590530004 1.93677698925981 1.98714218266051 
##                1                1                1                1 
##                2 2.17006035212886 2.19420629366948 2.20709019333567 
##               26                1                1                1 
## 2.32217300175937 2.33718756021663 2.37925919724656 2.42589950587828 
##                1                1                1                1 
## 2.42852062650941 2.56258286438301 2.57154776173857 2.69125369721839 
##                1                1                1                1 
## 2.81422946755381   2.849182607152 2.85721219486192 2.86802164234102 
##                1                1                1                1 
##  2.9050436581017 2.94455194340074 2.99053395697965                3 
##                1                1                1               19 
## 3.09186572494736 3.09293831549814 3.12759115294094   3.137769304381 
##                1                1                1                1 
## 3.29169176719331 3.34327503822682 3.35169753351213 3.36118943119167 
##                1                1                1                1 
## 3.42894421492264 3.64409914804099  3.7342091243475 3.73449420271883 
##                1                1                1                1 
## 3.83351487625141 3.94898811919176                4 4.18580253805296 
##                1                1               17                1 
## 4.50814821532778  4.5899062419111 4.59879652563211 4.80896223356025 
##                1                1                1                1 
## 4.86693622505292 4.90707701009326 4.92083322380665  4.9919404935006 
##                1                1                1                1 
##                5 5.11550701472057  5.2675575534524 5.30699364355918 
##                5                1                1                1 
## 5.48138217837872  5.9411679485539                6 6.13652148511074 
##                1                1               10                1 
## 6.13894103637924  6.1565094226148 6.25483968121638 6.36617952529365 
##                1                1                1                1 
## 6.40319326867511 6.42478740352842 6.55030483069993 6.67210208568764 
##                1                1                1                1 
## 6.71487285512647 6.83897647017751 6.93446621600148 7.08357417630785 
##                1                1                1                1 
## 7.09178439681213 7.30234534974404 7.68154460182069 7.73437101947725 
##                1                1                1                1 
## 7.80770238610074 7.89989442429735 7.92324157979704                8 
##                1                1                1               13 
## 8.17602314803671 8.19348000001015 8.28049913219575 8.32280884879302 
##                1                1                1                1 
## 8.48903749609463 8.56769013301768 8.83864037628557 8.95992325272691 
##                1                1                1                1 
##                9 9.04477390963831 9.06982897383724 9.16417756212673 
##                1                1                1                1 
## 9.22417249742369 9.33358098899267 9.36992148086693  9.3862952342802 
##                1                1                1                1 
## 9.49848143941098 9.52973463446122 9.64297608115544 9.68522718414533 
##                1                1                1                1 
## 9.72561963180977               10 10.1015146146315 10.1969672444551 
##                1               16                1                1 
## 10.2613845378965 10.3639992156231 10.4322244960718 10.8757137316089 
##                1                1                1                1 
## 11.0985138246362 11.1352224839287 11.3599999270311 11.4990864477122 
##                1                1                1                1 
## 11.8090031513037 11.8742443878141               12 12.0259908745644 
##                1                1               15                1 
## 12.0290193781978 12.1423054413162 12.1509343718361 12.1560181173864 
##                1                1                1                1 
## 12.2007201444469 12.2963796526214 12.4212404510045 12.5885772833468 
##                1                1                1                1 
## 12.5999948122602 12.8650017360119 12.9359546851943               13 
##                1                1                1                1 
## 13.0084110316048  13.412960283899 13.4165985714268 13.5794619759477 
##                1                1                1                1 
## 13.8630567814608 13.8860429760291 13.9683922516302               14 
##                1                1                1                1 
## 14.3621064783652 14.4509003464646 14.6622055659847 14.7174313478183 
##                1                1                1                1 
## 14.8430588927294 14.8648650187386 14.8852811047132 14.9654799894298 
##                1                1                1                1 
## 14.9873182113118               15 15.0142846452428 15.2079680604487 
##                1               27                1                1 
## 15.5365929876549 15.5510774742241 15.7122378270619 15.7512214068479 
##                1                1                1                1 
## 15.7602201466366               16 16.0276057729029 16.1422595831028 
##                1                1                1                1 
## 16.2602970074183 16.3532832122099 16.5894669994605 16.7422615082676 
##                1                1                1                1 
## 16.7678748258145 16.9875879043095               17 17.1056430929331 
##                1                1                1                1 
## 17.1208203940252 17.1692258100321 17.3020554780255  17.390371459762 
##                1                1                1                1 
## 17.5071899885911 17.5632721921984 17.5635369602171 17.7034937492546 
##                1                1                1                1 
## 17.7362651302672 17.7423061432335 17.8199124977756  17.872015418962 
##                1                1                1                1 
## 17.9014749677662 17.9439322565708 17.9822469567094 17.9927913403406 
##                1                1                1                1 
##               18 18.0779873286959 18.1249421079672 18.1295138365517 
##                1                1                1                1 
## 18.2552503507931 18.3353828727134 18.7178960787053 18.7462985774108 
##                1                1                1                1 
## 18.9901829270432 19.1962126352428 19.3746688212472 19.4997160572489 
##                1                1                1                1 
##  19.913785710524 19.9509336126402 19.9611334615165               20 
##                1                1                1               38 
## 20.0135656758663 20.0255142674244 20.0973750863173 20.1340851087617 
##                1                1                1                1 
## 20.1344697493209 20.1745129107608 20.2379726133558 20.3995806508845 
##                1                1                1                1 
## 20.5168922378414 20.5401931227474 20.5513501437093 20.6685882917064 
##                1                1                1                1 
## 20.7232027963561 20.8210450804746 20.8671659772625 21.0244601710077 
##                1                1                1                1 
## 21.0392416860597 21.0537147951292 21.2401873005436 21.2948878442176 
##                1                1                1                1 
## 21.4539529970948 21.4608358275169 21.6328642529837 21.8963277584571 
##                1                1                1                1 
## 21.9752362407994 22.0466545246583 22.3478714189136 22.3830985461179 
##                1                1                1                1 
## 22.6548184951425 22.7147244193324 22.7629694736002 22.8549824442405 
##                1                1                1                1 
## 22.9496558563585 23.6775419376966 23.8326554734164 24.0841386441071 
##                1                1                1                1 
## 24.8989082843077 24.9661702218352               25 
##                1                1               45

and for the pmm solution

table(complete(imp.pmm)$tv)
## 
##   1   2   3   4   5   6   8   9  10  12  13  14  15  16  17  18  20  25 
##  67 236  88  28   6  17  22   1  30  28   4   3  66   1   4   2  72  73

It is clear that the norm solution, as expected, does not give us integer imputations. Next, we inspect and compare the densities of the observed and imputed data for the constrained solution.

densityplot(imp, ~tv)
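
As an aside (not part of the original exercise), a stripplot gives another useful view of the observed and imputed values per imputation:

stripplot(imp, tv ~ .imp, pch = 20)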


A nice way of plotting the histograms of both completed datasets simultaneously is to first create a data frame (here we named it tvm) that contains the data in one column and the imputation method in another column.

tv <- c(complete(imp.pmm)$tv, complete(imp)$tv)
used.method <- rep(c("pmm", "norm"), each = nrow(boys))
tvm <- data.frame(tv = tv, method = used.method)

and then plotting a histogram of tv conditional on method.

histogram( ~tv | method, data = tvm, nint = 25)

Is there still a difference in distribution between the two imputation methods? Which imputations do you think are more plausible?
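
A numerical summary per method may help answer these questions. For example, with dplyr (loaded in Exercise 1; this check is an addition, not part of the original exercise):

tvm %>% 
  group_by(method) %>% 
  summarise(mean = mean(tv), median = median(tv), sd = sd(tv))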


6. Make a missing data indicator (name it miss) for bmi and check the relation of bmi, wgt and hgt for the boys in the imputed data. To do so, plot the imputed values against their respective calculated values.

miss <- is.na(imp$data$bmi)
xyplot(imp, bmi ~ I(wgt / (hgt / 100)^2),
       na.groups = miss, cex = c(0.8, 1.2), pch = c(1, 20),
       ylab = "BMI (kg/m2) Imputed", xlab = "BMI (kg/m2) Calculated")

With this plot we show that the relation between hgt, wgt and bmi is not preserved in the imputed values. In order to preserve this relation, we should use passive imputation.
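
To quantify the violation (an extra check, not part of the original exercise), we can look at the differences between the imputed bmi values and the values calculated from the imputed wgt and hgt:

comp <- complete(imp)
summary((comp$bmi - comp$wgt / (comp$hgt / 100)^2)[miss]) # generally nonzero: the relation is violated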


7. Use passive imputation to preserve the relation between imputed bmi, wgt and hgt by setting the imputation method for bmi to:

  • meth["bmi"]<- "~ I(wgt / (hgt / 100)^2)"

but do not solve for circularity.

meth <- make.method(boys)
meth["bmi"] <- "~ I(wgt / (hgt / 100)^2)"
imp <- mice(boys, 
            meth = meth, 
            print=FALSE)

8. Again, plot the imputed values of bmi versus the calculated values and check whether there is convergence for bmi.

To inspect the relation:

xyplot(imp, bmi ~ I(wgt / (hgt / 100)^2), na.groups = miss,
       cex = c(1, 1), pch = c(1, 20),
       ylab = "BMI (kg/m2) Imputed", xlab = "BMI (kg/m2) Calculated")

To study convergence for bmi alone:

plot(imp, c("bmi"))

Although the relation between bmi, wgt and hgt is now preserved in the imputations, we get absurd imputed values, and on top of that we clearly see problems with the convergence of bmi. The problem is that we have circularity in the imputations: we used passive imputation for bmi, but bmi is also automatically used as a predictor for wgt and hgt. This can be solved by adjusting the predictor matrix.


9. Solve the problem of circularity (if you did not already do so) and plot once again the imputed values of bmi versus the calculated values.

First, we remove bmi as a predictor for hgt and wgt to remove circularity.

pred <- make.predictorMatrix(boys)
pred[c("hgt", "wgt"), "bmi"] <- 0
pred
##     age hgt wgt bmi hc gen phb tv reg
## age   0   1   1   1  1   1   1  1   1
## hgt   1   0   1   0  1   1   1  1   1
## wgt   1   1   0   0  1   1   1  1   1
## bmi   1   1   1   0  1   1   1  1   1
## hc    1   1   1   1  0   1   1  1   1
## gen   1   1   1   1  1   0   1  1   1
## phb   1   1   1   1  1   1   0  1   1
## tv    1   1   1   1  1   1   1  0   1
## reg   1   1   1   1  1   1   1  1   0

and we run the mice algorithm again with the new predictor matrix (we still ‘borrow’ the imputation methods object meth from before)

imp <- mice(boys, 
            meth = meth, 
            pred = pred, 
            print = FALSE)

Second, we recreate the plots from Exercise 8. We start with the plot to inspect the relations in the observed and imputed data

xyplot(imp, bmi ~ I(wgt / (hgt / 100)^2), na.groups = miss,
       cex = c(1, 1), pch = c(1, 20),
       ylab = "BMI (kg/m2) Imputed", xlab = "BMI (kg/m2) Calculated")

and continue with the trace plot to study convergence

plot(imp, c("bmi"))

All is well now!
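
As a final check (not part of the original exercise), the same discrepancy we calculated earlier is now zero for the rows with imputed bmi:

comp <- complete(imp)
summary((comp$bmi - comp$wgt / (comp$hgt / 100)^2)[miss]) # (essentially) zero: the relation is preserved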


Conclusion

We have seen that the practical execution of multiple imputation and pooling is straightforward with the R package mice. The package is designed to allow you to assess and control the imputations themselves, the convergence of the algorithm and the distributions and multivariate relations of the observed and imputed data.

It is important to ‘gain’ this control as a user. After all, we are imputing values and want to take their uncertainty properly into account; being uncertain about the process that generated those values as well is therefore not a valid option.


For fun


What you shouldn’t do with passive imputation

Never make all relations passive (fixed). You will remain stuck at the starting values and waste your computer’s energy (and your own).

meth <- make.method(boys)
pred <- make.predictorMatrix(boys)
meth["bmi"] <- "~ I(wgt / (hgt / 100)^2)"
meth["wgt"] <- "~ I(bmi * (hgt / 100)^2)"
meth["hgt"] <- "~ I(sqrt(wgt / bmi) * 100)"
pred[c("bmi", "wgt", "hgt"), ] <- 0
imp.path <- mice(boys, 
                 meth = meth, 
                 pred = pred, 
                 seed = 123)
## 
##  iter imp variable
##   1   1  hgt  wgt  bmi  hc  gen  phb  tv  reg
##   1   2  hgt  wgt  bmi  hc  gen  phb  tv  reg
##   1   3  hgt  wgt  bmi  hc  gen  phb  tv  reg
##   1   4  hgt  wgt  bmi  hc  gen  phb  tv  reg
##   1   5  hgt  wgt  bmi  hc  gen  phb  tv  reg
##   2   1  hgt  wgt  bmi  hc  gen  phb  tv  reg
##   2   2  hgt  wgt  bmi  hc  gen  phb  tv  reg
##   2   3  hgt  wgt  bmi  hc  gen  phb  tv  reg
##   2   4  hgt  wgt  bmi  hc  gen  phb  tv  reg
##   2   5  hgt  wgt  bmi  hc  gen  phb  tv  reg
##   3   1  hgt  wgt  bmi  hc  gen  phb  tv  reg
##   3   2  hgt  wgt  bmi  hc  gen  phb  tv  reg
##   3   3  hgt  wgt  bmi  hc  gen  phb  tv  reg
##   3   4  hgt  wgt  bmi  hc  gen  phb  tv  reg
##   3   5  hgt  wgt  bmi  hc  gen  phb  tv  reg
##   4   1  hgt  wgt  bmi  hc  gen  phb  tv  reg
##   4   2  hgt  wgt  bmi  hc  gen  phb  tv  reg
##   4   3  hgt  wgt  bmi  hc  gen  phb  tv  reg
##   4   4  hgt  wgt  bmi  hc  gen  phb  tv  reg
##   4   5  hgt  wgt  bmi  hc  gen  phb  tv  reg
##   5   1  hgt  wgt  bmi  hc  gen  phb  tv  reg
##   5   2  hgt  wgt  bmi  hc  gen  phb  tv  reg
##   5   3  hgt  wgt  bmi  hc  gen  phb  tv  reg
##   5   4  hgt  wgt  bmi  hc  gen  phb  tv  reg
##   5   5  hgt  wgt  bmi  hc  gen  phb  tv  reg
plot(imp.path, c("hgt", "wgt", "bmi"))

We named the mids object imp.path, because the nonconvergence is pathological in this example!


- End of exercise