This vignette accompanies the deprecation of rMIDAS. Existing projects can keep using rMIDAS, but new development should move to rMIDAS2. The source repository for the successor package is https://github.com/MIDASverse/rMIDAS2.
rMIDAS2 is the successor to rMIDAS. It re-implements the MIDAS multiple imputation algorithm with several improvements:
| | rMIDAS | rMIDAS2 |
|---|---|---|
| Backend | TensorFlow (Python, via reticulate) | PyTorch (Python, via local HTTP API) |
| Runtime R dependency on reticulate | Yes | No |
| Preprocessing | Manual (convert()) | Automatic |
| Python versions | 3.6–3.10 | 3.9+ |
| TensorFlow required | Yes (< 2.12) | No |
The API is deliberately simpler: most pipelines that required four function calls in rMIDAS need just one or two in rMIDAS2.
rMIDAS required configuring a reticulate Python environment with TensorFlow:
# --- rMIDAS ---
library(rMIDAS)
# Python environment configured automatically on first load,
# or manually via set_python_env()
rMIDAS2 uses a standalone Python server, so reticulate is not needed at runtime. The backend is installed with install_backend().
rMIDAS required explicit preprocessing with convert(), where you had to specify which columns were binary and which were categorical:
# --- rMIDAS ---
data(adult)
adult_conv <- convert(adult,
                      bin_cols = c("income"),
                      cat_cols = c("workclass", "marital_status"),
                      minmax_scale = TRUE)
rMIDAS2 detects column types automatically, so you pass your data frame directly to midas_fit() or midas().
rMIDAS used train():
# --- rMIDAS ---
mid <- train(adult_conv,
             training_epochs = 20L,
             layer_structure = c(256, 256, 256),
             input_drop = 0.8,
             learn_rate = 0.0004,
             seed = 89L)
rMIDAS2 uses midas_fit() (or the all-in-one midas()):
# --- rMIDAS2 ---
fit <- midas_fit(adult,
                 epochs = 20L,
                 hidden_layers = c(256L, 128L, 64L),
                 corrupt_rate = 0.8,
                 lr = 0.001,
                 seed = 89L)
Parameter name changes:
| rMIDAS (train()) | rMIDAS2 (midas_fit()) | Notes |
|---|---|---|
| training_epochs | epochs | |
| layer_structure | hidden_layers | Default changed from 256-256-256 to 256-128-64 |
| input_drop | corrupt_rate | |
| learn_rate | lr | Default changed from 0.0004 to 0.001 |
| dropout_level | dropout_prob | |
| train_batch | batch_size | Default changed from 16 to 64 |
| cont_adj | num_adj | |
| softmax_adj | cat_adj | |
| binary_adj | bin_adj | |
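When migrating scripts that build their train() arguments programmatically, the renames in the table can be applied mechanically. The base-R sketch below is one way to do that; translate_train_args() is a hypothetical helper written for this guide, not a function from rMIDAS or rMIDAS2.

```r
# Old train() argument names mapped to their midas_fit() equivalents,
# copied from the parameter table above.
arg_map <- c(
  training_epochs = "epochs",
  layer_structure = "hidden_layers",
  input_drop      = "corrupt_rate",
  learn_rate      = "lr",
  dropout_level   = "dropout_prob",
  train_batch     = "batch_size",
  cont_adj        = "num_adj",
  softmax_adj     = "cat_adj",
  binary_adj      = "bin_adj"
)

# Hypothetical helper (not part of either package): rename the elements of a
# named argument list. Names that are not in the map are left unchanged.
translate_train_args <- function(args, map = arg_map) {
  old <- names(args)
  if (is.null(old)) return(args)
  hit <- old %in% names(map)
  names(args)[hit] <- unname(map[old[hit]])
  args
}

old_args <- list(training_epochs = 20L, learn_rate = 0.0004, seed = 89L)
new_args <- translate_train_args(old_args)
names(new_args)
# "epochs" "lr" "seed"
```

Renaming only changes the names. Values that were previously left at their defaults will now pick up the new rMIDAS2 defaults noted in the table, so set hidden_layers, lr, and batch_size explicitly if you need the old behaviour.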
rMIDAS used complete(), called as complete(mid, m = 5).
rMIDAS2 uses midas_transform():
# --- rMIDAS2 ---
imps <- midas_transform(fit, m = 10)
# Returns a list of 10 data.frames
head(imps[[1]])
Or skip midas_fit() + midas_transform() entirely and use the all-in-one midas(), called as midas(data, m, epochs, ...).
The combine() interface has changed:
rMIDAS took a formula and a list of completed data frames, as in combine("income ~ age + hours_per_week", imps).
rMIDAS2 takes a model ID and an outcome variable name. Independent variables default to all other columns:
# --- rMIDAS2 ---
combine(fit, y = "income")
# Specify predictors explicitly:
combine(fit, y = "income", ind_vars = c("age", "hours_per_week"))
The output format is the same: a data frame with columns term, estimate, std.error, statistic, df, and p.value.
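For readers who want to see where those columns come from, the base-R sketch below pools a single coefficient with the textbook form of Rubin's rules. It is an illustration only, not the rMIDAS2 implementation: pool_rubin() and the example numbers are made up, and the degrees-of-freedom formula is the classic large-sample one, which combine() may or may not use.

```r
# Hypothetical helper: pool one coefficient across m imputed-data analyses.
# est: point estimates, se: their standard errors (one value per imputation).
pool_rubin <- function(est, se) {
  m     <- length(est)
  q_bar <- mean(est)              # pooled point estimate
  w     <- mean(se^2)             # within-imputation variance
  b     <- var(est)               # between-imputation variance
  t_var <- w + (1 + 1 / m) * b    # total variance
  # Classic Rubin degrees of freedom (assumed here for illustration)
  df    <- (m - 1) * (1 + w / ((1 + 1 / m) * b))^2
  stat  <- q_bar / sqrt(t_var)
  data.frame(
    estimate  = q_bar,
    std.error = sqrt(t_var),
    statistic = stat,
    df        = df,
    p.value   = 2 * pt(-abs(stat), df)
  )
}

# Made-up estimates and standard errors from m = 5 imputed datasets
pool_rubin(est = c(0.52, 0.48, 0.55, 0.50, 0.45),
           se  = c(0.10, 0.11, 0.10, 0.12, 0.10))
```

The pooled standard error is never smaller than the average within-imputation standard error, because the between-imputation term adds the uncertainty due to the missing data.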
rMIDAS required re-specifying the data and column types:
# --- rMIDAS ---
overimpute(adult,
           binary_columns = c("income"),
           softmax_columns = c("workclass", "marital_status"),
           training_epochs = 20L,
           spikein = 0.3)
rMIDAS2 runs overimputation on an already-fitted model, called as overimpute(model, mask_frac).
rMIDAS2 adds imp_mean(), which computes the element-wise mean across all imputations and is useful as a quick single point estimate. It is called as imp_mean(model).
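To make "element-wise mean" concrete, the base-R sketch below averages a list of imputed data frames cell by cell. It handles numeric columns only and illustrates the idea rather than the imp_mean() implementation; how imp_mean() treats categorical columns is not covered here. The helper name and the toy data are hypothetical.

```r
# Hypothetical helper: cell-by-cell mean over a list of numeric data frames
# that share the same dimensions and column names.
elementwise_mean <- function(imps) {
  mats <- lapply(imps, as.matrix)
  avg  <- Reduce(`+`, mats) / length(mats)
  as.data.frame(avg)
}

# Toy stand-in for three imputations of a 2 x 2 dataset. Only the cell that
# was originally missing (row 2 of x) differs between imputations.
imps_toy <- list(
  data.frame(x = c(1, 2), y = c(10, 20)),
  data.frame(x = c(1, 4), y = c(10, 20)),
  data.frame(x = c(1, 6), y = c(10, 20))
)
elementwise_mean(imps_toy)
```

Observed cells are identical in every imputation, so they pass through unchanged; only the imputed cell is averaged. A single averaged dataset understates uncertainty, which is why pooled analyses should still go through combine().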
Below is a full rMIDAS pipeline, followed by a task-by-task summary of the rMIDAS2 equivalents.
library(rMIDAS)
data(adult)
adult <- adult[1:1000, ]
# 1. Preprocess
adult_conv <- convert(adult,
                      bin_cols = c("income"),
                      cat_cols = c("workclass", "marital_status"),
                      minmax_scale = TRUE)
# 2. Train
mid <- train(adult_conv, training_epochs = 20L, seed = 89L)
# 3. Generate imputations
imps <- complete(mid, m = 5)
# 4. Analyse
combine("income ~ age + hours_per_week", imps)

| Task | rMIDAS | rMIDAS2 |
|---|---|---|
| Install Python env | Automatic / set_python_env() | install_backend() |
| Preprocess data | convert(data, bin_cols, cat_cols) | Not needed |
| Train model | train(data, training_epochs, ...) | midas_fit(data, epochs, ...) |
| Generate imputations | complete(model, m) | midas_transform(model, m) |
| Train + impute (one step) | Not available | midas(data, m, epochs, ...) |
| Mean imputation | Not available | imp_mean(model) |
| Rubin’s rules | combine(formula, df_list) | combine(model, y, ind_vars) |
| Overimputation | overimpute(data, ...) | overimpute(model, mask_frac) |
| Shutdown | Not needed | stop_server() |
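The rMIDAS2 counterpart of the four-step rMIDAS pipeline could look like the sketch below. It uses only calls that appear in this guide (midas_fit(), midas_transform(), combine()). The wrapper migrate_pipeline() and its argument names are hypothetical, and the sketch assumes rMIDAS2 is installed and the backend has already been set up with install_backend().

```r
# Hypothetical wrapper, not part of rMIDAS2: fit -> impute -> analyse.
# Assumes rMIDAS2 is installed and install_backend() has been run.
migrate_pipeline <- function(data, y, ind_vars,
                             m = 5, epochs = 20L, seed = 89L) {
  # 1. Preprocess: not needed, column types are detected automatically.

  # 2. Train
  fit <- rMIDAS2::midas_fit(data, epochs = epochs, seed = seed)

  # 3. Generate imputations (a list of m data frames)
  imps <- rMIDAS2::midas_transform(fit, m = m)

  # 4. Analyse
  pooled <- rMIDAS2::combine(fit, y = y, ind_vars = ind_vars)

  list(imputations = imps, pooled = pooled)
}

# Usage, mirroring the rMIDAS example above (not run here):
# res <- migrate_pipeline(adult[1:1000, ], y = "income",
#                         ind_vars = c("age", "hours_per_week"))
# rMIDAS2::stop_server()  # shut down the Python server when finished
```

Wrapping the steps in a function keeps the fitted model, the imputations, and the pooled estimates together, which makes it easier to compare results against an existing rMIDAS run during migration.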