Help for package ebrahim.gof

Type:

Package

Title:

Goodness-of-Fit and Calibration Tests for Logistic Regression

Version:

2.4.0

Date:

2026-07-22

Maintainer:

Ebrahim Khaled Ebrahim <ebrahimkhaled@alexu.edu.eg>

Description:

Provides a unified battery of goodness-of-fit and calibration tests for binary logistic regression, runnable in a single call via 'run.all.gof()'. The package introduces the author's own tests aimed at sparse data, namely the omnibus Ebrahim-Farrington (EF) test, the Directed EF ('EDGE') test that targets smooth calibration-shape departures, and a Cauchy-combination ensemble. It also aggregates a wide range of classical and modern tests for comparison, including Hosmer-Lemeshow, McCullagh, Osius-Rojek, le Cessie-van Houwelingen, Stute-Zhu, the binary-adaptive 'BAGofT' test, and the 'GiViTI' calibration test (each obtained from its own package, where installed, and attributed to its authors). The tools are particularly suited to sparse data, where the Hosmer-Lemeshow test loses power. For more details see Hosmer and Lemeshow (1980) <doi:10.1080/03610928008827941> and Farrington (1996) <doi:10.1111/j.2517-6161.1996.tb02086.x>.

License:

GPL-3

URL:

https://github.com/ebrahimkhaled/ebrahim.gof

BugReports:

https://github.com/ebrahimkhaled/ebrahim.gof/issues

Depends:

R (≥ 3.5.0)

Imports:

parallel, stats

Suggests:

testthat (≥ 3.0.0), knitr, rmarkdown, ResourceSelection, ggplot2, CompQuadForm, statmod, mgcv, BAGofT, givitiR, callr

Encoding:

UTF-8

LazyData:

true

RoxygenNote:

7.3.2

VignetteBuilder:

knitr

NeedsCompilation:

Packaged:

2026-07-22 14:01:07 UTC; ebrah

Author:

Ebrahim Khaled Ebrahim [aut, cre]

Repository:

CRAN

Date/Publication:

2026-07-22 16:00:02 UTC

ebrahim.gof: Goodness-of-Fit and Calibration Tests for Logistic Regression

Description

A unified toolbox of goodness-of-fit and calibration tests for binary logistic regression, callable in a single line via run.all.gof. The package is aimed particularly at sparse data, where the classical Hosmer–Lemeshow test loses power.

The author's own tests

ef.gof — the omnibus Ebrahim–Farrington (EF) test for binary data with automatic grouping.
def.gof / edge.gof — the Directed EF (“EDGE”) test, which spends its few degrees of freedom on the smooth calibration-shape directions where structured misfit concentrates.
def.ensemble.gof — a Cauchy-combination ensemble of the directed bases.

Aggregated tests (for comparison)

run.all.gof also runs, in one call, a wide range of classical and modern tests — Hosmer–Lemeshow, McCullagh, Osius–Rojek, le Cessie–van Houwelingen, Stute–Zhu, the binary-adaptive BAGofT test, and the givitiR calibration test. Each aggregated test is obtained from its own package (where installed) and is attributed to its authors; these are provided for head-to-head comparison, not claimed as original to this package.

Data

gof_demo is a bundled example dataset with a documented, reproducible misfit for illustrating the battery.

Author(s)

Maintainer: Ebrahim Khaled Ebrahim ebrahimkhaled@alexu.edu.eg (ORCID)

Covariate-Space Directed Ebrahim-Farrington (CDEF) Goodness-of-Fit Test

Description

A directed goodness-of-fit test for binary logistic regression whose direction lives in covariate space (functions of the predictors) rather than in fitted-probability space like def.gof. It projects the standardized residuals onto a covariate-space basis (polynomials and pairwise products, natural splines, or a combination that also includes fitted-probability bends) and calibrates the quadratic form with the Farrington estimation-adjusted projection, exactly as in def.gof. This makes it sensitive to omitted interactions and to local / oscillatory departures that fitted-probability grouping can miss.

Usage

cdef.gof(
  object,
  predicted_probs = NULL,
  X = NULL,
  basis = c("poly", "spline", "combined"),
  method = c("satterthwaite", "imhof")
)

Arguments

object

A fitted binary logistic glm, or a binary (0/1) response vector y (then supply predicted_probs and X).

predicted_probs

Numeric predicted probabilities; required when object is a y vector.

X

Design/covariate matrix (with or without an intercept column); required when object is a y vector. Ignored when object is a glm.

basis

One of "poly" (squares, cubes, pairwise products), "spline" (natural cubic splines per covariate plus a pairwise term; needs splines), or "combined" (covariate polynomials plus fitted-probability bends).

method

One of "satterthwaite" (default) or "imhof".

Details

Let \tilde r_i=(y_i-\hat p_i)/\sqrt{\hat p_i(1-\hat p_i)} be the standardized residuals and Z a covariate-space basis matrix. The statistic is S=(Z'\tilde r)'(Z'Z)^{-1}(Z'\tilde r), whose null distribution is a weighted sum of \chi^2_1 variables with weights the eigenvalues of (Z'Z)^{-1}Z'\Omega Z, where \Omega=I-V^{1/2}X(X'VX)^{-1}X'V^{1/2} adjusts for estimating \hat\beta. The p-value uses a Satterthwaite scaled-\chi^2 approximation (default) or Imhof's method (CompQuadForm). Rank-deficient bases are reduced automatically.
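The statistic and its Satterthwaite reference can be sketched in a few lines of base R. This is a minimal illustration under stated assumptions, not the package's cdef.gof(): the name cdef_sketch() and the hand-built basis Z (two squares and one pairwise product) are hypothetical, and the estimation adjustment uses \Omega exactly as defined above with V = diag(\hat p_i(1-\hat p_i)). Z'\Omega Z is formed directly, so no n x n matrix is needed.

```r
# Minimal sketch (not the package code): covariate-space directed statistic
# S = (Z'r)'(Z'Z)^{-1}(Z'r) with a Satterthwaite scaled-chi^2 p-value.
cdef_sketch <- function(y, p, X, Z) {
  v <- p * (1 - p)
  r <- (y - p) / sqrt(v)                 # standardized residuals
  A <- sqrt(v) * X                       # V^{1/2} X (row i scaled by sqrt(v_i))
  ZtZ <- crossprod(Z)
  ZtA <- crossprod(Z, A)
  # Z' Omega Z with Omega = I - V^{1/2} X (X'VX)^{-1} X' V^{1/2}
  ZtOZ <- ZtZ - ZtA %*% solve(crossprod(A), t(ZtA))
  u <- crossprod(Z, r)                   # Z' r
  S <- drop(crossprod(u, solve(ZtZ, u)))
  # chi^2_1 weights: eigenvalues of (Z'Z)^{-1} Z' Omega Z
  lam <- Re(eigen(solve(ZtZ, ZtOZ), only.values = TRUE)$values)
  lam <- lam[lam > 1e-10]
  # Satterthwaite: match mean and variance of sum(lam * chi^2_1) to c * chi^2_nu
  c_scale <- sum(lam^2) / sum(lam)
  nu <- sum(lam)^2 / sum(lam^2)
  list(S = S, df = nu,
       p_value = stats::pchisq(S / c_scale, df = nu, lower.tail = FALSE))
}

set.seed(1)
n <- 600; x1 <- runif(n, -3, 3); x2 <- rnorm(n)
y <- rbinom(n, 1, plogis(0.3 + 0.8 * x1 - 0.5 * x2 + 0.4 * x1 * x2))
fit <- glm(y ~ x1 + x2, family = binomial())
Z <- cbind(x1^2, x2^2, x1 * x2)          # hypothetical "poly"-style basis
res <- cdef_sketch(y, fitted(fit), model.matrix(fit), Z)
```

Because the weights are eigenvalues of a projection restricted to Z, they lie in [0, 1], so the effective df never exceeds ncol(Z).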

Value

A one-row data.frame with columns Test, Basis, Test_Statistic, df, Method, and p_value.

References

Farrington, C. P. (1996). On Assessing Goodness of Fit of Generalized Linear Models to Sparse Data. JRSS-B 58(2), 349-360.

Examples

set.seed(1)
n <- 600; x1 <- runif(n, -3, 3); x2 <- rnorm(n)
# truth has an omitted interaction; fit the additive model
y <- rbinom(n, 1, plogis(0.3 + 0.8 * x1 - 0.5 * x2 + 0.4 * x1 * x2))
fit <- glm(y ~ x1 + x2, family = binomial())
cdef.gof(fit)                    # covariate-space directed test (poly basis)
cdef.gof(fit, basis = "spline")  # for local / oscillatory misfit

Combine Directed GOF Tests into One Decision (Ensemble)

Description

Combines the three Directed Ebrahim-Farrington (DEF) basis tests ("poly2", "poly3", "stukel") into a single goodness-of-fit decision, so the user does not have to choose a basis. By default the p-values are combined with the Cauchy Combination Test (CCT), which controls the error rate under the strong dependence between tests computed on the same fitted model. The omnibus EF test can optionally be added to the vote.

Usage

def.ensemble.gof(
  object,
  predicted_probs = NULL,
  X = NULL,
  components = c("poly2", "poly3", "stukel"),
  add_ef = FALSE,
  combine = c("cct", "minp", "fisher"),
  G = 10,
  extra_pvalues = NULL
)

Arguments

object

A fitted binary logistic glm, or a binary (0/1) vector y (then supply predicted_probs).

predicted_probs

Numeric predicted probabilities; required when object is a y vector.

X

Optional design matrix, passed to def.gof for the exact calibration (only used with the y/predicted_probs form).

components

Character vector, a subset of c("poly2","poly3","stukel"). Default is all three.

add_ef

Logical; if TRUE, the omnibus EF p-value (ef.gof) is appended to the components. Default FALSE.

combine

One of "cct" (default), "minp", "fisher".

G

Integer number of groups passed to def.gof/ef.gof (default 10).

extra_pvalues

Optional named numeric vector of additional p-values to include (e.g. a Tsiatis test computed elsewhere). Default NULL.

Details

Because the component tests are computed on the same fit, their p-values are strongly dependent. The CCT (combine = "cct") has an asymptotic standard-Cauchy null whose tail is robust to this dependence, so it needs no calibration. The "minp" (Sidak) and "fisher" rules assume independence and are offered for comparison only; under positive dependence "minp" is conservative and "fisher" is anti-conservative, so they should be calibrated by simulation before use (not done here).
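The three combination rules can be written compactly. This is a minimal sketch assuming equal weights 1/k in the CCT and using the 1/(p\pi) approximation to tan((0.5 - p)\pi) for extremely small p-values; cct_sketch(), sidak_minp(), and fisher_comb() are illustrative names, not package functions.

```r
# Cauchy combination: T = sum_i w_i tan((0.5 - p_i) * pi), w_i = 1/k,
# referred to a standard Cauchy upper tail.
cct_sketch <- function(p) {
  t_i <- ifelse(p < 1e-16, 1 / (p * pi), tan((0.5 - p) * pi))
  stats::pcauchy(mean(t_i), lower.tail = FALSE)
}

# Sidak min-p: exact only for independent p-values
sidak_minp <- function(p) 1 - (1 - min(p))^length(p)

# Fisher: -2 * sum(log p) ~ chi^2_{2k} only under independence
fisher_comb <- function(p) {
  stats::pchisq(-2 * sum(log(p)), df = 2 * length(p), lower.tail = FALSE)
}

pv <- c(poly2 = 0.03, poly3 = 0.01, stukel = 0.20)   # hypothetical component p-values
combined <- c(cct = cct_sketch(pv), minp = sidak_minp(pv), fisher = fisher_comb(pv))
```

When all k p-values are equal to some p, the CCT returns p itself, which illustrates why it stays well calibrated when the components are almost perfectly dependent.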

Value

A one-row data.frame with columns Test, Combiner, Components, k, and p_value.

Author(s)

Ebrahim Khaled Ebrahim ebrahimkhaled@alexu.edu.eg

References

Liu, Y. and Xie, J. (2020). Cauchy combination test. JASA, 115(529), 393-402.

Examples

set.seed(1)
n <- 500
x <- runif(n, -3, 3)
y <- rbinom(n, 1, 1 / (1 + exp(-(0.6 * x))))
fit <- glm(y ~ x, family = binomial())
def.ensemble.gof(fit)                 # CCT of the three DEF bases
def.ensemble.gof(fit, add_ef = TRUE)  # add the omnibus EF

Directed Ebrahim-Farrington (DEF) Goodness-of-Fit Test

Description

Performs the Directed Ebrahim-Farrington (DEF) goodness-of-fit test for a fitted binary logistic regression model. DEF concentrates its power on a small set of calibration-curve "shape" directions by projecting the grouped standardized residuals onto a low-dimensional basis and testing the squared length of that projection.

Naming note: this test is published under the name EDGE (Ebrahim Directed Goodness-of-fit Evaluation), and edge.gof is the primary interface going forward. def.gof() is retained, unchanged, as a fully supported legacy name.

Usage

def.gof(
  object,
  predicted_probs = NULL,
  X = NULL,
  G = 10,
  basis = c("poly3", "poly2", "stukel", "ensemble"),
  method = c("satterthwaite", "imhof")
)

Arguments

object

A fitted binary logistic glm, or a binary (0/1) response vector y (then supply predicted_probs).

predicted_probs

Numeric predicted probabilities; required when object is a y vector, ignored when it is a glm.

X

Optional design matrix, used only with the y/predicted_probs form: it enables the exact estimation-adjusted (\Omega) calibration (logit working weights assumed). Without it the conservative \chi^2_k reference is used and a warning is issued. Ignored when object is a glm.

G

Integer number of equal-frequency groups (default 10; must be >= 3).

basis

One of "poly3" (default), "poly2", "stukel", or "ensemble".

method

One of "satterthwaite" (default) or "imhof".

Details

The observations are sorted by predicted probability and split into G equal-frequency groups; the standardized grouped residual vector r is projected onto a basis matrix Z of smooth shapes, giving S = (Z'r)'(Z'Z)^{-1}(Z'r). Its null distribution is a weighted sum of \chi^2_1 variables with weights equal to the eigenvalues of (Z'Z)^{-1}Z'\Omega Z, where \Omega = I - U(X'WX)^{-1}U' is the estimation-adjusted covariance of the grouped residuals. The p-value uses a Satterthwaite scaled-\chi^2 approximation (default) or Imhof's method (if the CompQuadForm package is installed). Bases: "poly2", "poly3" (default), "stukel"; "ensemble" runs all three and combines them via def.ensemble.gof.
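A minimal sketch of the grouped directed statistic follows, assuming standardized grouped residuals r_g = (O_g - E_g)/\sqrt{V_g} with V_g = \sum_{i \in g}\hat p_i(1-\hat p_i). Under that assumption, U = D^{-1/2} C V X (C the G x n group-membership matrix, D = diag(V_g)) reproduces the stated form \Omega = I - U(X'WX)^{-1}U' with W = V. The cubic basis in the centred group-mean risk is a hypothetical stand-in for "poly3", and def_sketch() and satterthwaite_p() are illustrative names. The last entry is the conservative \chi^2_k p-value described under argument X.

```r
satterthwaite_p <- function(S, lam) {
  lam <- lam[lam > 1e-10]
  c_scale <- sum(lam^2) / sum(lam)          # scale factor c
  nu <- sum(lam)^2 / sum(lam^2)             # effective degrees of freedom
  c(df = nu, p_value = stats::pchisq(S / c_scale, df = nu, lower.tail = FALSE))
}

def_sketch <- function(y, p, X, G = 10) {
  grp <- cut(rank(p, ties.method = "first"), breaks = G, labels = FALSE)  # equal-frequency groups
  C <- outer(seq_len(G), grp, "==") * 1     # G x n group-membership matrix
  v <- p * (1 - p)
  V_g <- drop(C %*% v)                      # null variance of each group's O - E
  r <- drop(C %*% (y - p)) / sqrt(V_g)      # assumed standardized grouped residuals
  U <- (C %*% (v * X)) / sqrt(V_g)          # U = D^{-1/2} C V X
  XtWX <- crossprod(sqrt(v) * X)            # X'WX with W = V
  Omega <- diag(G) - U %*% solve(XtWX, t(U))
  pbar <- drop(C %*% p) / tabulate(grp, G)  # mean predicted risk per group
  m <- (pbar - mean(pbar)) / stats::sd(pbar)
  Z <- cbind(m, m^2, m^3)                   # hypothetical cubic ("poly3"-like) basis
  ZtZ <- crossprod(Z)
  u <- crossprod(Z, r)
  S <- drop(crossprod(u, solve(ZtZ, u)))
  lam <- Re(eigen(solve(ZtZ, t(Z) %*% Omega %*% Z), only.values = TRUE)$values)
  c(S = S, satterthwaite_p(S, lam),
    p_conservative = stats::pchisq(S, df = ncol(Z), lower.tail = FALSE))
}

set.seed(1)
xd <- runif(500, -3, 3)
yd <- rbinom(500, 1, plogis(0.6 * xd))
fd <- glm(yd ~ xd, family = binomial())
out <- def_sketch(yd, fitted(fd), model.matrix(fd))
```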

Value

A one-row data.frame with columns Test, Basis, Test_Statistic (the statistic S), df, Method, and p_value. When basis = "ensemble", the return is that of def.ensemble.gof.

Author(s)

Ebrahim Khaled Ebrahim ebrahimkhaled@alexu.edu.eg

References

Ebrahim, E. K. and El-Kotory, A. Omnibus versus Directed Goodness-of-Fit Tests for Sparse Data in Binary Logistic Regression (companion paper).

Examples

set.seed(1)
n <- 500
x <- runif(n, -3, 3)
y <- rbinom(n, 1, 1 / (1 + exp(-(0.6 * x))))
fit <- glm(y ~ x, family = binomial())
def.gof(fit)                       # default poly3 basis
def.gof(fit, basis = "stukel")     # tail-shape basis
def.gof(fit, basis = "ensemble")   # combine all three (CCT)

Deployable learned-ensemble GOF test via parametric bootstrap

Description

Turns a pre-trained ensemble meta into a deployable goodness-of-fit test for any fitted model: it scores the model, then calibrates the p-value by a per-dataset parametric bootstrap from the fitted model (so no knowledge of the truth or the data-generating design is required). Validity comes from the bootstrap, independent of how meta was trained.
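The parametric-bootstrap calibration can be sketched as follows: simulate new responses from the fitted probabilities, refit the same design, rescore, and report the Monte Carlo p-value (1 + #\{T^* \ge T\})/(B + 1). This is a minimal sketch, not deploy.gof() itself: boot_gof_sketch() is a hypothetical name, and the Pearson X^2 is only an illustrative stand-in for a trained meta scorer.

```r
# Hypothetical stand-in for a trained `meta`: the Pearson X^2 of the fit.
pearson_score <- function(fit) sum(stats::residuals(fit, type = "pearson")^2)

boot_gof_sketch <- function(fit, score_fn = pearson_score, B = 99) {
  X <- stats::model.matrix(fit)
  p_hat <- stats::fitted(fit)
  t_obs <- score_fn(fit)
  t_star <- vapply(seq_len(B), function(b) {
    y_star <- stats::rbinom(length(p_hat), 1, p_hat)      # simulate under the fitted model
    fit_star <- stats::glm(y_star ~ 0 + X, family = stats::binomial())  # refit same design
    score_fn(fit_star)
  }, numeric(1))
  # Monte Carlo p-value; the +1 terms keep it valid and strictly positive
  data.frame(score = t_obs, B = B, p_value = (1 + sum(t_star >= t_obs)) / (B + 1))
}

set.seed(2)
xb <- rnorm(200)
yb <- rbinom(200, 1, plogis(0.5 * xb))
fb <- glm(yb ~ xb, family = binomial())
boot_res <- boot_gof_sketch(fb, B = 19)
```

Because each replicate is rescored by the same function, the p-value is valid for any scorer, which is why validity does not depend on how meta was trained.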

Usage

deploy.gof(object, meta, B = 99, feature_fn = gof.features)

Arguments

object

A fitted binary logistic glm.

meta

A pre-trained scorer: either a function f(features) returning a scalar misfit score, or an object with a predict method consuming a one-row feature matrix.

B

Number of parametric-bootstrap resamples (default 99).

feature_fn

Function mapping a fitted glm to its feature vector (default gof.features).

Value

A one-row data.frame with the score, B, and the bootstrap p_value.

EDGE: Directed Goodness-of-Fit Test for Binary Logistic Regression

Description

edge.gof() is the primary interface to the EDGE test (Ebrahim Directed Goodness-of-fit Evaluation): a grouped, directed goodness-of-fit test for binary logistic regression under sparse data. EDGE projects the grouped standardized residuals onto a small pre-specified basis of calibration shapes (cubic "poly3" by default) and refers the resulting quadratic form to its closed-form weighted chi-squared null distribution – no refit, no resampling, no tuning.

edge.gof() computes exactly the same statistic as the legacy name def.gof (retained for backward compatibility); the returned Test label is "EDGE".

Usage

edge.gof(
  object,
  predicted_probs = NULL,
  X = NULL,
  G = 10,
  basis = "poly3",
  method = "satterthwaite"
)

Arguments

object

A fitted binary logistic glm, or a binary (0/1) response vector y (then supply predicted_probs).

predicted_probs

Numeric predicted probabilities; required when object is a y vector, ignored when it is a glm.

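X

Optional design matrix, used only with the y/predicted_probs form to enable the exact estimation-adjusted (\Omega) calibration, as in def.gof. Ignored when object is a glm.
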
G

Integer number of equal-frequency groups (default 10; must be >= 3).

basis

One of "poly3" (default), "poly2", "stukel", or "ensemble".

method

One of "satterthwaite" (default) or "imhof".

Value

A one-row data.frame with columns Test ("EDGE"), Basis, Test_Statistic, df, Method, and p_value, as documented in def.gof.

Author(s)

Ebrahim Khaled Ebrahim ebrahimkhaled@alexu.edu.eg

References

Ebrahim, E. K. and El-Kotory, A. (2026). EDGE: a closed-form directed goodness-of-fit test for sparse logistic regression. Manuscript.

Ebrahim, E. K. and El-Kotory, A. (2026). A directional Hosmer-Lemeshow goodness-of-fit test for sparse logistic regression. arXiv:2607.15454.

Examples

set.seed(1)
x <- runif(500, -3, 3)
y <- rbinom(500, 1, plogis(0.6 * x))
fit <- glm(y ~ x, family = binomial())
edge.gof(fit)                      # default cubic basis, G = 10
edge.gof(fit, basis = "stukel")    # Stukel-shape basis

EDGES: Cauchy-Combination Ensemble of Directed GOF Tests (Alias)

Description

Alias for def.ensemble.gof(); see the EDGES paper. edges.gof() is the brand name (EDGES = the Cauchy-combination ensemble of the EDGE directed bases) used in the manuscript. It takes exactly the same arguments as def.ensemble.gof and returns exactly the same value; the legacy name def.ensemble.gof() is retained unchanged for back-compatibility.

Usage

edges.gof(...)

Arguments

...

Arguments passed on to def.ensemble.gof (e.g. object, predicted_probs, X, components, add_ef, combine, G, extra_pvalues).

Value

A one-row data.frame with columns Test, Combiner, Components, k, and p_value.

Author(s)

Ebrahim Khaled Ebrahim ebrahimkhaled@alexu.edu.eg

References

Liu, Y. and Xie, J. (2020). Cauchy combination test. JASA, 115(529), 393-402.

Examples

set.seed(1)
x <- runif(500, -3, 3)
y <- rbinom(500, 1, plogis(0.6 * x))
fit <- glm(y ~ x, family = binomial())
edges.gof(fit)                 # identical to def.ensemble.gof(fit)

Ebrahim-Farrington Goodness-of-Fit Test for Logistic Regression

Description

Performs the Ebrahim-Farrington goodness-of-fit test for logistic regression models. This test is particularly effective for binary data and sparse datasets, providing an improved alternative to the traditional Hosmer-Lemeshow test.

Usage

ef.gof(
  y,
  predicted_probs = NULL,
  model = NULL,
  m = NULL,
  G = 10,
  method = c("chisq", "normal")
)

Arguments

y

A fitted binary logistic glm (then predicted_probs is taken from it automatically), or a numeric vector of binary responses (0/1) for binary data / counts of successes for grouped data.

predicted_probs

Numeric vector of predicted probabilities from the logistic regression model. Must be the same length as y; not needed when y is a fitted glm.

model

Optional glm object. Required only for the original Farrington test with grouped data (when m is provided and G is NULL).

m

Optional numeric vector of trial counts for each observation (for grouped data). If NULL, the data are assumed to be binary.

G

Optional integer specifying the number of groups for binary data grouping. Default is 10. If NULL, no grouping is performed and m must be provided.

method

Reference distribution for the grouped EF statistic: "chisq" (default) refers T_{EF} to a \chi^2_{G-2} distribution; "normal" uses the standardized Z_{EF} (the behaviour of package versions <= 1.0.0).

Details

The Ebrahim-Farrington test is based on Farrington's (1996) theoretical framework but simplified for practical implementation with binary data. The test uses a modified Pearson chi-square statistic with data-dependent grouping, where observations are grouped by their predicted probabilities.

For binary data (when G is specified), the test automatically groups observations into G groups based on predicted probabilities and applies the simplified Ebrahim-Farrington statistic:

Z_{EF} = \frac{T_{EF} - (G - 2)}{\sqrt{2(G-2)}}

where T_{EF} is the modified Pearson chi-square statistic, and G is the number of groups.

For grouped data (when m is provided), the test applies the original Farrington test with full variance calculations.
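The standardization subtracts the mean G - 2 and divides by the standard deviation \sqrt{2(G-2)} of a \chi^2_{G-2} variable, so method = "normal" is a normal approximation to the method = "chisq" reference. A minimal sketch of both p-values given an already computed T_{EF}, assuming an upper-tail test; ef_standardize() and ef_pvalues() are illustrative names.

```r
# T_ef: the grouped EF statistic (computed elsewhere); G: number of groups.
ef_standardize <- function(T_ef, G) (T_ef - (G - 2)) / sqrt(2 * (G - 2))

ef_pvalues <- function(T_ef, G) {
  c(chisq  = stats::pchisq(T_ef, df = G - 2, lower.tail = FALSE),       # method = "chisq"
    normal = stats::pnorm(ef_standardize(T_ef, G), lower.tail = FALSE))  # method = "normal"
}

# Worked example: T_EF = 12 with G = 10 gives Z_EF = (12 - 8) / sqrt(16) = 1
z <- ef_standardize(12, 10)
pv_ef <- ef_pvalues(12, 10)
```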

Value

A data frame with the following columns:

Test

Character string identifying the test performed

Test_Statistic

Numeric value of the standardized test statistic

p_value

Numeric p-value for the test

Note

For binary data with automatic grouping (G specified), the Ebrahim-Farrington test is used; it is computationally efficient and does not require the fitted model.
For grouped data (m provided), the original Farrington test is used; it requires the fitted model object.
Under the null hypothesis of adequate fit, T_{EF} is referred to \chi^2_{G-2} (method = "chisq", the default), and the standardized Z_{EF} is approximately standard normal (method = "normal").
For binary data with m = 1 for all observations and no grouping, the test is not applicable and returns a p-value of 1.

Author(s)

Ebrahim Khaled Ebrahim ebrahimkhaled@alexu.edu.eg

References

Farrington, C. P. (1996). On Assessing Goodness of Fit of Generalized Linear Models to Sparse Data. Journal of the Royal Statistical Society. Series B (Methodological), 58(2), 349-360.

Ebrahim, E. K. (2025). Goodness-of-Fit Tests and Calibration Machine Learning Algorithms for Logistic Regression Model with Sparse Data. Master's Thesis, Alexandria University.

Hosmer, D. W., & Lemeshow, S. (1980). A goodness-of-fit test for the multiple logistic regression model. Communications in Statistics - Theory and Methods, 9(10), 1043-1069. https://doi.org/10.1080/03610928008827941

Examples

# Example 1: Binary data with automatic grouping (Ebrahim-Farrington test)
set.seed(123)
n <- 500
x <- rnorm(n)
linpred <- 0.5 + 1.2 * x
prob <- 1 / (1 + exp(-linpred))
y <- rbinom(n, 1, prob)

# Fit logistic regression
model <- glm(y ~ x, family = binomial())
predicted_probs <- fitted(model)

# Perform Ebrahim-Farrington test with 10 groups
result <- ef.gof(y, predicted_probs, G = 10)
print(result)

# Example 2: Compare with different number of groups
result_4 <- ef.gof(y, predicted_probs, G = 4)
result_20 <- ef.gof(y, predicted_probs, G = 20)

# Example 3: Grouped data (original Farrington test)
# Note: This requires actual grouped data with trials > 1
## Not run: 
# Simulated grouped data
n_groups <- 50
m_trials <- sample(5:20, n_groups, replace = TRUE)
x_grouped <- rnorm(n_groups)
linpred_grouped <- -0.5 + 1.0 * x_grouped
prob_grouped <- 1 / (1 + exp(-linpred_grouped))
y_grouped <- rbinom(n_groups, m_trials, prob_grouped)

# Fit model for grouped data
data_grouped <- data.frame(successes = y_grouped, trials = m_trials, x = x_grouped)
model_grouped <- glm(cbind(successes, trials - successes) ~ x, 
                     data = data_grouped, family = binomial())
predicted_probs_grouped <- fitted(model_grouped)

# Original Farrington test
result_grouped <- ef.gof(y_grouped, predicted_probs_grouped, 
                         model = model_grouped, m = m_trials)
print(result_grouped)

## End(Not run)

Goodness-of-fit evidence features for a fitted model

Description

Builds the evidence vector used by the learned-ensemble goodness-of-fit test: one-sided z-scores \Phi^{-1}(1-p) from a panel of GOF tests plus the covariate-space directed tests. Larger values mean stronger evidence of misfit.
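The transformation z = \Phi^{-1}(1-p) maps p = 0.5 to 0 and p = 0.025 to about 1.96. A minimal sketch applied to a named vector of p-values; the clipping constant eps is an assumption added so that p = 0 or p = 1 still give finite features, and evidence_features_sketch() is a hypothetical name.

```r
evidence_features_sketch <- function(pvals, eps = 1e-12) {
  p <- pmin(pmax(pvals, eps), 1 - eps)    # clip so the z-scores stay finite (assumption)
  stats::qnorm(p, lower.tail = FALSE)     # = Phi^{-1}(1 - p); larger => more misfit evidence
}

f <- evidence_features_sketch(c(HL = 0.5, EF = 0.025, DEF.poly3 = 0))
```

Using lower.tail = FALSE rather than qnorm(1 - p) avoids losing precision for very small p-values.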

Usage

gof.features(
  object,
  tests = c("HL", "HL-equalwidth", "Pigeon-Heyse", "Tsiatis", "Xie", "EF", "DEF.poly2",
    "DEF.poly3", "DEF.stukel")
)

Arguments

object

A fitted binary logistic glm.

tests

Character vector of run.all.gof test names to use as panel features (default: a fast partition + DEF-family panel).

Value

A named numeric vector of evidence features.

Synthetic binary outcome data with a smooth calibration misfit

Description

A small, fully synthetic dataset for demonstrating the goodness-of-fit and calibration battery. It was generated reproducibly (see data-raw/make_gof_demo.R) from a logistic data-generating process whose true linear predictor includes a quadratic term in (standardized) age. A model that regresses outcome on age linearly (together with bmi, sex and treatment) is therefore mildly misspecified, through a smooth, low-dimensional calibration distortion. This is the regime in which the directed Ebrahim–Farrington / EDGE test (edge.gof, def.gof) is designed to have more power than classical omnibus tests such as Hosmer–Lemeshow.

Usage

gof_demo

Format

A data frame with 800 rows and 5 variables:

outcome: binary response, 0/1 (event rate about 0.27).
age: continuous covariate, years (range about 20–70). The true model depends on age quadratically.
bmi: continuous covariate, body mass index in kg/m^2.
sex: binary covariate, 0 = female, 1 = male.
treatment: binary covariate, 0 = control, 1 = treated.

Details

The true data-generating linear predictor is

\eta = -0.6 + 0.8 z_a - 0.7 z_a^2 + 0.5 z_b + 0.4\,\mathrm{sex} - 0.3\,\mathrm{treatment},

where z_a = (\mathrm{age} - 45)/14 and z_b = (\mathrm{bmi} - 27)/4, and \Pr(\mathrm{outcome} = 1) = \mathrm{plogis}(\eta).
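The linear predictor above can be coded directly, which makes the source of the misfit concrete: the -0.7 z_a^2 term cannot be absorbed by a model that is linear in age. A minimal sketch; gof_demo_eta() is a hypothetical name, and the covariate distributions in the simulation are assumptions, so this does not reproduce gof_demo itself.

```r
gof_demo_eta <- function(age, bmi, sex, treatment) {
  z_a <- (age - 45) / 14
  z_b <- (bmi - 27) / 4
  -0.6 + 0.8 * z_a - 0.7 * z_a^2 + 0.5 * z_b + 0.4 * sex - 0.3 * treatment
}

# Illustrative simulation with ASSUMED covariate distributions
set.seed(1)
n <- 800
sim <- data.frame(age = runif(n, 20, 70), bmi = rnorm(n, 27, 4),
                  sex = rbinom(n, 1, 0.5), treatment = rbinom(n, 1, 0.5))
sim$outcome <- rbinom(n, 1, plogis(with(sim, gof_demo_eta(age, bmi, sex, treatment))))
# Misspecified (linear-in-age) working model, as in the Examples
fit_lin <- glm(outcome ~ age + bmi + sex + treatment, data = sim, family = binomial)
```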

Source

Simulated; see data-raw/make_gof_demo.R in the package sources.

Examples

data("gof_demo", package = "ebrahim.gof")
fit <- glm(outcome ~ age + bmi + sex + treatment,
           data = gof_demo, family = binomial)
edge.gof(fit)

Grouped-covariate companion to `gof_demo` (replicated covariate patterns)

Description

A companion dataset to gof_demo built from the same data-generating process and seed discipline (see data-raw/make_gof_demo.R), except that the covariates are coarsened before the linear predictor is computed: age is rounded to 10-year bins (20, 30, ..., 70) and bmi to whole integers. The recorded covariates are therefore exactly the covariates the outcome was generated from, and many observations share a covariate pattern (328 distinct patterns among 800 observations, versus one pattern per observation in gof_demo).
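The coarsening and the count of distinct covariate patterns can be sketched as below, assuming nearest-decade rounding for age and nearest-integer rounding for bmi; coarsen_sketch() and n_patterns() are hypothetical helpers, not package functions.

```r
coarsen_sketch <- function(age, bmi) {
  data.frame(age = 10 * round(age / 10),   # 10-year bins: 20, 30, ..., 70
             bmi = round(bmi))             # whole kg/m^2
}

# Number of distinct covariate patterns = number of unique rows
n_patterns <- function(covariates) nrow(unique(covariates))

cz <- coarsen_sketch(c(24, 26, 68), c(27.4, 27.6, 30))
np <- n_patterns(data.frame(a = c(1, 1, 2), b = c(0, 0, 1)))
```

On a data frame of age, bmi, sex, and treatment, n_patterns() gives the kind of count quoted above (patterns versus observations).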

Usage

gof_demo_grouped

Format

A data frame with 800 rows and 5 variables:

outcome: binary response, 0/1 (event rate about 0.27).
age: age in years, rounded to 10-year bins (20, 30, ..., 70). The true model depends on (binned) age quadratically.
bmi: body mass index in kg/m^2, rounded to whole integers.
sex: binary covariate, 0 = female, 1 = male.
treatment: binary covariate, 0 = control, 1 = treated.

Details

Its purpose is to demonstrate the sparse-versus-grouped distinction that run.all.gof surfaces: the battery reports the per-observation ("sparse") and per-covariate-pattern ("grouped") forms of the pattern-sensitive tests (Pearson, deviance, McCullagh) side by side, and on replicated-pattern data such as this the two forms can disagree on the same fitted model. On the all-continuous gof_demo (every observation its own pattern) the two forms coincide – the degenerate case.

The true data-generating linear predictor has the same form as for gof_demo,

\eta = -0.6 + 0.8 z_a - 0.7 z_a^2 + 0.5 z_b + 0.4\,\mathrm{sex} - 0.3\,\mathrm{treatment},

with z_a = (\mathrm{age} - 45)/14 and z_b = (\mathrm{bmi} - 27)/4 computed from the binned age and bmi, and \Pr(\mathrm{outcome} = 1) = \mathrm{plogis}(\eta).

Source

Simulated; see data-raw/make_gof_demo.R in the package sources.

Examples

data("gof_demo_grouped", package = "ebrahim.gof")
fit <- glm(outcome ~ age + bmi + sex + treatment,
           data = gof_demo_grouped, family = binomial)
# sparse and grouped forms reported side by side:
run.all.gof(fit, include_slow = FALSE, install = "no")

Install the optional packages used by `run.all.gof()`

Description

The slow tests in run.all.gof rely on optional packages that live in Suggests (givitiR and callr for the GiViTI calibration test, mgcv for the GAM tests, BAGofT for the adaptive test, and ResourceSelection for the Lai-Liu test). Per CRAN policy the package never installs them on its own; this helper installs the missing ones for you, asking first.
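The "install only what is missing, after asking" behaviour can be sketched with base and utils functions. This is a minimal sketch, not the package's gof_install_suggests(): missing_pkgs() and install_missing_sketch() are hypothetical names, and a package counts as missing when system.file() cannot locate it.

```r
missing_pkgs <- function(pkgs) {
  found <- vapply(pkgs, function(p) nzchar(system.file(package = p)), logical(1))
  pkgs[!found]
}

install_missing_sketch <- function(pkgs, ask = interactive()) {
  todo <- missing_pkgs(pkgs)
  if (length(todo) == 0L) return(invisible(character(0)))
  if (ask) {
    ok <- utils::askYesNo(paste0("Install: ", paste(todo, collapse = ", "), "?"))
    if (!isTRUE(ok)) return(invisible(character(0)))   # declined or cancelled
  }
  utils::install.packages(todo)
  invisible(todo)
}

miss <- missing_pkgs(c("stats", "definitelyNotAPackage123"))
```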

Usage

gof_install_suggests(pkgs = NULL, ask = interactive(), update = FALSE)

Arguments

pkgs

Optional character vector of package names to consider. Defaults to the full optional set used by the battery.

ask

Logical; when TRUE (the default in interactive sessions) you are shown the list of packages to be installed/updated and asked to confirm first. Set ask = FALSE to proceed without a prompt (e.g. in a setup script you control).

update

Logical; when FALSE (the default) only the missing packages are installed and anything already present is left untouched. When TRUE the function also checks (via old.packages) which of the present packages are out of date and offers to update those too. The update check contacts your CRAN mirror, so it is a little slower.

Value

Invisibly, the character vector of packages that were installed or updated (empty if nothing was needed).

Examples

## Not run: 
# install whatever optional packages are missing, after confirming:
gof_install_suggests()

# also update any that are out of date:
gof_install_suggests(update = TRUE)

# just the GiViTI dependencies, no prompt:
gof_install_suggests(c("givitiR", "callr"), ask = FALSE)

## End(Not run)

Plot the GiViTI calibration belt from a goodness-of-fit battery

Description

Draws the GiViTI calibration belt stored on a run.all.gof result that was produced with calibration_plot = TRUE. The belt shows the fitted calibration curve with a confidence region against the 45-degree line.

Usage

## S3 method for class 'gof_battery'
plot(x, ...)

Arguments

x

A gof_battery object from run.all.gof.

...

Passed to the givitiR plot method.

Value

x, invisibly.

Print a goodness-of-fit battery

Description

Formats the run.all.gof result as a compact, readable table: rows grouped by test family, p-values shown to four decimals (or scientific for very small values, "-" when not available), and a significance flag. The object is still a plain data.frame underneath, so all the raw columns remain available for programmatic use.
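The p-value formatting rules can be sketched as a small helper. The 1e-4 cutoff for switching to scientific notation and the 0.05 flag threshold are assumptions, since the exact values are not stated; format_p_sketch() and flag_sketch() are hypothetical names.

```r
format_p_sketch <- function(p, small = 1e-4) {
  out <- ifelse(p < small,
                formatC(p, format = "e", digits = 2),   # very small: scientific
                formatC(p, format = "f", digits = 4))   # otherwise: four decimals
  out[is.na(p)] <- "-"                                  # not available
  out
}

flag_sketch <- function(p, alpha = 0.05) ifelse(!is.na(p) & p < alpha, "*", "")

shown <- format_p_sketch(c(0.5, 1e-6, NA))
flags <- flag_sketch(c(0.5, 1e-6, NA))
```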

Usage

## S3 method for class 'gof_battery'
print(x, ...)

Arguments

x

A gof_battery object returned by run.all.gof.

...

Ignored.

Value

x, invisibly.

Run a Battery of Goodness-of-Fit Tests at Once

Description

Runs several goodness-of-fit tests for a binary logistic regression in one call and returns one tidy data.frame, one row per test. Pass a fitted glm to run the whole battery; pass (y, predicted_probs) to run the tests that need only predictions. Each test is wrapped so that a failure of one test never aborts the whole run.
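The "one failure never aborts the run" guarantee is the usual tryCatch pattern. A minimal sketch using a subset of the documented columns; run_one_sketch() and the two toy tests are hypothetical.

```r
run_one_sketch <- function(name, fn, ...) {
  tryCatch({
    res <- fn(...)
    data.frame(Test = name, Statistic = res$statistic, p_value = res$p_value,
               Note = "", stringsAsFactors = FALSE)
  }, error = function(e) {
    # a failing test becomes an NA row that carries the error message
    data.frame(Test = name, Statistic = NA_real_, p_value = NA_real_,
               Note = conditionMessage(e), stringsAsFactors = FALSE)
  })
}

battery <- rbind(
  run_one_sketch("Toy-OK", function() list(statistic = 3.2, p_value = 0.36)),
  run_one_sketch("Toy-Fails", function() stop("optional package not installed"))
)
```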

Usage

run.all.gof(
  object,
  predicted_probs = NULL,
  X = NULL,
  tests = "all",
  G = 10,
  include_slow = TRUE,
  parallel = FALSE,
  ncores = NULL,
  calibration_plot = FALSE,
  install = c("ask", "no", "yes"),
  control = list()
)

Arguments

object

A fitted binary logistic glm, or a binary (0/1) response vector y (then supply predicted_probs).

predicted_probs

Numeric predicted probabilities; required when object is a y vector.

X

Optional design matrix; lets the directed (DEF) tests run from the (y, predicted_probs) form.

tests

Either "all" (default) or a character vector of test names to run (e.g. c("EF","DEF.poly3","HL")).

G

Integer number of groups passed to the grouping tests (default 10).

include_slow

Logical; when TRUE (the default) the full battery runs, including the slow tests: le Cessie-van Houwelingen smoothing (O(n^2)-O(n^3)), the GAM tests, Stute-Zhu, eHL, BAGofT, and GiViTI. Set FALSE for a quick run with the fast tests only. A one-time message notes this whenever slow tests are included.

parallel

Logical; when TRUE, the resampling loops of the slow bootstrap tests (Stute-Zhu and Lai-Liu-HL) are run on a local PSOCK cluster via parLapply (works on all platforms, including Windows). All other tests are unaffected. The default FALSE keeps every loop sequential, exactly as in previous versions.

ncores

Integer; the number of worker processes used when parallel = TRUE. The default NULL uses max(1, parallel::detectCores() - 1). Values below 2 fall back to the sequential path.

calibration_plot

Logical; when TRUE and GiViTI is among the tests, also compute and draw the GiViTI calibration belt and store it on the result (retrievable with plot()). Default FALSE.

install

One of "ask" (default), "no", or "yes", controlling what happens when a test in the run needs an optional package that is not installed. In an interactive session, "ask" lists the missing packages and asks before installing, and "yes" installs them without asking; "no" never installs (the test is just skipped with a note). In a non-interactive session (scripts, R CMD check) nothing is ever installed, regardless of this setting. See gof_install_suggests.

control

Optional named list of per-test options. Recognized entries: "Stute-Zhu" = list(B = ...) (bootstrap replicates); GiViTI = list(devel = "internal"/"external"); "Lai-Liu-HL" = list(n0 = ..., k = ..., alpha = ...); and BAGofT = list(...) which forwards to the binary adaptive test – nsim (resampling iterations; default 100), nsplits, ne (the estimation-split size), and the random-forest partitioner's tuning Kmax (maximum number of adaptive partition cells), ntree, nmin, mtry, maxnodes. Example: list(BAGofT = list(nsim = 200, Kmax = 8, ntree = 500)).

Details

The currently bundled tests, by family, are:

Global / standardized: Pearson, Deviance, Osius-Rojek, McCullagh, Copas-RSS, and Information-Matrix (the White/Orme test). McCullagh standardizes the Pearson statistic by its exact conditional moments (Kuss 2002 algorithm).
Partition: HL (Hosmer-Lemeshow deciles), HL-equalwidth, Pigeon-Heyse, and F-test (the modified Hosmer-Lemeshow F-test: deviance residuals ANOVA-F-tested across deciles).
EF and EF-normal: the omnibus Ebrahim-Farrington test with the chi-square and normal references; the normal form reproduces the thesis simulation.
Directed: DEF.poly2, DEF.poly3, DEF.stukel, and Stukel.
Covariate-space: Tsiatis, Xie, and Pulkstenis-Robinson.
Ensemble: Ensemble.Vote(3DEF) and Ensemble.Univ(3DEF+EF), both formed with the Cauchy combination test.
Slow tests (run when include_slow = TRUE, the default):
le Cessie-van Houwelingen smoothing.
HL-GAM, PR-GAM, and Xie-GAM: GAM-based tests that fit an overfit GAM for grouping (need mgcv).
Stute-Zhu: a cumulative-residual parametric-bootstrap test; set the number of replicates with control = list("Stute-Zhu" = list(B = ...)).
eHL: the e-value Hosmer-Lemeshow test, reported as p = min(1, 1/e).
BAGofT: the binary-adaptive GOF test; needs the BAGofT package; tune with control = list(BAGofT = list(nsim = ...)).
Lai-Liu-HL: Lai and Liu's standardized-power procedure for the Hosmer-Lemeshow test. It has no p-value; it reports the standardized power as the statistic and a randomized accept/reject decision in the Note. Set the target size via control = list("Lai-Liu-HL" = list(n0 = ..., k = ...)).
GiViTI and GiViTI-external: the GiViTI polynomial calibration test under the internal and external development assumptions. It wraps givitiR and runs in an isolated callr subprocess, so a failure in its compiled dependencies returns NA rather than aborting the session; configure it with control = list(GiViTI = list(devel = "internal")).

Notes: Tsiatis and Xie cluster the covariate space with k-means (a fixed internal seed, so results are reproducible and the caller's RNG is left untouched). Xie uses the corrected degrees of freedom G - k/2 - 1 with k the number of predictors. Pulkstenis-Robinson auto-detects the categorical covariate (any factor/character/logical, or a numeric with at most getOption("ebrahim.gof.pr.maxlev", 6) distinct values); it returns NA with a note when none is present.
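Running k-means under a fixed seed while leaving the caller's RNG untouched amounts to saving and restoring .Random.seed. A minimal sketch; kmeans_fixed_seed() and the seed value 2024 are hypothetical, since the package's internal seed is not documented here.

```r
kmeans_fixed_seed <- function(x, centers, seed = 2024L) {
  has_seed <- exists(".Random.seed", envir = globalenv(), inherits = FALSE)
  if (has_seed) old <- get(".Random.seed", envir = globalenv(), inherits = FALSE)
  on.exit({
    # restore the caller's RNG state (or remove it if there was none)
    if (has_seed) assign(".Random.seed", old, envir = globalenv())
    else rm(".Random.seed", envir = globalenv())
  })
  set.seed(seed)
  stats::kmeans(x, centers = centers, nstart = 5)
}

set.seed(1)
xm <- matrix(rnorm(200), ncol = 2)
s0 <- .Random.seed
km1 <- kmeans_fixed_seed(xm, 3)
s1 <- .Random.seed
km2 <- kmeans_fixed_seed(xm, 3)
```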

Every bundled test reproduces the implementation used in the original thesis simulation: Osius-Rojek and Stukel follow LogisticDx's gof.glm (Stukel via statmod::glm.scoretest when statmod is installed), Copas-RSS follows the "gof" residual of rms (residuals.lrm with type = "gof"), HL follows ResourceSelection::hoslem.test, and the remaining tests match their standalone reference functions. All were checked to agree numerically.
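For orientation, the decile-of-risk Hosmer-Lemeshow statistic can be written in a few lines. This is a sketch modelled on the approach of ResourceSelection::hoslem.test (g quantile groups of the fitted probabilities, observed versus expected counts of 0s and 1s, chi-square reference on g - 2 df); the function name hl_sketch is hypothetical, and like that approach it fails if tied fitted values make the quantile breaks non-unique.

```r
# Hypothetical sketch of the decile-of-risk Hosmer-Lemeshow statistic.
hl_sketch <- function(y, yhat, g = 10) {
  breaks <- quantile(yhat, probs = seq(0, 1, 1 / g))
  grp <- cut(yhat, breaks = breaks, include.lowest = TRUE)
  obs <- xtabs(cbind(1 - y, y) ~ grp)           # observed 0s and 1s per group
  expect <- xtabs(cbind(1 - yhat, yhat) ~ grp)  # expected 0s and 1s per group
  stat <- sum((obs - expect)^2 / expect)
  df <- g - 2
  c(statistic = stat, df = df,
    p_value = pchisq(stat, df, lower.tail = FALSE))
}
```

Usage on the model from the Examples section would be hl_sketch(fit$y, fitted(fit)).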

Value

A data.frame (of class gof_battery) with columns Test, Family, Statistic, df, p_value, and Note, one row per test. A dedicated print method shows the rows grouped by family with formatted p-values and significance flags; the underlying columns remain available for programmatic use.
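Because the result is a data.frame underneath, the usual subsetting works once the custom class is set aside. The helper below is a hypothetical sketch that relies only on the documented columns Test, Family, and p_value, and assumes the class vector is c("gof_battery", "data.frame"). Rows with an NA p-value (for example Lai-Liu-HL, which reports no p-value) are excluded.

```r
# Hypothetical helper: list the tests that reject at level alpha.
significant_tests <- function(res, alpha = 0.05) {
  tab <- as.data.frame(res)  # drops the gof_battery subclass and its print method
  keep <- !is.na(tab$p_value) & tab$p_value < alpha
  tab[keep, c("Test", "Family", "p_value"), drop = FALSE]
}
```

Usage: significant_tests(run.all.gof(fit, include_slow = FALSE)).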

Note

Grouped vs sparse forms. Pearson, Deviance and McCullagh are reported in two forms: the default (sparse / one-trial) form, and a "(grouped)" form computed on the distinct covariate patterns, where pattern g has m_g observations sharing the fitted probability P_g and is treated as Binomial(m_g, P_g). The two forms are identical when every covariate pattern is unique (fully sparse data, as in the simulation) and differ only when patterns repeat (m_g > 1). To avoid clutter, the "(grouped)" row is shown only when it differs from the sparse form, that is, when some pattern repeats; on fully sparse data it would duplicate the sparse row and is omitted. Osius-Rojek is always computed on covariate patterns, matching its classical (LogisticDx) definition.
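The distinction is easiest to see for the Pearson statistic. The sketch below, with the hypothetical name pearson_forms, computes the one-trial sum over observations and the sum over distinct rows of the design matrix. When every pattern is unique (all m_g = 1) the two sums contain the same terms; when patterns repeat, they differ.

```r
# Hypothetical sketch: Pearson X^2 in sparse (one-trial) and grouped forms.
pearson_forms <- function(fit) {
  y <- fit$y
  p <- fitted(fit)
  sparse <- sum((y - p)^2 / (p * (1 - p)))

  # one key per distinct row of the design matrix = one covariate pattern
  key <- apply(model.matrix(fit), 1, paste, collapse = "|")
  m  <- tapply(y, key, length)  # m_g: observations in pattern g
  yg <- tapply(y, key, sum)     # successes in pattern g
  Pg <- tapply(p, key, mean)    # P_g: fitted probability, constant within g
  grouped <- sum((yg - m * Pg)^2 / (m * Pg * (1 - Pg)))

  c(sparse = sparse, grouped = grouped, n_patterns = length(m))
}
```

With a continuous covariate, n_patterns equals n and the two forms agree; with a covariate taking only a few values, the grouped form is far smaller, which is why only one of them can be approximately chi-square in each setting.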

Farrington vs EF. The original Farrington (1996) test is a grouped (covariate-pattern) test. The Ebrahim-Farrington (EF) test is its sparse-data counterpart: it does not group by covariate pattern but forms G data-dependent bins of the predicted risk, so it applies directly to fully sparse data. Use EF for sparse binary data; the grouped Farrington form is appropriate only when covariate patterns repeat.

Reproducibility of the parallel path. With parallel = TRUE the cluster's random-number streams are initialized with clusterSetRNGStream, seeded deterministically from the session's current RNG state. Two runs from the same set.seed state (and the same ncores) therefore give identical bootstrap p-values. Note that the parallel L'Ecuyer-CMRG streams necessarily differ from the serial RNG stream, so parallel = TRUE results differ (within Monte-Carlo error) from parallel = FALSE results at the same seed; this is standard and both are valid. Results also depend on ncores, because the replicates are split across workers.
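The seeding scheme described above can be reproduced with the parallel package alone. In this minimal sketch (the helper name parallel_replicates and the stand-in statistic mean(rnorm(20)) are hypothetical), one integer is drawn from the session RNG and passed to clusterSetRNGStream, which gives each PSOCK worker its own L'Ecuyer-CMRG stream. Because parSapply splits the replicates into fixed chunks per worker, the same set.seed state and the same ncores reproduce the same values.

```r
library(parallel)

# Hypothetical helper: B stand-in bootstrap replicates on a PSOCK cluster
# with reproducible L'Ecuyer-CMRG streams seeded from the session RNG.
parallel_replicates <- function(B, ncores = 2L) {
  iseed <- sample.int(.Machine$integer.max, 1L)  # one draw from the session RNG
  cl <- makeCluster(ncores)
  on.exit(stopCluster(cl), add = TRUE)
  clusterSetRNGStream(cl, iseed)
  # each replicate stands in for one bootstrap statistic
  parSapply(cl, seq_len(B), function(b) mean(rnorm(20)))
}
```

Changing ncores changes which stream each replicate draws from, which is why the results depend on ncores as well as on the seed.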

Author(s)

Ebrahim Khaled Ebrahim ebrahimkhaled@alexu.edu.eg

References

The aggregated tests are due to their original authors; they are provided here for comparison and credited as follows.

Farrington CP (1996). "On Assessing Goodness of Fit of Generalized Linear Models to Sparse Data." Journal of the Royal Statistical Society B, 58(2), 349–360. doi:10.1111/j.2517-6161.1996.tb02086.x

Hosmer DW, Lemeshow S (1980). "Goodness of Fit Tests for the Multiple Logistic Regression Model." Communications in Statistics – Theory and Methods, 9(10), 1043–1069. doi:10.1080/03610928008827941

McCullagh P (1985). "On the Asymptotic Distribution of Pearson's Statistic in Linear Exponential Family Models." International Statistical Review, 53(1), 61–67. doi:10.2307/1402880

Osius G, Rojek D (1992). "Normal Goodness-of-Fit Tests for Multinomial Models with Large Degrees of Freedom." Journal of the American Statistical Association, 87(420), 1145–1152. doi:10.1080/01621459.1992.10476271

le Cessie S, van Houwelingen JC (1991). "A Goodness-of-Fit Test for Binary Regression Models, Based on Smoothing Methods." Biometrics, 47(4), 1267–1282. doi:10.2307/2532385

Stukel TA (1988). "Generalized Logistic Models." Journal of the American Statistical Association, 83(402), 426–431. doi:10.1080/01621459.1988.10478613

Stute W, Zhu LX (2002). "Model Checks for Generalized Linear Models." Scandinavian Journal of Statistics, 29(3), 535–545. doi:10.1111/1467-9469.00304

Tsiatis AA (1980). "A Note on a Goodness-of-Fit Test for the Logistic Regression Model." Biometrika, 67(1), 250–251. doi:10.1093/biomet/67.1.250

Xie XJ, Pendergast J, Clarke W (2008). "Increasing the Power: A Practical Approach to Goodness-of-Fit Test for Logistic Regression Models with Continuous Predictors." Computational Statistics & Data Analysis, 52(5), 2703–2713. doi:10.1016/j.csda.2007.09.027

Pulkstenis E, Robinson TJ (2002). "Two Goodness-of-Fit Tests for Logistic Regression Models with Continuous Covariates." Statistics in Medicine, 21(1), 79–93. doi:10.1002/sim.943

Nattino G, Finazzi S, Bertolini G (2014). "A New Calibration Test and a Reappraisal of the Calibration Belt for the Assessment of Prediction Models Based on Dichotomous Outcomes." Statistics in Medicine, 33(14), 2390–2407. doi:10.1002/sim.6100

Zhang J, Ding J, Yang Y (2021). "Is a Classification Procedure Good Enough? A Goodness-of-Fit Assessment Tool for Classification Learning." Journal of the American Statistical Association. doi:10.1080/01621459.2021.1979010

Examples

set.seed(1)
n <- 500
x <- runif(n, -3, 3)
y <- rbinom(n, 1, 1 / (1 + exp(-(0.6 * x))))
fit <- glm(y ~ x, family = binomial())

## quick run: the fast tests only
run.all.gof(fit, include_slow = FALSE)

## pick specific tests
run.all.gof(fit, tests = c("McCullagh", "Osius-Rojek", "HL"))


## the full battery (default include_slow = TRUE); the slow tests may need the
## suggested packages mgcv, BAGofT, givitiR and callr. In an interactive
## session run.all.gof() offers to install any that are missing (install =
## "ask"); see also gof_install_suggests(). Use install = "no" to never ask.
run.all.gof(fit, install = "no", control = list("Stute-Zhu" = list(B = 50)))

## draw the GiViTI calibration belt (needs givitiR + callr)
res <- run.all.gof(fit, tests = c("McCullagh", "GiViTI"),
                   calibration_plot = TRUE)
plot(res)   # redraw the stored belt

## run the slow bootstrap loops (Stute-Zhu, Lai-Liu-HL) on a PSOCK cluster
set.seed(1)
run.all.gof(fit, tests = "Stute-Zhu", parallel = TRUE, ncores = 2,
            control = list("Stute-Zhu" = list(B = 50)))

