VPSPulse Mirrors

High-Performance Open-Source Archive

Nonparametric model

Nonparametric model

library(serosv)
library(magrittr)

Local estimation by polynomial

Proposed model

Within the local polynomial framework, the linear predictor \(\eta(a)\) is approximated locally at one particular value \(a_0\) for age by a line (local linear, degree \(p=1\)) or a parabola (local quadratic, degree \(p=2\)).

For a general degree \(p\), the linear predictor for a neighbor of \(a_0\), labeled \(a_i\) is equivalent to the Taylor approximation

\[ \eta(a_i) = \eta(a_0) + \eta^{(1)}(a_0)(a_i - a_0) + \frac{\eta^{(2)}(a_0)}{2}(a_i - a_0)^2 + ... + \frac{\eta^{(p)}(a_0)}{p!}(a_i - a_0)^p \]

\(\eta(a_i)\) can be estimated by maximizing

\[ \Sigma_{i=1}^{N} \ell_i \{Y_i, g^{-1} (\beta_0 + \beta_1(a_i-a_0)+ \beta_2(a_i-a_0)^2 ... + \beta_p(a_i-a_0)^p) \} K_h(a_i - a_0) \]

Where:

The estimator for the \(k\)-th derivative of \(\eta(a_0)\), for \(k = 0,1,…,p\) (degree of local polynomial) is thus

\[ \hat{\eta}^{(k)}(a_0) = k!\hat{\beta}_k(a_0) \]

The estimator for the prevalence at age \(a_0\) is then given by

\[ \hat{\pi}(a_0) = g^{-1}\{ \hat{\beta}_0(a_0) \} \]

The estimator for the force of infection at age \(a_0\) by assuming \(p \ge 1\) is as followed

\[ \hat{\lambda}(a_0) = \hat{\beta}_1(a_0) \delta \{ \hat{\beta}_0 (a_0) \} \]

Refer to Chapter 7.1 of the book by Hens et al. (2012) for a more detailed explanation of the method.

Fitting data

mump <- mumps_uk_1986_1987

To fit a local estimation by polynomials, use lp_model() function with model parameters include:

lp <- lp_model(mump, kern="tcub", nn = 0.5, deg=2)
lp
#> Local polynomial model 
#> 
#> Input type:  aggregated 
#> Configs:  nn=0.5, bandwidth(h)=0, degree=2, kernel=tcub 
#> 
#> Call:
#> locfit(formula = y ~ lp(age, deg = deg, nn = nn, h = h), family = "binomial", 
#>     kern = kern)
#> 
#> Number of observations:          44 
#> Family:  Logistic 
#> Fitted Degrees of freedom:       6.463 
#> Residual scale:                  1
plot(lp)

The users can also provide h or nn as a numeric vector, in which case the package will use the parameter value that minimize the generalized cross-validation (GCV) criterion.

# ----- Tune nearest neighbor (nn) parameter
lp_model(mump, kern="tcub", nn = seq(0.2, 0.8, 0.1), deg=2) %>% 
  plot()


# ----- Tune bandwidth (h) parameter
lp_model(mump, kern="tcub", h = seq(5,25), deg=2) %>% 
  plot()

Alternatively, use plot_gcv() to show GCV curves for the nearest neighbor method (left) and constant bandwidth (right) for manual parameter selection.

plot_gcv(
   mump$age, mump$pos, mump$tot,
   nn_seq = seq(0.2, 0.8, by=0.1),
   h_seq = seq(5, 25, by=1)
 )

Based on the plot, we can then fit the model using the parameter value that give the lowest GCV (nn = 0.3 and h = 14)

lp_model(mump, kern="tcub", nn = 0.3, deg=2) %>% 
  plot()


lp_model(mump, kern="tcub", h = 14, deg=2)  %>% 
  plot()

Hens, Niel, Ziv Shkedy, Marc Aerts, Christel Faes, Pierre Van Damme, and Philippe Beutels. 2012. Modeling Infectious Disease Parameters Based on Serological and Social Contact Data: A Modern Statistical Perspective. Statistics for Biology and Health. Springer New York. https://doi.org/10.1007/978-1-4614-4072-7.

Need mirroring services?
Contact our team at info@vpspulse.com.

Mirror powered by VPSpulse

Infrastructure sponsored by VPSPulse & Secure Payments by ArionPay.