Title: | Relative Risk Regression Using the Log-Binomial Model |
---|---|
Description: | Methods for fitting log-link GLMs and GAMs to binomial data, including EM-type algorithms with more stable convergence properties than standard methods. |
Authors: | Mark W. Donoghoe [aut, cre] , Ian C. Marschner [ths] , Alexandra C. Gillett [ctb] (wrote an initial version of the nplbin function, <https://orcid.org/0000-0002-5069-3197>) |
Maintainer: | Mark W. Donoghoe <[email protected]> |
License: | GPL (>= 2) |
Version: | 2.0.5.9000 |
Built: | 2024-11-22 03:05:04 UTC |
Source: | https://github.com/mdonoghoe/logbin |
Methods for fitting log-link GLMs and GAMs to binomial data, including EM-type algorithms with more stable convergence properties than standard methods.
Package: | logbin |
Type: | Package |
Version: | 2.0.5.9000 |
License: | GPL (>= 2) |
This package provides methods to fit generalised linear models (GLMs) and generalised additive
models (GAMs) with log link functions to binomial data, which can be used to
estimate adjusted relative risks. It has two primary functions: logbin
and logbin.smooth
, together with various supporting functions.
Standard GLM routines such as base R's glm
typically use a modified Fisher
scoring algorithm, but this can experience numerical problems and fail to converge to the
maximum likelihood estimate (MLE). The glm2
package improves on this but can still have difficulties, particularly when the MLE
is on or near the boundary of the parameter space (Marschner, 2015).
Alternative methods for finding the MLE are provided in this package. For both GLMs and GAMs, two approaches based on the EM algorithm can be used: a combinatorial EM (CEM) algorithm (Marschner, 2014) or an expanded EM algorithm. These accomodate the parameter constraints and are more stable than iteratively reweighted least squares.
In a CEM algorithm, a collection of restricted parameter spaces is defined which covers the full parameter space, and an EM algorithm is applied within each restricted parameter space in order to find a collection of restricted maxima of the log-likelihood function, from which can be obtained the global maximum over the full parameter space. The methodology implemented for this algorithm is presented in Marschner and Gillett (2012) and Donoghoe and Marschner (2015).
In the expanded EM approach, additional parameters are added to the model, and an EM algorithm finds the MLE of this overparameterised model by imposing constraints on each individual parameter. This requires a single application of the EM algorithm.
In each case, the EM algorithm may be accelerated by using the capabilities of the turboEM package.
For GLMs, an adaptive barrier approach, which uses a constrained optimisation algorithm, is also provided.
Mark W. Donoghoe [email protected]
Maintainer: Mark W. Donoghoe [email protected]
Donoghoe, M. W. and I. C. Marschner (2015). Flexible regression models for rate differences, risk differences and relative risks. International Journal of Biostatistics 11(1): 91–108.
Donoghoe, M. W. and I. C. Marschner (2018). logbin: An R package for relative risk regression using the log-binomial model. Journal of Statistical Software 86(9): 1–22.
Marschner, I. C. (2014). Combinatorial EM algorithms. Statistics and Computing 24(6): 921–940.
Marschner, I. C. (2015). Relative risk regression for binary outcomes: Methods and recommendations. Australian & New Zealand Journal of Statistics. In press.
Marschner, I. C. and A. C. Gillett (2012). Relative risk regression: Reliable and flexible methods for log-binomial models. Biostatistics 13(1): 179–192.
## For examples, see example(logbin) and example(logbin.smooth)
## For examples, see example(logbin) and example(logbin.smooth)
Compute an analysis of deviance table for more than one GLM
fitted using logbin
.
## S3 method for class 'logbin' anova(object, ..., test = NULL)
## S3 method for class 'logbin' anova(object, ..., test = NULL)
object , ...
|
objects of class |
test |
a character string, (partially) matching one
of |
Unlike anova.glm
, specifying a single object
is not allowed.
The table has a row for the residual degrees of freedom and deviance for each model. For all but the first model, the change in degrees of freedom and deviance is also given. (This only makes statistical sense if the models are nested.) It is conventional to list the models from smallest to largest, but this is up to the user.
Models where the MLE lies on the boundary of the parameter space will be automatically removed from the list (with a warning), because asymptotic results to not apply to such models.
The table will optionally contain test statistics (and p-values)
comparing the reduction in deviance for the row to
the residuals. Mallows' Cp statistic is the residual
deviance plus twice the estimate of times
the residual degrees of freedom, which is closely related
to AIC. You can also choose
"LRT"
and "Rao"
for likelihood ratio tests and Rao's efficient score test.
The former is synonymous with "Chisq"
(although both
have an asymptotic chi-square distribution).
An object of class "anova"
inheriting from class
"data.frame"
.
Mark W. Donoghoe [email protected]
## For an example, see example(logbin)
## For an example, see example(logbin)
Function used in the definition of smooth terms within logbin.smooth
model formulae. The function does not evaluate a smooth — it exists purely to help
set up a model using smooths.
B(..., knots = NULL, knot.range = 0:5) Iso(...)
B(..., knots = NULL, knot.range = 0:5) Iso(...)
... |
variable that this smooth is a function of. Note that unlike |
knots |
unique positions of interior knots of a B-spline basis. Boundary knots are created automatically. |
knot.range |
if At least one of |
The function does not evaluate the variable arguments; the output from this function is used when producing the model matrix, at which point the actual basis functions are constructed.
B
is used to specify an order-3 B-spline basis (which can be
restricted to be monotonically non-decreasing via the
mono
argument in logbin.smooth
). If
length(knot.range) > 1
, models with each of the
specified number of interior knots will be fit, and the model
with the best (smallest) aic.c
will be returned.
Iso
is used to specify an isotonic basis, designed
such that the resulting function has non-negative
increments at each observed covariate value. When
Iso
is used, the resulting function will always be
monotonically non-decreasing, regardless of the value of
mono
.
An object of class "B.smooth"
(for B
) or
"Iso.smooth"
(for Iso
), which is a list with
the following elements:
term |
name of the term
provided in the |
termlabel |
label for the term in the model; e.g. for
term |
knots |
vector of interior knots (if
specified). |
knot.range |
vector of number of interior knots.
|
Mark W. Donoghoe [email protected]
s
performs a similar function in the mgcv
package.
## See example(logbin.smooth) for an example of specifying smooths in model ## formulae.
## See example(logbin.smooth) for an example of specifying smooths in model ## formulae.
Computes confidence intervals for one or more parameters in
a fitted logbin
model.
## S3 method for class 'logbin' confint(object, parm, level = 0.95, ...)
## S3 method for class 'logbin' confint(object, parm, level = 0.95, ...)
object |
a fitted model object, resulting from a
call to |
parm |
a specification of which parameters are to be given confidence intervals, either a vector of numbers or a vector of names. If missing, all parameters are considered. |
level |
the confidence level required. |
... |
additional argument(s) passed to |
Calculates confidence intervals for model parameters assuming
asymptotic normality and using the result from vcov.logbin(object)
.
As such, if the MLE is on the boundary of the parameter space,
(as per object$boundary
) the normality assumption
is invalid and NA
is returned.
A matrix (or vector) with columns giving lower and upper confidence limits for each parameter. These will be labelled as (1-level)/2 and 1-(1-level)/2 in % (by default 2.5% and 97.5%).
Mark W. Donoghoe [email protected]
## For an example, see example(logbin)
## For an example, see example(logbin)
Return something similar to a contrast matrix for a categorical covariate that we wish to be monotonically non-decreasing in a specified order.
contr.isotonic.rev(n, perm, contrasts = TRUE, sparse = FALSE)
contr.isotonic.rev(n, perm, contrasts = TRUE, sparse = FALSE)
n |
a vector of levels for a factor, or the number of levels. |
perm |
a permutation of the levels of |
contrasts |
a logical indicating whether constrasts should be computed. |
sparse |
included for compatibility reasons. Has no effect. |
This function is used in creating the design matrix for categorical covariates with a specified order under a particular parameterisation. This is required if a categorical covariate is defined as monotonic.
In the order specified by perm
, the coefficient
associated with each level is the sum of increments between
the following levels. That is, if there are a total of
levels, the first level is defined as
,
the second as
,
the third as
, and so on. In fitting the model,
these increments are constrained to be non-positive.
Note that these are not ‘contrasts’ as defined in the
theory for linear models, rather this is used to define the
contrasts
attribute of each variable so that
model.matrix
produces the desired design
matrix.
A matrix with n
rows and columns, with
if
contrasts
is TRUE
and
if
contrasts
is FALSE
.
Mark W. Donoghoe [email protected]
model.matrix
, which uses
contr.isotonic.rev
to create the design matrix.
contr.treatment
, contrasts
for
their usual use in regression models.
contr.isotonic.rev(4,1:4) contr.isotonic.rev(4,c(1,3,2,4)) # Show how contr.isotonic.rev applies within model.matrix x <- factor(round(runif(20,0,2))) mf <- model.frame(~x) contrasts(x) <- contr.isotonic.rev(levels(x), levels(x)) model.matrix(mf)
contr.isotonic.rev(4,1:4) contr.isotonic.rev(4,c(1,3,2,4)) # Show how contr.isotonic.rev applies within model.matrix x <- factor(round(runif(20,0,2))) mf <- model.frame(~x) contrasts(x) <- contr.isotonic.rev(levels(x), levels(x)) model.matrix(mf)
This is an internal function of package logbin
.
It is a service routine for logbin.smooth
which interprets the
smooth parts of the model formula and returns modified formulas to be used
in the fitting functions.
Not normally called directly.
interpret.logbin.smooth(formula)
interpret.logbin.smooth(formula)
formula |
A formula as supplied to |
A list with components:
full.formula |
a |
fake.formula |
a |
smooth.spec |
a named list containing the results of |
smooth.ind |
a vector containing the indices of the smooth components in
the |
terms |
the result of running |
Mark W. Donoghoe [email protected]
# Specify a smooth model with knot.range res <- interpret.logbin.smooth(y ~ B(x, knot.range = 0:2) + x2) # The knot.range is removed from the full.formula... print(res$full.formula) # ...but is stored in the $smooth.spec component of the result: print(res$smooth.spec$x$knot.range)
# Specify a smooth model with knot.range res <- interpret.logbin.smooth(y ~ B(x, knot.range = 0:2) + x2) # The knot.range is removed from the full.formula... print(res$full.formula) # ...but is stored in the $smooth.spec component of the result: print(res$smooth.spec$x$knot.range)
logbin
fits relative risk (log-link) binomial
regression models.
logbin(formula, mono = NULL, data, subset, na.action, start = NULL, offset, control = list(...), model = TRUE, method = c("cem", "em", "glm", "glm2", "ab"), accelerate = c("em", "squarem", "pem", "qn"), control.method = list(), warn = TRUE, ...)
logbin(formula, mono = NULL, data, subset, na.action, start = NULL, offset, control = list(...), model = TRUE, method = c("cem", "em", "glm", "glm2", "ab"), accelerate = c("em", "squarem", "pem", "qn"), control.method = list(), warn = TRUE, ...)
formula |
an object of class |
mono |
a vector indicating which terms in
|
data |
an optional data frame, list or environment
(or object coercible by |
subset |
an optional vector specifying a subset of observations to be used in the fitting process. |
na.action |
a function which indicates what should happen when the data
contain |
start |
starting values for the parameters in the linear predictor. |
offset |
this can be used to specify an a
priori known component to be included in the linear
predictor during fitting. This should be |
control |
a list of parameters for controlling the
fitting process, passed to
With |
model |
a logical value indicating whether the model frame should be included as a component of the returned value. |
method |
a character string that determines which algorithm to use
to find the MLE. The main purpose of
|
accelerate |
for the |
control.method |
a list of control parameters for the fitting algorithm. This is passed to the If If any items are not specified, the defaults are used. |
warn |
a logical indicating whether or not warnings should be provided for non-convergence or boundary values. |
... |
arguments to be used to form the default
|
logbin
fits a generalised linear model (GLM) with a
binomial error distribution and log link
function. Predictors are assumed to be continuous, unless
they are of class factor
, or are character or
logical (in which case they are converted to
factor
s). Specifying a predictor as monotonic using
the mono
argument means that for continuous terms,
the associated coefficient will be restricted to be
non-negative, and for categorical terms, the coefficients
will be non-decreasing in the order of the factor
levels
. This allows semi-parametric monotonic regression
functions, in the form of unsmoothed step-functions. For
smooth regression functions see logbin.smooth
.
As well as allowing monotonicity constraints, the function
is useful when a standard GLM routine, such as
glm
, fails to converge with a log-link
binomial model. For convenience in comparing convergence on
the same model, logbin
can be used
as a wrapper function to glm
and glm2
through the method
argument.
If glm
does achieve successful convergence,
and logbin
converges to an interior point, then the two
results will be identical. However, as illustrated in one of
the examples below, glm
may still experience convergence
problems even when logbin
converges to an interior point.
Note that if logbin
converges to a boundary point, then it
may differ slightly from glm
even if glm
successfully
converges, because of differences in the definition of the parameter
space. logbin
produces valid fitted values for covariate
values within the Cartesian product of the observed range of covariate
values, whereas glm
produces valid fitted values just
for the observed covariate combinations (assuming it successfully
converges). This issue is only relevant when logbin
converges to a boundary point. The adaptive barrier approach defines
the parameter space in the same way as glm
, so the
same comments apply when comparing its results to those from
method = "cem"
or "em"
.
The main computational method is an EM-type algorithm which accommodates
the parameter contraints in the model and is more stable than iteratively
reweighted least squares. This is done in one of two ways,
depending on the choice of the method
argument.
method = "cem"
implements a CEM algorithm (Marschner, 2014),
in which a collection of restricted parameter spaces is defined
that covers the full parameter space, and an EM algorithm is applied within each
restricted parameter space in order to find a collection of
restricted maxima of the log-likelihood function, from
which can be obtained the global maximum over the full
parameter space. See Marschner and Gillett (2012) for further
details.
method = "em"
implements a single EM algorithm
on an overparameterised model, and the MLE of this model
is transformed back to the original parameter space.
Acceleration of the EM algorithm in either case can be
achieved through the methods of the turboem
package, specified through the accelerate
argument. However,
note that these methods do not have the guaranteed convergence of
the standard EM algorithm, particularly when the MLE is on the
boundary of its (possibly constrained) parameter space.
Alternatively, an adaptive barrier method can be used by specifying
method = "ab"
, which maximises the likelihood subject to
constraints on the fitted values.
logbin
returns an object of class "logbin"
,
which inherits from classes "glm"
and "lm"
.
The function summary.logbin
can be used
to obtain or print a summary of the results.
The generic accessor functions coefficients
,
fitted.values
and residuals
can be used to
extract various useful features of the value returned by
logbin
. Note that effects
will not work.
An object of class "logbin"
is a list containing the
same components as an object of class "glm"
(see the
"Value" section of glm
). It also includes:
loglik |
the maximised log-likelihood. |
aic.c |
a small-sample corrected
version of Akaike's An Information Criterion
(Hurvich, Simonoff and Tsai, 1998). This is used by
|
xminmax |
the minimum and maximum observed values for each of the continuous covariates, to help define the covariate space of the model. |
As well as:
np.coefficients |
estimated coefficients associated with the non-positive parameterisation corresponding to the MLE. |
nn.x |
non-negative model matrix associated with
|
coefhist |
(if |
Due to the way in which the covariate space is defined in the CEM algorithm,
models that include terms that are functionally dependent on one another
— such as interactions and polynomials — may give unexpected
results. Categorical covariates should always be entered directly
as factors rather than dummy variables. 2-way interactions between
factors can be included by calculating a new factor term that
has levels corresponding to all possible combinations of the factor
levels (see the Example). Non-linear relationships can be included
by using logbin.smooth
.
Mark W. Donoghoe [email protected]
Hurvich, C. M., J. S. Simonoff and C.-L. Tsai (1998). Smoothing parameter selection in non-parametric regression using an improved Akaike information criterion. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 60(2): 271–293.
Donoghoe, M. W. and I. C. Marschner (2018). logbin: An R package for relative risk regression using the log-binomial model. Journal of Statistical Software 86(9): 1–22.
Marschner, I. C. (2014). Combinatorial EM algorithms. Statistics and Computing 24(6): 921–940.
Marschner, I. C. and A. C. Gillett (2012). Relative risk regression: reliable and flexible methods for log-binomial models. Biostatistics 13(1): 179–192.
logbin.smooth
for semi-parametric models
turboem
for acceleration methods
constrOptim
for the adaptive barrier approach.
require(glm2) data(heart) #====================================================== # Model with periodic non-convergence when glm is used #====================================================== start.p <- sum(heart$Deaths) / sum(heart$Patients) fit.glm <- glm(cbind(Deaths, Patients-Deaths) ~ factor(AgeGroup) + factor(Severity) + factor(Delay) + factor(Region), family = binomial(log), start = c(log(start.p), rep(c(0.2, 0.4), 4)), data = heart, trace = TRUE, maxit = 100) fit.logbin <- logbin(formula(fit.glm), data = heart, start = c(log(start.p), rep(c(0.2, 0.4), 4)), trace = 1) summary(fit.logbin) # Speed up convergence by using single EM algorithm fit.logbin.em <- update(fit.logbin, method = "em") # Speed up convergence by using acceleration methods fit.logbin.acc <- update(fit.logbin, accelerate = "squarem") fit.logbin.em.acc <- update(fit.logbin.em, accelerate = "squarem") #============================= # Model with interaction term #============================= heart$AgeSev <- 10 * heart$AgeGroup + heart$Severity fit.logbin.int <- logbin(cbind(Deaths, Patients-Deaths) ~ factor(AgeSev) + factor(Delay) + factor(Region), data = heart, trace = 1, maxit = 100000) summary(fit.logbin.int) vcov(fit.logbin.int) confint(fit.logbin.int) summary(predict(fit.logbin.int, type = "response")) anova(fit.logbin, fit.logbin.int, test = "Chisq")
require(glm2) data(heart) #====================================================== # Model with periodic non-convergence when glm is used #====================================================== start.p <- sum(heart$Deaths) / sum(heart$Patients) fit.glm <- glm(cbind(Deaths, Patients-Deaths) ~ factor(AgeGroup) + factor(Severity) + factor(Delay) + factor(Region), family = binomial(log), start = c(log(start.p), rep(c(0.2, 0.4), 4)), data = heart, trace = TRUE, maxit = 100) fit.logbin <- logbin(formula(fit.glm), data = heart, start = c(log(start.p), rep(c(0.2, 0.4), 4)), trace = 1) summary(fit.logbin) # Speed up convergence by using single EM algorithm fit.logbin.em <- update(fit.logbin, method = "em") # Speed up convergence by using acceleration methods fit.logbin.acc <- update(fit.logbin, accelerate = "squarem") fit.logbin.em.acc <- update(fit.logbin.em, accelerate = "squarem") #============================= # Model with interaction term #============================= heart$AgeSev <- 10 * heart$AgeGroup + heart$Severity fit.logbin.int <- logbin(cbind(Deaths, Patients-Deaths) ~ factor(AgeSev) + factor(Delay) + factor(Region), data = heart, trace = 1, maxit = 100000) summary(fit.logbin.int) vcov(fit.logbin.int) confint(fit.logbin.int) summary(predict(fit.logbin.int, type = "response")) anova(fit.logbin, fit.logbin.int, test = "Chisq")
Auxiliary function for logbin
fitting.
Typically only used internally by nplbin
,
but may be used to construct a
control
argument to that function.
logbin.control(bound.tol = 1e-06, epsilon = 1e-08, maxit = 10000, trace = 0, coeftrace = FALSE)
logbin.control(bound.tol = 1e-06, epsilon = 1e-08, maxit = 10000, trace = 0, coeftrace = FALSE)
bound.tol |
positive tolerance specifying the
interior of the parameter space. If the fitted model is
more than |
epsilon |
positive convergence tolerance
This should be smaller than |
maxit |
integer giving the maximum number of iterations (for a given parameterisation in the case of the CEM algorithm). |
trace |
number indicating level of output that should be produced. >= 1 gives output for each parameterisation, >= 2 gives output at each iteration. |
coeftrace |
logical indicating whether the coefficient history
should be included as a component of the returned
value (for |
This is used similarly to glm.control
. The
control
argument of logbin
is by
default passed to the control
argument of
nplbin
.
When trace
is greater than zero, calls to
cat
produce the output. Hence,
options(digits = *)
can be used to increase
the precision.
A list with components named as the arguments.
Mark W. Donoghoe [email protected]
glm.control
, the equivalent function for
glm
fitting.
nplbin
, the
function used to fit logbin
models.
## Variation on example(glm.control) : evts <- c(18,17,15,20,10,20,25,13,12) obs <- rep(30,9) outcome <- gl(3,1,9) treatment <- gl(3,3) oo <- options(digits = 12) logbin.D93X <- logbin(cbind(evts,obs-evts) ~ outcome + treatment, trace = 2, epsilon = 1e-2) options(oo) coef(logbin.D93X)
## Variation on example(glm.control) : evts <- c(18,17,15,20,10,20,25,13,12) obs <- rep(30,9) outcome <- gl(3,1,9) treatment <- gl(3,3) oo <- options(digits = 12) logbin.D93X <- logbin(cbind(evts,obs-evts) ~ outcome + treatment, trace = 2, epsilon = 1e-2) options(oo) coef(logbin.D93X)
logbin.smooth
fits log-link binomial
regression models using a stable CEM algorithm. It provides additional
flexibility over logbin
by allowing for smooth
semi-parametric terms.
logbin.smooth(formula, mono = NULL, data, subset, na.action, offset, control = list(...), model = TRUE, model.logbin = FALSE, method = c("cem", "em"), accelerate = c("em", "squarem", "pem", "qn"), control.accelerate = list(), ...)
logbin.smooth(formula, mono = NULL, data, subset, na.action, offset, control = list(...), model = TRUE, model.logbin = FALSE, method = c("cem", "em"), accelerate = c("em", "squarem", "pem", "qn"), control.accelerate = list(), ...)
formula |
an object of class |
mono |
a vector indicating which terms in
|
data |
an optional data frame, list or environment
(or object coercible by |
subset |
an optional vector specifying a subset of observations to be used in the fitting process. |
na.action |
a function which indicates what should happen when the data
contain |
offset |
this can be used to specify an a
priori known component to be included in the linear
predictor during fitting. This should be |
control |
a list of parameters for controlling the
fitting process, passed to
|
model |
a logical value indicating whether the model frame should be included as a component of the returned value. |
model.logbin |
a logical value indicating whether the fitted |
method |
a character string that determines which EM-type algorithm to use
to find the MLE: Unlike |
accelerate |
a character string that determines the acceleration
algorithm to be used, (partially) matching one of |
control.accelerate |
a list of control parameters for the acceleration algorithm. See |
... |
arguments to be used to form the default
|
logbin.smooth
performs the same fitting process as logbin
,
providing a stable maximum likelihood estimation procedure for log-link
binomial GLMs, with the added flexibility of allowing semi-parametric
B
and Iso
terms (note that logbin.smooth
will stop with an
error if no semi-parametric terms are specified in the right-hand side of the formula
;
logbin
should be used instead).
The method partitions the parameter space associated with the semi-parametric part of the
model into a sequence of constrained parameter spaces, and defines a fully parametric
logbin
model for each. The model with the highest log-likelihood is the MLE for
the semi-parametric model (see Donoghoe and Marschner, 2015).
An object of class "logbin.smooth"
, which contains the same objects as class
"logbin"
(the same as "glm"
), as well as:
model.logbin |
if |
xminmax.smooth |
the minimum and maximum observed values for each of the smooth terms in the model, to help define the covariate space. |
full.formula |
the component from |
knots |
a named list containing the knot vectors for each of the smooth terms in the model. |
Mark W. Donoghoe [email protected]
Donoghoe, M. W. and I. C. Marschner (2015). Flexible regression models for rate differences, risk differences and relative risks. International Journal of Biostatistics 11(1): 91–108.
Donoghoe, M. W. and I. C. Marschner (2018). logbin: An R package for relative risk regression using the log-binomial model. Journal of Statistical Software 86(9): 1–22.
Marschner, I. C. (2014). Combinatorial EM algorithms. Statistics and Computing 24(6): 921–940.
## Simple example x <- c(0.3, 0.2, 0.0, 0.1, 0.2, 0.1, 0.7, 0.2, 1.0, 0.9) y <- c(5, 4, 6, 4, 7, 3, 6, 5, 9, 8) system.time(m1 <- logbin.smooth(cbind(y, 10-y) ~ B(x, knot.range = 0:2), mono = 1, trace = 1)) ## Compare with accelerated version system.time(m1.acc <- update(m1, accelerate = "squarem")) ## Isotonic relationship m2 <- logbin.smooth(cbind(y, 10-y) ~ Iso(x)) plot(m1) plot(m2) summary(predict(m1, type = "response")) summary(predict(m2, type = "response"))
## Simple example x <- c(0.3, 0.2, 0.0, 0.1, 0.2, 0.1, 0.7, 0.2, 1.0, 0.9) y <- c(5, 4, 6, 4, 7, 3, 6, 5, 9, 8) system.time(m1 <- logbin.smooth(cbind(y, 10-y) ~ B(x, knot.range = 0:2), mono = 1, trace = 1)) ## Compare with accelerated version system.time(m1.acc <- update(m1, accelerate = "squarem")) ## Isotonic relationship m2 <- logbin.smooth(cbind(y, 10-y) ~ Iso(x)) plot(m1) plot(m2) summary(predict(m1, type = "response")) summary(predict(m2, type = "response"))
Finds the maximum likelihood estimate of a log-link binomial GLM using an EM algorithm, where each of the coefficients in the linear predictor is restricted to be non-positive.
nplbin(y, x, offset, start, Amat = diag(ncol(x)), control = logbin.control(), accelerate = c("em", "squarem", "pem", "qn"), control.accelerate = list(list()))
nplbin(y, x, offset, start, Amat = diag(ncol(x)), control = logbin.control(), accelerate = c("em", "squarem", "pem", "qn"), control.accelerate = list(list()))
y |
binomial response. May be a single column of 0/1 or two columns, giving the number of successes and failures. |
x |
non-negative covariate matrix. |
offset |
non-positive additive offset vector. The default is a vector of zeros. |
start |
starting values for the parameter estimates. All elements must be less than
or equal to |
Amat |
matrix that parameter estimates are left-multiplied by before testing for convergence (e.g. to check reduced version of expanded parameter vector). |
control |
a |
accelerate |
a character string that determines the acceleration
algorithm to be used, (partially) matching one of |
control.accelerate |
a list of control parameters for the acceleration algorithm. See |
This is a workhorse function for logbin
, and runs the EM algorithm to find the
constrained non-positive MLE associated with a log-link binomial GLM. See Marschner
and Gillett (2012) for full details.
A list containing the following components
coefficients |
the constrained non-positive maximum likelihood estimate of the parameters. |
residuals |
the residuals at the MLE, that is |
fitted.values |
the fitted mean values. |
rank |
the number of parameters in the model (named " |
family |
included for compatibility — will always be |
linear.predictors |
the linear fit on link scale. |
deviance |
up to a constant, minus twice the maximised log-likelihood. |
aic |
a version of Akaike's An Information Criterion, minus twice the maximised log-likelihood plus twice the number of parameters. |
aic.c |
a small-sample corrected version of Akaike's An Information Criterion (Hurvich, Simonoff and Tsai, 1998). |
null.deviance |
the deviance for the null model, comparable with |
iter |
the number of iterations of the EM algorithm used. |
weights |
included for compatibility — a vector of ones. |
prior.weights |
the number of trials associated with each binomial response. |
df.residual |
the residual degrees of freedom. |
df.null |
the residual degrees of freedom for the null model. |
y |
the |
converged |
logical. Did the EM algorithm converge? |
boundary |
logical. Is the MLE on the boundary of the parameter
space — i.e. are any of the |
loglik |
the maximised log-likelihood. |
nn.design |
the non-negative |
Mark W. Donoghoe [email protected].
This function is based on code from Marschner and Gillett (2012) written by Alexandra Gillett.
Hurvich, C. M., J. S. Simonoff and C.-L. Tsai (1998). Smoothing parameter selection in non-parametric regression using an improved Akaike information criterion. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 60(2): 271–293.
Marschner, I. C. and A. C. Gillett (2012). Relative risk regression: reliable and flexible methods for log-binomial models. Biostatistics 13(1): 179–192.
The main use is to take a fitted logbin.smooth
object produced by
logbin.smooth
and plot the component smooth functions that make it up,
for specified values of the other covariates.
Alternatively, plots the model diagnostics usually provided by plot.lm
.
## S3 method for class 'logbin.smooth' plot(x, type = c("response", "link", "diagnostics"), at = data.frame(), knotlines = TRUE, nobs = 1000, ...)
## S3 method for class 'logbin.smooth' plot(x, type = c("response", "link", "diagnostics"), at = data.frame(), knotlines = TRUE, nobs = 1000, ...)
x |
a fitted |
type |
for for |
at |
a data frame containing the values at which the prediction should be evaluated. The columns
must contain the covariates in the model, and several rows may be provided (in which case, multiple
lines are drawn on the same plot). Cannot be missing or |
knotlines |
logical; if vertical lines should be drawn on the plot to indicate the locations of the knots for B-spline terms. |
nobs |
the number of points which should be used to create the curve. These are placed evenly along the range of the observed covariate values from the original model. |
... |
other graphics parameters to pass on to plotting commands, in particular any arguments to
|
For each smooth covariate in the model of x
, predict.logbin.smooth
is used to obtain predicted values for the range of that covariate, with the other
covariates remaining fixed at their values given in at
. Several rows may be provided
in at
, in which case, one curve is drawn for each, and they are coloured using
rainbow(nrow(at))
. If the model contains a single smooth covariate and no other
covariates, at
may be provided as an empty data frame, data.frame()
.
The function simply generates plots.
If this function is too restrictive, it may be easier to use predict.logbin.smooth
to get predictions for the dataset of your choice, and do the plotting manually.
Mark W. Donoghoe [email protected]
logbin.smooth
, predict.logbin.smooth
## For an example, see example(logbin.smooth)
## For an example, see example(logbin.smooth)
Obtains predictions from a fitted logbin
object.
## S3 method for class 'logbin' predict(object, newdata = NULL, type = c("link", "response", "terms"), terms = NULL, na.action = na.pass, checkminmax = TRUE, ...)
## S3 method for class 'logbin' predict(object, newdata = NULL, type = c("link", "response", "terms"), terms = NULL, na.action = na.pass, checkminmax = TRUE, ...)
object |
a fitted object of class inheriting from |
newdata |
optionally, a data frame in which to look for variables with which to predict. If omitted, the fitted linear predictors are used. |
type |
the type of prediction required. The default is on the scale of the linear predictors;
the alternative The value of this argument can be abbreviated. |
terms |
with |
na.action |
function determining what should be done with missing values in |
checkminmax |
logical indicating whether or not values of continuous covariates in |
... |
further arguments passed to or from other methods. |
If newdata
is omitted the predictions are based on the data used for the fit.
In that case how cases with missing values in the original fit are treated is determined by the
na.action
argument of that fit. If na.action = na.omit
, omitted cases
will not appear in the residuals. If na.action = na.exclude
they will
appear, with residual value NA
. See also napredict
.
A vector or matrix of predictions. For type = "terms"
, this is a matrix with
a column per term, and may have an attribute "constant"
.
Variables are first looked for in newdata
and then searched for in the usual
way (which will include the environment of the formula used in the fit). A warning
will be given if the variables found are not of the same length as those in
newdata
if it was supplied.
Mark W. Donoghoe [email protected]
predict.glm
for the equivalent method for models fit using glm
.
## For an example, see example(logbin)
## For an example, see example(logbin)
Obtains predictions from a fitted logbin.smooth
object.
## S3 method for class 'logbin.smooth' predict(object, newdata = NULL, type = c("link", "response", "terms"), terms = NULL, na.action = na.pass, ...)
## S3 method for class 'logbin.smooth' predict(object, newdata = NULL, type = c("link", "response", "terms"), terms = NULL, na.action = na.pass, ...)
object |
a fitted object of class inheriting from |
newdata |
optionally, a data frame in which to look for variables with which to predict. If omitted, the fitted linear predictors are used. |
type |
the type of prediction required. The default is on the scale of the linear predictors;
the alternative The value of this argument can be abbreviated. |
terms |
with |
na.action |
function determining what should be done with missing values in |
... |
further arguments passed to or from other methods. |
predict.logbin.smooth
constructs the underlying basis functions for smooth variables
in newdata
and runs predict.logbin
to obtain predictions. Note that
if values of smooth covariates in newdata
are outside the covariate space of
object
, an error will be returned.
If newdata
is omitted, the predictions are based on the data used for the fit.
In that case how cases with missing values in the original fit are treated is determined by the
na.action
argument of that fit. If na.action = na.omit
, omitted cases
will not appear in the residuals, whereas if na.action = na.exclude
they will
appear, with residual value NA
. See also napredict
.
A vector or matrix of predictions. For type = "terms"
, this is a matrix with
a column per term, and may have an attribute "constant"
.
Variables are first looked for in newdata
and then searched for in the usual
way (which will include the environment of the formula used in the fit). A warning
will be given if the variables found are not of the same length as those in
newdata
if it was supplied.
Mark W. Donoghoe [email protected]
predict.glm
for the equivalent method for models fit using glm
.
## For an example, see example(logbin.smooth)
## For an example, see example(logbin.smooth)
These functions are all methods
for class logbin
or summary.logbin
objects.
## S3 method for class 'logbin' summary(object, correlation = FALSE, ...) ## S3 method for class 'summary.logbin' print(x, digits = max(3L, getOption("digits") - 3L), signif.stars = getOption("show.signif.stars"), ...)
## S3 method for class 'logbin' summary(object, correlation = FALSE, ...) ## S3 method for class 'summary.logbin' print(x, digits = max(3L, getOption("digits") - 3L), signif.stars = getOption("show.signif.stars"), ...)
object |
an object of class |
x |
an object of class |
correlation |
logical; if |
digits |
the number of significant digits to use when printing. |
signif.stars |
logical; if |
... |
further arguments passed to or from other methods. |
These perform the same function as summary.glm
and print.summary.glm
,
producing similar results for logbin
models. print.summary.logbin
additionally prints
the small-sample corrected AIC (aic.c
), and the number of EM iterations for the parameterisation
corresponding to the MLE.
The dispersion used in calculating standard errors is fixed as 1
.
summary.logbin
returns an object of class "summary.logbin"
, a list with components
call |
the component from |
family |
the component from |
deviance |
the component from |
aic |
the component from |
aic.c |
the component from |
df.residual |
the component from |
null.deviance |
the component from |
df.null |
the component from |
iter |
the component from |
deviance.resid |
the deviance residuals: see |
coefficients |
the matrix of coefficients, standard errors, z-values and p-values. |
aliased |
included for compatibility — always |
dispersion |
the inferred/estimated dispersion. |
df |
included for compatibility — a 3-vector of the number of coefficients, the number of residual degrees of freedom, and the number of coefficients (again). |
cov.unscaled |
the unscaled ( |
cov.scaled |
ditto, scaled by |
correlation |
if |
If object$boundary == TRUE
, the standard errors of the coefficients
are not valid, and a matrix of NaN
s is returned by vcov.logbin
. If
the MLE is not on the boundary but the model contains parameters with monotonicity
constraints, the standard errors do not take this into account and should be used
with caution.
Mark W. Donoghoe [email protected]
## For examples see example(logbin)
## For examples see example(logbin)
Returns the variance-covariance matrix of the main parameters of a fitted logbin
model object.
## S3 method for class 'logbin' vcov(object, ...)
## S3 method for class 'logbin' vcov(object, ...)
object |
an object of class |
... |
additional arguments for method functions. |
An equivalent method to vcov
, to use with logbin
models.
A matrix of the estimated covariances between the parameter estimates in the linear or
non-linear predictor of the model. This should have row and column names corresponding
to the parameter names given by the coef
method.
If object$boundary == TRUE
, the standard errors of the coefficients
are not valid, and a matrix of NaN
s is returned.
Mark W. Donoghoe [email protected]
## For an example see example(logbin)
## For an example see example(logbin)