| Title: | Fitting Generalized Linear Models |
|---|---|
| Description: | Fits generalized linear models using the same model specification as glm in the stats package, but with a modified default fitting method that provides greater stability for models that may fail to converge using glm. |
| Authors: | Ian Marschner [aut], Mark W. Donoghoe [cre, aut] (ORCID: <https://orcid.org/0000-0003-0212-6443>), The R Core Team [cph] (functions in this package are modified versions of stats::glm and related functions.) |
| Maintainer: | Mark W. Donoghoe <[email protected]> |
| License: | GPL (>=2) |
| Version: | 1.2.1.9000 |
| Built: | 2026-06-04 09:39:40 UTC |
| Source: | https://github.com/mdonoghoe/glm2 |
Fits generalized linear models using the same model specification as
glm in the stats package, but with a modified default fitting
method. The method provides greater stability for models that may fail to
converge using glm.
| Package: | glm2 |
| Type: | Package |
| Version: | 1.2.1 |
| License: | GPL (>=2) |
| LazyData: | true |
There are two
functions in the package, glm2 and glm.fit2. The glm2
function fits generalized linear models using the same model specification
as glm in the stats package. It is identical to glm
except for minor modifications to change the default fitting method. The
glm.fit2 function provides the default fitting method for
glm2. It is identical to glm.fit in the stats package,
except for modifications to the computational method that provide more
stable convergence. Normally only glm2 would be called directly,
although like glm.fit, glm.fit2 can be called directly. It can
also be passed to glm as an alternative fitting method, using the
method argument. See Marschner (2011) for a discussion of the need
for a modified fitting method.
Ian Marschner (using code from glm and glm.fit in the
stats package)
Maintainer: Mark W. Donoghoe [email protected]
Marschner, I.C. (2011) glm2: Fitting generalized linear models with convergence problems. The R Journal, Vol. 3/2, pp.12-15.
Useful links:
This data set is derived from Agresti (2007, Table 3.2, pp.76-77). It gives
4 variables for each of 173 female horseshoe crabs. Also provided are two
random samples of the data with replacement, which are useful for
illustrating the convergence properties of glm and glm2.
A data frame with 173 observations on the following 6 variables:
number of male partners in addition to the female's primary partner
width of the female in centimeters
a binary factor indicating whether the female has dark coloring
(yes or no)
a binary factor indicating whether the female has
good spine condition (yes or no)
a random sample with replacement from 1:173
a second random sample with replacement from 1:173
The variables Dark and GoodSpine are derived from the raw
data. In the notation of Table 3.2 of Agresti (2007), Dark = yes
corresponds to C>3 and GoodSpine = yes corresponds to S<3. The two
random samples Rep1 and Rep2 can be used to provide random
samples with replacement from the full data set. These two random samples
are useful for illustrating the convergence properties of glm and
glm2; see examples in the help documentation for glm2.
Agresti, A. (2007) An Introduction to Categorical Data Analysis (2nd ed.). Hoboken, NJ: Wiley.
This method is a placeholder. The dfbetas
method for GLMs fit using glm2(..., method = "glm.fit2.Matrix")
is not yet implemented.
An error indicating the method is not implemented.
glm.fit2 is identical to glm.fit in the stats package,
except for a modification to the computational method that provides improved
convergence properties. It is the default fitting method for glm2 and
can also be used as an alternative fitting method for glm, instead of
the default method glm.fit. glm.fit2.Matrix is similar, but
allows for the model matrix x to be supplied as a sparse
Matrix, allowing for greater efficiency with large data.
glm.fit2( x, y, weights = rep(1, nobs), start = NULL, etastart = NULL, mustart = NULL, offset = rep(0, nobs), family = gaussian(), control = list(), intercept = TRUE, singular.ok = TRUE ) glm.fit2.Matrix( x, y, weights = rep(1, nobs), start = NULL, etastart = NULL, mustart = NULL, offset = rep(0, nobs), family = gaussian(), control = list(), intercept = TRUE, singular.ok = TRUE )glm.fit2( x, y, weights = rep(1, nobs), start = NULL, etastart = NULL, mustart = NULL, offset = rep(0, nobs), family = gaussian(), control = list(), intercept = TRUE, singular.ok = TRUE ) glm.fit2.Matrix( x, y, weights = rep(1, nobs), start = NULL, etastart = NULL, mustart = NULL, offset = rep(0, nobs), family = gaussian(), control = list(), intercept = TRUE, singular.ok = TRUE )
x |
|
y |
as for |
weights |
as for |
start |
as for |
etastart |
as for |
mustart |
as for |
offset |
as for |
family |
as for |
control |
as for |
intercept |
as for |
singular.ok |
as for |
glm.fit2 is a modified version of glm.fit in the stats
package. The computational method in glm.fit2 uses a stricter form of
step-halving to deal with numerical instability in the iteratively
reweighted least squares algorithm. Whereas glm.fit uses step-halving
to correct divergence and parameter space violations, glm.fit2
additionally uses step-halving to force the model deviance to decrease at
each iteration, which improves the convergence properties. Like
glm.fit, glm.fit2 usually would not be called directly.
Instead, it is called by glm2 as the default fitting method. Like
glm.fit, it is possible to call glm.fit2 directly if the
response vector and design matrix have already been calculated, in which
case it may be more efficient than calling glm2. It can also be
passed to glm in the stats package, using the method
argument, providing an alternative to the default fitting method
glm.fit. See Marschner (2011) for a discussion of the need for a
modified fitting method.
glm.fit2.Matrix uses the same algorithm as glm.fit2, but
employs QR Factorization methods from the
Matrix package, which are more computationally efficient with large,
sparse model matrices. When calling via glm2, this is generated
using Matrix::sparse.model.matrix.
If you are calling glm.fit2.Matrix directly, ideally you should supply
a Matrix object. This approach will not be as efficient if called using
glm(..., method = "glm.fit2.Matrix").
The value returned by glm.fit2 has exactly the same
documentation as the value returned by glm.fit. The value returned by
glm.fit2.Matrix is slightly different in some details.
glm.fit2 uses the code from glm.fit, whose authors are
listed in the help documentation for the stats package. Modifications
to this code were made by Ian Marschner.
Marschner, I.C. (2011) glm2: Fitting generalized linear models with convergence problems. The R Journal, Vol. 3/2, pp.12-15.
library(glm2) #--- logistic regression null model ---------------# # (intercept estimate = log(0.75/0.25) = 1.098612) y <- c(1,1,1,0) x <- rep(1,4) #--- divergence of glm.fit to infinite estimate ---# fit1 <- glm.fit(x,y, family=binomial(link="logit"),start=-1.81) fit2 <- glm.fit2(x,y, family=binomial(link="logit"),start=-1.81) print.noquote(c(fit1$coef,fit2$coef))library(glm2) #--- logistic regression null model ---------------# # (intercept estimate = log(0.75/0.25) = 1.098612) y <- c(1,1,1,0) x <- rep(1,4) #--- divergence of glm.fit to infinite estimate ---# fit1 <- glm.fit(x,y, family=binomial(link="logit"),start=-1.81) fit2 <- glm.fit2(x,y, family=binomial(link="logit"),start=-1.81) print.noquote(c(fit1$coef,fit2$coef))
Fits generalized linear models using the same model specification as
glm in the stats package, but with a modified default fitting
method. The method provides greater stability for models that may fail to
converge using glm.
glm2( formula, family = gaussian, data, weights, subset, na.action, start = NULL, etastart, mustart, offset, control = list(...), model = TRUE, method = "glm.fit2", x = FALSE, y = TRUE, singular.ok = TRUE, contrasts = NULL, ... )glm2( formula, family = gaussian, data, weights, subset, na.action, start = NULL, etastart, mustart, offset, control = list(...), model = TRUE, method = "glm.fit2", x = FALSE, y = TRUE, singular.ok = TRUE, contrasts = NULL, ... )
formula |
as for |
family |
as for |
data |
as for |
weights |
as for |
subset |
as for |
na.action |
as for |
start |
as for |
etastart |
as for |
mustart |
as for |
offset |
as for |
control |
as for |
model |
as for |
method |
the method used in fitting the model. The default method
|
x |
as for |
y |
as for |
singular.ok |
as for |
contrasts |
as for |
... |
as for |
glm2 is a modified version of glm in the stats package.
It fits generalized linear models using the same model specification as
glm. It is identical to glm except for minor modifications to
change the default fitting method. The default method uses a stricter form
of step-halving to force the deviance to decrease at each iteration and is
implemented in glm.fit2. Like glm, user-supplied fitting
functions can be used with glm2 by passing a function or a character
string naming a function to the method argument. See Marschner (2011)
for a discussion of the need for a modified fitting method.
The value returned by glm2 has exactly the same documentation
as the value returned by glm, except for:
method |
the name of the fitter function used, which by default is
|
glm2 uses the code from glm, whose authors are listed
in the help documentation for the stats package. Modifications to
this code were made by Ian Marschner.
Marschner, I.C. (2011) glm2: Fitting generalized linear models with convergence problems. The R Journal, Vol. 3/2, pp.12-15.
library(glm2) data(crabs) data(heart) #========================================================== # EXAMPLE 1: logistic regression null model # (behaviour of glm and glm2 for different starting values) #========================================================== y <- c(1,1,1,0) # intercept estimate = log(0.75/0.25) = 1.098612 #--- identical behaviour ---# fit1 <- glm(y ~ 1, family=binomial(link="logit"), control=glm.control(trace=TRUE)) fit2 <- glm2(y ~ 1, family=binomial(link="logit"), control=glm.control(trace=TRUE)) print.noquote(c(fit1$coef,fit2$coef)) #--- convergence via different paths ---# fit1 <- glm(y ~ 1, family=binomial(link="logit"),start=-1.75, control=glm.control(trace=TRUE)) fit2 <- glm2(y ~ 1, family=binomial(link="logit"),start=-1.75, control=glm.control(trace=TRUE)) print.noquote(c(fit1$coef,fit2$coef)) #--- divergence of glm to infinite estimate ---# fit1 <- glm(y ~ 1, family=binomial(link="logit"),start=-1.81) fit2 <- glm2(y ~ 1, family=binomial(link="logit"),start=-1.81) print.noquote(c(fit1$coef,fit2$coef)) #======================================================================= # EXAMPLE 2: identity link Poisson (successful boundary convergence # using 4 identical approaches in glm and glm2 with the method argument) #======================================================================= satellites <- crabs$Satellites width.shifted <- crabs$Width - min(crabs$Width) dark <- crabs$Dark goodspine <- crabs$GoodSpine fit1 <- glm(satellites ~ width.shifted + factor(dark) + factor(goodspine), family = poisson(link="identity"), start = rep(1,4)) fit2 <- glm2(satellites ~ width.shifted + factor(dark) + factor(goodspine), family = poisson(link="identity"), start = rep(1,4)) fit1.eq <- glm2(satellites ~ width.shifted + factor(dark) + factor(goodspine), family = poisson(link="identity"), start = rep(1,4), method = "glm.fit") fit2.eq <- glm(satellites ~ width.shifted + factor(dark) + factor(goodspine), family = poisson(link="identity"), start = rep(1,4), method = "glm.fit2") noquote(c("deviances: ",fit1$dev,fit2$dev,fit1.eq$dev,fit2.eq$dev)) noquote(c("converged: ",fit1$conv,fit2$conv,fit1.eq$conv,fit2.eq$conv)) noquote(c("boundary: ",fit1$bound,fit2$bound,fit1.eq$bound,fit2.eq$bound)) #=================================================================== # EXAMPLE 3: identity link Poisson (periodic non-convergence in glm) #=================================================================== R1 <- crabs$Rep1 satellites <- crabs$Satellites[R1] width.shifted <- crabs$Width[R1] - min(crabs$Width) dark <- crabs$Dark[R1] goodspine <- crabs$GoodSpine[R1] fit1 <- glm(satellites ~ width.shifted + factor(dark) + factor(goodspine), family = poisson(link="identity"), start = rep(1,4), control = glm.control(trace=TRUE)) fit2 <- glm2(satellites ~ width.shifted + factor(dark) + factor(goodspine), family = poisson(link="identity"), start = rep(1,4), control = glm.control(trace=TRUE)) noquote(c("deviances: ",fit1$dev,fit2$dev)) noquote(c("converged: ",fit1$conv,fit2$conv)) #=============================================================== # EXAMPLE 4: log link binomial (periodic non-convergence in glm) #=============================================================== patients <- heart$Patients deaths <- heart$Deaths agegroup <- heart$AgeGroup severity <-heart$Severity delay <- heart$Delay region <- heart$Region start.p <- sum(deaths)/sum(patients) fit1 <- glm(cbind(deaths,patients-deaths) ~ factor(agegroup) + factor(severity) + factor(delay) + factor(region), family = binomial(link="log"), start = c(log(start.p), rep(0,8)), control = glm.control(trace=TRUE,maxit=100)) fit2 <- glm2(cbind(deaths,patients-deaths) ~ factor(agegroup) + factor(severity) + factor(delay) + factor(region), family = binomial(link="log"), start = c(log(start.p), rep(0,8)), control = glm.control(trace=TRUE)) noquote(c("deviances: ",fit1$dev,fit2$dev)) noquote(c("converged: ",fit1$conv,fit2$conv)) #==================================================================== # EXAMPLE 5: identity link Poisson (aperiodic non-convergence in glm) #==================================================================== R2 <- crabs$Rep2 satellites <- crabs$Satellites[R2] width.shifted <- crabs$Width[R2] - min(crabs$Width) dark <- crabs$Dark[R2] goodspine <- crabs$GoodSpine[R2] fit1 <- glm(satellites ~ width.shifted + factor(dark) + factor(goodspine), family = poisson(link="identity"), start = rep(1,4), control = glm.control(trace=TRUE)) fit2 <- glm2(satellites ~ width.shifted + factor(dark) + factor(goodspine), family = poisson(link="identity"), start = rep(1,4), control = glm.control(trace=TRUE)) noquote(c("deviances: ",fit1$dev,fit2$dev)) noquote(c("converged: ",fit1$conv,fit2$conv))library(glm2) data(crabs) data(heart) #========================================================== # EXAMPLE 1: logistic regression null model # (behaviour of glm and glm2 for different starting values) #========================================================== y <- c(1,1,1,0) # intercept estimate = log(0.75/0.25) = 1.098612 #--- identical behaviour ---# fit1 <- glm(y ~ 1, family=binomial(link="logit"), control=glm.control(trace=TRUE)) fit2 <- glm2(y ~ 1, family=binomial(link="logit"), control=glm.control(trace=TRUE)) print.noquote(c(fit1$coef,fit2$coef)) #--- convergence via different paths ---# fit1 <- glm(y ~ 1, family=binomial(link="logit"),start=-1.75, control=glm.control(trace=TRUE)) fit2 <- glm2(y ~ 1, family=binomial(link="logit"),start=-1.75, control=glm.control(trace=TRUE)) print.noquote(c(fit1$coef,fit2$coef)) #--- divergence of glm to infinite estimate ---# fit1 <- glm(y ~ 1, family=binomial(link="logit"),start=-1.81) fit2 <- glm2(y ~ 1, family=binomial(link="logit"),start=-1.81) print.noquote(c(fit1$coef,fit2$coef)) #======================================================================= # EXAMPLE 2: identity link Poisson (successful boundary convergence # using 4 identical approaches in glm and glm2 with the method argument) #======================================================================= satellites <- crabs$Satellites width.shifted <- crabs$Width - min(crabs$Width) dark <- crabs$Dark goodspine <- crabs$GoodSpine fit1 <- glm(satellites ~ width.shifted + factor(dark) + factor(goodspine), family = poisson(link="identity"), start = rep(1,4)) fit2 <- glm2(satellites ~ width.shifted + factor(dark) + factor(goodspine), family = poisson(link="identity"), start = rep(1,4)) fit1.eq <- glm2(satellites ~ width.shifted + factor(dark) + factor(goodspine), family = poisson(link="identity"), start = rep(1,4), method = "glm.fit") fit2.eq <- glm(satellites ~ width.shifted + factor(dark) + factor(goodspine), family = poisson(link="identity"), start = rep(1,4), method = "glm.fit2") noquote(c("deviances: ",fit1$dev,fit2$dev,fit1.eq$dev,fit2.eq$dev)) noquote(c("converged: ",fit1$conv,fit2$conv,fit1.eq$conv,fit2.eq$conv)) noquote(c("boundary: ",fit1$bound,fit2$bound,fit1.eq$bound,fit2.eq$bound)) #=================================================================== # EXAMPLE 3: identity link Poisson (periodic non-convergence in glm) #=================================================================== R1 <- crabs$Rep1 satellites <- crabs$Satellites[R1] width.shifted <- crabs$Width[R1] - min(crabs$Width) dark <- crabs$Dark[R1] goodspine <- crabs$GoodSpine[R1] fit1 <- glm(satellites ~ width.shifted + factor(dark) + factor(goodspine), family = poisson(link="identity"), start = rep(1,4), control = glm.control(trace=TRUE)) fit2 <- glm2(satellites ~ width.shifted + factor(dark) + factor(goodspine), family = poisson(link="identity"), start = rep(1,4), control = glm.control(trace=TRUE)) noquote(c("deviances: ",fit1$dev,fit2$dev)) noquote(c("converged: ",fit1$conv,fit2$conv)) #=============================================================== # EXAMPLE 4: log link binomial (periodic non-convergence in glm) #=============================================================== patients <- heart$Patients deaths <- heart$Deaths agegroup <- heart$AgeGroup severity <-heart$Severity delay <- heart$Delay region <- heart$Region start.p <- sum(deaths)/sum(patients) fit1 <- glm(cbind(deaths,patients-deaths) ~ factor(agegroup) + factor(severity) + factor(delay) + factor(region), family = binomial(link="log"), start = c(log(start.p), rep(0,8)), control = glm.control(trace=TRUE,maxit=100)) fit2 <- glm2(cbind(deaths,patients-deaths) ~ factor(agegroup) + factor(severity) + factor(delay) + factor(region), family = binomial(link="log"), start = c(log(start.p), rep(0,8)), control = glm.control(trace=TRUE)) noquote(c("deviances: ",fit1$dev,fit2$dev)) noquote(c("converged: ",fit1$conv,fit2$conv)) #==================================================================== # EXAMPLE 5: identity link Poisson (aperiodic non-convergence in glm) #==================================================================== R2 <- crabs$Rep2 satellites <- crabs$Satellites[R2] width.shifted <- crabs$Width[R2] - min(crabs$Width) dark <- crabs$Dark[R2] goodspine <- crabs$GoodSpine[R2] fit1 <- glm(satellites ~ width.shifted + factor(dark) + factor(goodspine), family = poisson(link="identity"), start = rep(1,4), control = glm.control(trace=TRUE)) fit2 <- glm2(satellites ~ width.shifted + factor(dark) + factor(goodspine), family = poisson(link="identity"), start = rep(1,4), control = glm.control(trace=TRUE)) noquote(c("deviances: ",fit1$dev,fit2$dev)) noquote(c("converged: ",fit1$conv,fit2$conv))
This data set is a cross-tabulation of data on 16949 individuals who
experienced a heart attack (ASSENT-2 Investigators, 1999). There are 4
categorical factors each at 3 levels, together with the number of patients
and the number of deaths for each observed combination of the factors. This
data set is useful for illustrating the convergence properties of glm
and glm2.
A data frame with 74 observations on the following 6 variables.
number of deaths
number of patients
categorization of the age of the patients
severity of the heart attack
categorization of the time from heart attack to treatment
geographical region in which the patients were treated
The variable AgeGroup groups the age of the patients as follows: 1 =
<65 years, 2 = 65-75 years, 3 = >75 years. The variable Delay groups
the time from heart attack onset to treatment as follows: 1 = <2 hours, 2 =
2-4 hours, 3 = >4 hours. The variable Severity denotes the severity
of the heart attack using the Killip class: 1 = least severe (class I), 2 =
middle severity (class II), 3 = most severe (class III or IV). The variable
Region provides the geographical region in which the patients were
treated: 1 = Western countries, 2 = Latin America, 3 = Eastern Europe. This
data set is useful for illustrating the convergence properties of glm
and glm2; see examples in the help documentation for glm2.
ASSENT-2 Investigators. (1999) Single-bolus tenecteplase compared with front-loaded alteplase in acute myocardial infraction: The ASSENT-2 double-blind randomized trial. Lancet 354, 716-722.
This method is a placeholder. The influence
method for GLMs fit using glm2(..., method = "glm.fit2.Matrix")
is not yet implemented.
An error indicating the method is not implemented.
Implementation of predict.glm for GLMs fit
using glm2(..., method = "glm.fit2.Matrix"). See its documentation
for full details.
Implementation of summary.glm for GLMs fit
using glm2(..., method = "glm.fit2.Matrix"). See its documentation
for full details.
Implementation of vcov for GLMs fit
using glm2(..., method = "glm.fit2.Matrix"). See its documentation
for full details.