stepAIC                 package:MASS                 R Documentation

_C_h_o_o_s_e _a _m_o_d_e_l _b_y _A_I_C _i_n _a _S_t_e_p_w_i_s_e _A_l_g_o_r_i_t_h_m

_D_e_s_c_r_i_p_t_i_o_n:

     Performs stepwise model selection by exact AIC.

_U_s_a_g_e:

     stepAIC(object, scope, scale = 0,
             direction = c("both", "backward", "forward"),
             trace = 1, keep = NULL, steps = 1000, use.start = FALSE, k = 2, ...)

_A_r_g_u_m_e_n_t_s:

  object: an object representing a model of an appropriate class.
          This is used as the initial model in the stepwise search. 

   scope: defines the range of models examined in the stepwise search.
          This should be either a single formula, or a list containing
          components `upper' and `lower', both formulae.  See the
          details for how to specify the formulae and how they are
          used. 

   scale: used in the definition of the AIC statistic for selecting the
          models, currently only for `lm', `aov' and `glm' models.  The
          default value, `0', indicates that the scale should be
          estimated. 

direction: the mode of stepwise search, can be one of `"both"',
          `"backward"', or `"forward"', with a default of `"both"'.  If
          the `scope' argument is missing, the default for `direction'
          is `"backward"'. 

   trace: if positive, information is printed during the running of
          `stepAIC()'. Larger values may give more information on the
          fitting process. 

    keep: a filter function whose input is a fitted model object and
          the associated `AIC' statistic, and whose output is
          arbitrary. Typically `keep' will select a subset of the
          components of the object and return them. The default is not
          to keep anything. 

   steps: the maximum number of steps to be considered.  The default is
          1000 (essentially as many as required).  It is typically used
          to stop the process early. 

use.start: if true the updated fits are done starting at the linear
          predictor for the currently selected model. This may speed up
          the iterative calculations for `glm' (and other fits), but it
          can also slow them down. 

       k: the multiple of the number of degrees of freedom used for the
          penalty. Only `k = 2' gives the genuine AIC: `k = log(n)' is
          sometimes referred to as BIC or SBC. 

     ...: any additional arguments to `extractAIC'. (None are currently
          used.) 
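
     As an illustration of the `k' argument, the sketch below runs the
     same search twice, once with the default AIC penalty and once with
     the BIC-style penalty `k = log(n)'.  The starting model and the
     names `fit0', `fit_aic' and `fit_bic' are assumptions made for
     this example; `quine' is the MASS dataset used in the Examples.

```r
library(MASS)   # provides stepAIC() and the quine data

# Assumed starting point: a main-effects model for the quine data.
fit0 <- lm(log(Days + 2.5) ~ Eth + Sex + Age + Lrn, data = quine)
n <- nrow(quine)

# Allow the search to add two-way interactions or drop any term.
sc <- list(upper = ~ .^2, lower = ~ 1)

# k = 2 (the default) gives AIC; k = log(n) gives the BIC/SBC penalty.
fit_aic <- stepAIC(fit0, scope = sc, trace = 0)
fit_bic <- stepAIC(fit0, scope = sc, k = log(n), trace = 0)

formula(fit_aic)
formula(fit_bic)
```

     The heavier penalty usually leads to a smaller model, although
     that is not guaranteed for a stepwise search.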

_D_e_t_a_i_l_s:

     The set of models searched is determined by the `scope' argument.
     The right-hand side of its `lower' component is always included in
     the model, and the right-hand side of the model is included in the
     `upper' component.  If `scope' is a single formula, it specifies
     the `upper' component, and the `lower' model is empty.  If `scope'
     is missing, the initial model is used as the `upper' model.
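
     The three ways of giving `scope' described above can be compared
     in a short sketch.  The starting model and the names `s1', `s2'
     and `s3' are assumptions chosen for illustration only.

```r
library(MASS)   # provides stepAIC() and the quine data

fit <- lm(log(Days + 2.5) ~ Eth + Sex, data = quine)

# 1. scope missing: the initial model is the upper model, so with the
#    default direction only deletions are considered.
s1 <- stepAIC(fit, trace = 0)

# 2. single formula: taken as `upper'; the lower model is empty.
s2 <- stepAIC(fit, scope = ~ Eth * Sex * Age * Lrn, trace = 0)

# 3. list: terms on the right-hand side of `lower' are never dropped.
s3 <- stepAIC(fit,
              scope = list(upper = ~ Eth * Sex * Age * Lrn, lower = ~ Eth),
              trace = 0)

attr(terms(s1), "term.labels")
attr(terms(s2), "term.labels")
attr(terms(s3), "term.labels")
```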

     There is a potential problem in using `glm' fits with a variable
     `scale', as in that case the deviance is not simply related to the
     maximized log-likelihood. The function `extractAIC.glm' makes the
     appropriate adjustment for a `gaussian' family, but may need to be
     amended for other cases. (The `binomial' and `poisson' families
     have fixed `scale' by default and do not correspond to a
     particular maximum-likelihood problem for variable `scale'.)
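
     A rough check of the `gaussian' case, assuming a current R where
     the `extractAIC' methods for `lm' and `glm' fits are supplied by
     the stats package: the two methods report AIC on different
     additive scales, but the difference in AIC between two models
     fitted to the same data should agree.  The formulae `f_small' and
     `f_big' are hypothetical choices for this sketch.

```r
library(MASS)   # only needed here for the quine data

f_small <- log(Days + 2.5) ~ Eth + Sex
f_big   <- log(Days + 2.5) ~ Eth + Sex + Age

# AIC difference between the two models, computed from lm fits ...
lm_diff <- extractAIC(lm(f_big, data = quine))[2] -
           extractAIC(lm(f_small, data = quine))[2]

# ... and from the equivalent gaussian glm fits.
glm_diff <- extractAIC(glm(f_big, family = gaussian, data = quine))[2] -
            extractAIC(glm(f_small, family = gaussian, data = quine))[2]

all.equal(lm_diff, glm_diff)
```

     Only differences matter for the stepwise search, so a constant
     offset between the two scales does not change the model chosen.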

     Where a conventional deviance exists (e.g. for `lm', `aov' and
     `glm' fits) this is quoted in the analysis of variance table: it
     is the unscaled deviance.

_V_a_l_u_e:

     the stepwise-selected model is returned, with up to two additional
     components.  There is an `"anova"' component corresponding to the
     steps taken in the search, as well as a `"keep"' component if the
     `keep=' argument was supplied in the call. The `"Resid. Dev"'
     column of the analysis of deviance table refers to a constant
     minus twice the maximized log likelihood: it will be a deviance
     only in cases where a saturated model is well-defined (thus
     excluding `lm', `aov' and `survreg' fits, for example).
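
     A minimal sketch of inspecting the two extra components follows.
     The filter `keep_fn' and the starting model are hypothetical; any
     function of a fitted model and its AIC could be supplied instead.

```r
library(MASS)   # provides stepAIC() and the quine data

fit <- lm(log(Days + 2.5) ~ Eth * Sex + Age + Lrn, data = quine)

# Hypothetical filter: record the residual degrees of freedom and the
# AIC of each model visited by the search.
keep_fn <- function(model, aic)
    list(df.resid = model$df.residual, aic = aic)

sel <- stepAIC(fit, keep = keep_fn, trace = 0)

sel$anova   # one row per step, including the "Resid. Dev" column
sel$keep    # one column per step, one row per element kept
```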

_N_o_t_e:

     The model fitting must apply the models to the same dataset. This
     may be a problem if there are missing values and an `na.action'
     other than `na.fail' is used (as is the default in R).  We suggest
     you remove the missing values first.
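
     One way to follow this advice is sketched below.  The missing
     values are introduced artificially into a copy of `quine' purely
     for illustration, and the names `q' and `q_complete' are
     assumptions of this example.

```r
library(MASS)   # provides stepAIC() and the quine data

# Illustration only: blank out a few values in a copy of the data.
q <- quine
q$Age[c(3, 17, 42)] <- NA

# Drop incomplete rows once, before any model is fitted, so that every
# model in the search is fitted to exactly the same observations.
q_complete <- na.omit(q)

# na.fail makes any remaining missing value an error rather than a
# silent change in the data used.
fit <- lm(log(Days + 2.5) ~ ., data = q_complete, na.action = na.fail)
sel <- stepAIC(fit, trace = 0)

nobs(sel) == nrow(q_complete)
```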

_S_e_e _A_l_s_o:

     `addterm', `dropterm', `step'

_E_x_a_m_p_l_e_s:

     library(MASS)  # stepAIC, glm.nb and the datasets used below
     data(quine)
     quine.hi <- aov(log(Days + 2.5) ~ .^4, quine)
     quine.nxt <- update(quine.hi, . ~ . - Eth:Sex:Age:Lrn)
     quine.stp <- stepAIC(quine.nxt,
         scope = list(upper = ~Eth*Sex*Age*Lrn, lower = ~1),
         trace = FALSE)
     quine.stp$anova

     data(cpus)
     cpus1 <- cpus
     # Bin each numeric predictor (columns 2 to 7) at its quantiles
     for(v in names(cpus)[2:7])
       cpus1[[v]] <- cut(cpus[[v]], unique(quantile(cpus[[v]])),
                         include.lowest = TRUE)
     cpus0 <- cpus1[, 2:8]  # excludes names, authors' predictions
     cpus.samp <- sample(1:209, 100)  # random subset: results vary by run
     cpus.lm <- lm(log10(perf) ~ ., data = cpus0[cpus.samp, ])
     cpus.lm2 <- stepAIC(cpus.lm, trace = FALSE)
     cpus.lm2$anova

     example(birthwt)  # runs the birthwt examples, which create `bwt`
     birthwt.glm <- glm(low ~ ., family = binomial, data = bwt)
     birthwt.step <- stepAIC(birthwt.glm, trace = FALSE)
     birthwt.step$anova
     birthwt.step2 <- stepAIC(birthwt.glm, ~ .^2 + I(scale(age)^2)
         + I(scale(lwt)^2), trace = FALSE)
     birthwt.step2$anova

     quine.nb <- glm.nb(Days ~ .^4, data = quine)
     quine.nb2 <- stepAIC(quine.nb)
     quine.nb2$anova

