GAMsetup                package:mgcv                R Documentation

_S_e_t _u_p _G_A_M _u_s_i_n_g _p_e_n_a_l_i_z_e_d _c_u_b_i_c _r_e_g_r_e_s_s_i_o_n _s_p_l_i_n_e_s

_D_e_s_c_r_i_p_t_i_o_n:

     Sets up the design matrix X, the penalty matrices S_i and the linear
     equality constraint matrix C for a GAM defined in terms of penalized
     regression splines, and returns the knot locations `xp[][]' of these
     regression splines. The output is in the form needed to fit the
     model and estimate its smoothing parameters by the method of Wood
     (2000), as implemented in routine `mgcv()'. This routine is largely
     superseded by `gam'.

_U_s_a_g_e:

     GAMsetup(G)

_A_r_g_u_m_e_n_t_s:

       G: the single argument to this function: a list containing the
          elements listed below:

       m: the number of smooth terms in the model

       n: the number of data to be modelled

    nsdf: the number of user-supplied columns of the design matrix for
          the parametric part of the model

      df: an array of `G$m' integers specifying the maximum d.f. for
          each spline term.

     dim: An array of dimensions for the smooths. `dim[i]' is the
          number of covariates that smooth `i' is a function of.

  s.type: An array giving the type of basis used for each term: 0 for a
          cubic regression spline, 1 for a thin plate regression spline
          (t.p.r.s.).

 p.order: An array giving the order of the penalty for each term. 0 for
          auto selection.

       x: an array of `G$n'-element arrays of data and (optionally)
          design matrix columns. The first `G$nsdf' elements of `G$x'
          should contain the columns of the design matrix corresponding
          to the parametric part of the model. The remaining `G$m'
          elements of `G$x' are the values of the covariates that are
          arguments of the spline terms. Note that the smooths will be
          centred, and no intercept term will be added unless an array
          of 1's is supplied as part of `G$x'.
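
          As a sketch of the layout of `G$x', the base-R snippet below
          stacks one intercept row of 1's (the parametric part, so `nsdf'
          is 1) above the covariates of two one-dimensional smooths.
          `make_gam_x()' is a hypothetical helper for illustration, not
          an mgcv function.

```r
# Hypothetical helper (not part of mgcv): stack the parametric design columns
# and the smooth covariates row-wise, in the order described for G$x.
make_gam_x <- function(param_cols, covariates) {
  rows <- c(param_cols, covariates)          # nsdf parametric rows, then m covariate rows
  n <- length(rows[[1]])
  stopifnot(all(vapply(rows, length, integer(1)) == n))  # every row has G$n entries
  x <- do.call(rbind, rows)
  dimnames(x) <- NULL
  x
}

set.seed(1)
n <- 10
z <- replicate(2, runif(n), simplify = FALSE)   # covariates of two 1-d smooths
x <- make_gam_x(list(rep(1, n)), z)             # row 1: intercept column of 1's
nsdf <- 1
m <- 2
```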

_V_a_l_u_e:

     A list `H', containing the elements of `G' (the input list) plus
     the following:

       X: the full design matrix.

       S: a one-dimensional array containing the non-zero elements of
          the penalty matrices. Let `start[1]<-0' and
          `start[k+1]<-start[k]+H$df[k]^2' (i.e. `start' is
          `c(0,cumsum(H$df^2))'). Then penalty matrix `k' has
          `H$S[start[k]+i+H$df[k]*(j-1)]' in its ith row and jth column.
          To get the kth full penalty matrix, insert the matrix so
          obtained into a full matrix of zeroes with its 1,1 element at
          row and column `H$off[k]+1'.
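
          A minimal sketch of unpacking `H$S', assuming the penalty
          blocks are stored one after another in column-major order (so
          block `k' starts after the first `sum' of squared block sizes
          of earlier terms) and that `off[k]' is a 0-based offset, as in
          the description of `off'. `unpack_penalty()' and the toy
          matrices are hypothetical.

```r
# Hypothetical helper (not part of mgcv): rebuild the full p x p penalty
# matrix for term k from the packed vector S, assuming the blocks are stored
# one after another in column-major order.
unpack_penalty <- function(S, df, off, k, p) {
  start <- c(0, cumsum(df^2))                          # start[k]: elements before block k
  d <- df[k]
  block <- matrix(S[start[k] + seq_len(d * d)], d, d)  # (i,j) from S[start[k]+i+d*(j-1)]
  full <- matrix(0, p, p)
  idx <- off[k] + seq_len(d)                           # off[k] is a 0-based offset
  full[idx, idx] <- block
  full
}

# Toy example: two smooth terms with 2 and 3 parameters and no parametric part.
df <- c(2, 3)
off <- c(0, 2)
S1 <- matrix(c(2, -1, -1, 2), 2, 2)
S2 <- diag(3)
S <- c(as.vector(S1), as.vector(S2))
P1 <- unpack_penalty(S, df, off, k = 1, p = 5)
P2 <- unpack_penalty(S, df, off, k = 2, p = 5)
```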

     off: an array of offsets, used to facilitate efficient storage of
          the penalty matrices and to indicate where in the overall
          parameter vector the parameters of the ith spline reside
          (e.g. the first parameter of the ith spline is `p[off[i]+1]').
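
          The offsets can be used to pull one term's coefficients out of
          the parameter vector. This sketch rests on two assumptions not
          stated here: each smooth term contributes `df[i]' parameters,
          and the `nsdf' parametric coefficients come first.
          `term_coef()' is a hypothetical helper.

```r
# Hypothetical helper (not part of mgcv): coefficients of smooth term i,
# assuming term i occupies df[i] consecutive entries starting at p[off[i]+1].
term_coef <- function(p, off, df, i) p[off[i] + seq_len(df[i])]

# Assumed layout: nsdf parametric coefficients first, then the smooth blocks.
nsdf <- 1
df <- c(2, 3)
off <- nsdf + c(0, cumsum(df)[-length(df)])   # here c(1, 3)
p <- c(10, 1, 2, 3, 4, 5)                      # toy parameter vector
```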

       C: a matrix defining the linear equality constraints on the
          parameters used to define the model (i.e. C in Cp=0).
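
          One common way to work with constraints of the form Cp=0 (a
          general technique, not necessarily what `mgcv()' does
          internally) is to write p = Z beta, where the columns of Z span
          the null space of C; any beta then yields a p satisfying the
          constraints. The sketch gets Z from a QR decomposition of t(C);
          `null_space_basis()' and the toy constraint are hypothetical.

```r
# Sketch: impose C %*% p = 0 by writing p = Z %*% beta, where the columns of Z
# span the null space of C (taken from a QR decomposition of t(C)).
null_space_basis <- function(C) {
  q <- qr(t(C))
  Q <- qr.Q(q, complete = TRUE)
  Q[, -seq_len(q$rank), drop = FALSE]   # columns beyond rank(C) are orthogonal to the rows of C
}

# Toy example: one sum-to-zero constraint on a 4-parameter term.
C <- matrix(1, 1, 4)
Z <- null_space_basis(C)
beta <- c(0.5, -1, 2)                   # unconstrained parameters
p <- Z %*% beta                         # constrained parameters
```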

      UZ: an array containing the matrices that transform from a
          t.p.r.s. basis to the equivalent thin plate spline (t.p.s.)
          basis (for t.p.r.s. terms only). The packing method is as
          follows: set `start[1]<-0' and
          `start[k+1]<-start[k]+(M[k]+n)*tp.bs[k]', where `n' is the
          number of data, `M[k]' is the penalty null space dimension and
          `tp.bs[k]' is zero for a cubic regression spline and the basis
          dimension for a t.p.r.s. Then element `i,j' of the UZ matrix
          for model term `k' is `UZ[start[k]+i+(j-1)*(M[k]+n)]'.
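
          A minimal sketch of reading one term's matrix back out of the
          packed `UZ' array, using the packing rule with column offset
          `(j-1)*(M[k]+n)'. `unpack_uz()' and the toy dimensions are
          hypothetical.

```r
# Hypothetical helper (not part of mgcv): extract the (M[k]+n) x tp.bs[k]
# UZ matrix of term k from the packed vector (column-major storage).
unpack_uz <- function(UZ, M, tp.bs, n, k) {
  nr <- M + n                                   # rows of each term's matrix
  start <- c(0, cumsum(nr * tp.bs))             # start[k+1] = start[k] + (M[k]+n)*tp.bs[k]
  if (tp.bs[k] == 0) return(NULL)               # cubic regression spline: nothing stored
  matrix(UZ[start[k] + seq_len(nr[k] * tp.bs[k])], nr[k], tp.bs[k])
}

n <- 3
M <- c(2, 2, 3)        # penalty null space dimensions
tp.bs <- c(2, 0, 4)    # term 2 is a cubic regression spline
UZ <- as.numeric(seq_len(sum((M + n) * tp.bs)))   # 10 + 0 + 24 = 34 dummy entries
U3 <- unpack_uz(UZ, M, tp.bs, n, 3)
```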

      Xu: the set of unique covariate combinations for each term. The
          packing method is as follows: set `start[1]<-0' and
          `start[k+1]<-start[k]+xu.length[k]*tp.dim[k]', where
          `xu.length[k]' is the number of unique covariate combinations
          and `tp.dim[k]' is zero for a cubic regression spline and the
          dimension of the smooth (i.e. the number of covariates it is a
          function of) for a t.p.r.s. Then element `i,j' of the Xu
          matrix for model term `k' is
          `Xu[start[k]+i+(j-1)*xu.length[k]]'.

xu.length: Number of unique covariate combinations for each t.p.r.s.
          term.
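
          The packing of `Xu' and `xu.length' can be sketched in the
          other direction, starting from each term's covariates. This
          assumes the unique combinations are simply the distinct rows of
          a term's covariate matrix and that `xu.length' is 0 for cubic
          regression spline terms; `pack_xu()' and the toy data are
          hypothetical.

```r
# Hypothetical helper (not part of mgcv): pack the unique covariate
# combinations of each t.p.r.s. term into one vector, column-major per term.
pack_xu <- function(covs, is_tprs) {
  Xu <- numeric(0)
  xu.length <- integer(length(covs))
  for (k in seq_along(covs)) {
    if (!is_tprs[k]) next                   # tp.dim[k] = 0: nothing stored
    u <- unique(as.matrix(covs[[k]]))       # distinct rows = covariate combinations
    xu.length[k] <- nrow(u)
    Xu <- c(Xu, as.vector(u))               # element i,j lands at start[k] + i + (j-1)*nrow(u)
  }
  list(Xu = Xu, xu.length = xu.length)
}

covs <- list(cbind(c(1, 1, 2, 3), c(5, 5, 6, 7)),   # 2-d t.p.r.s. term with a repeated row
             c(0.1, 0.2, 0.3, 0.4))                  # 1-d cubic regression spline term
res <- pack_xu(covs, is_tprs = c(TRUE, FALSE))
```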

covariate.shift: All covariates are centred around zero before the bases
          are constructed; this is an array of the applied shifts.
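
          The sketch below assumes the shift is each covariate's mean;
          how the shift is actually chosen is not stated here. The point
          it illustrates holds either way: new covariate values must be
          shifted by the stored amounts before bases are evaluated at
          them. `centre_covariates()' is hypothetical.

```r
# Sketch, assuming mean-centring: shift each covariate (a row of x, as in
# G$x) to mean zero and keep the shifts for later use on new data.
centre_covariates <- function(x) {
  shift <- rowMeans(x)                 # one shift per covariate
  list(x = x - shift, shift = shift)   # shift has length nrow(x), so it is recycled down each column
}

x <- rbind(c(1, 2, 3), c(10, 20, 30))
res <- centre_covariates(x)
x_new <- c(2, 25) - res$shift          # apply the stored shifts to a new covariate pair
```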

      xp: a matrix whose rows contain the covariate values corresponding
          to the parameters of each cubic regression spline. The cubic
          regression splines are parameterized by their y-values at a
          series of x values (the knots), and the rows of `xp' contain
          those x values.
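
          The knot-value parameterization can be illustrated with base
          R's `splinefun(method = "natural")', assuming the cubic
          regression spline is a natural cubic spline (zero second
          derivative at the end knots). The knots and parameter values
          below are made up; in practice one row of `xp' would supply
          the knots.

```r
# Sketch: a cubic spline parameterized by its values at the knots, assuming a
# natural cubic spline interpolant. Knots and values are illustrative only.
xp_k <- seq(0, 1, length.out = 6)        # hypothetical knots for one term (one row of xp)
beta <- c(0, 0.8, 1, 0.2, -0.5, -1)      # parameters = spline values at the knots
f <- splinefun(xp_k, beta, method = "natural")
xx <- seq(0, 1, length.out = 101)
fx <- f(xx)                              # the smooth evaluated on a fine grid
```

          Because each parameter is the spline's value at a knot, the
          coefficients are directly interpretable on the response scale.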

_A_u_t_h_o_r(_s):

     Simon N. Wood snw@st-and.ac.uk

_R_e_f_e_r_e_n_c_e_s:

     Wood, S.N. (2000) "Modelling and smoothing parameter estimation
     with multiple quadratic penalties" JRSSB 62(2):413-428

_S_e_e _A_l_s_o:

     `mgcv', `gam'

_E_x_a_m_p_l_e_s:

         # This example modified from routine SANtest()

         n<-100 # number of observations to simulate
         x <- runif(5 * n, 0, 1) # simulate covariates
         x <- array(x, dim = c(5, n)) # put into array for passing to GAMsetup
         y <- 2 * sin(pi * x[2, ])   # begin simulating some data
         y <- y + exp(2 * x[3, ]) - 3.75887
         y <- y + 0.2 * x[4, ]^11 * (10 * (1 - x[4, ]))^6 + 10 * (10 * 
             x[4, ])^3 * (1 - x[4, ])^10 - 1.396
         sig2 <- -1   # variance for the simulation is abs(sig2); the negative sign signals GCV use to mgcv()
         e <- rnorm(n, 0, sqrt(abs(sig2)))
         y <- y + e          # simulated data
         w <- matrix(1, n, 1) # weight matrix
         par(mfrow = c(2, 2)) # scatter plots of simulated data
         plot(x[2, ], y)
         plot(x[3, ], y)
         plot(x[4, ], y)
         plot(x[5, ], y)
         x[1, ] <- 1   # row 1 of x becomes the intercept column (the parametric part)
         G <- list(m = 4, n = n, nsdf = 1, df = c(15, 15, 15, 15),
                   dim = c(1, 1, 1, 1), s.type = c(0, 0, 0, 0),
                   p.order = c(0, 0, 0, 0), x = x)  # create list for passing to GAMsetup
         H <- GAMsetup(G)
         H$y <- y    # add data to H
         H$sig2 <- sig2  # add variance (signalling GCV use in this case) to H
         H$w <- w       # add weights to H
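         H$sp <- array(-1, H$m)        # smoothing parameters, one per smooth term
         H$fix <- array(FALSE, H$m)    # no smoothing parameter is held fixed
         H$conv.tol <- 1e-6            # convergence tolerance
         H$max.half <- 15              # maximum number of step halvings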
         H <- mgcv(H)  # select smoothing parameters and fit model


