reshape                 package:base                 R Documentation

_R_e_s_h_a_p_e _g_r_o_u_p_e_d _d_a_t_a

_D_e_s_c_r_i_p_t_i_o_n:

     This function reshapes a data frame between `wide' format with
     repeated measurements in separate columns of the same record and
     `long' format with the repeated measurements in separate records.

_U_s_a_g_e:

     reshape(data, varying = NULL, v.names = NULL, timevar = "time", 
         idvar = "id", ids = 1:NROW(data),
         times = seq(length = length(varying[[1]])), 
         drop = NULL, direction, fix.row.names = TRUE,
         split=list(regexp="\\.",include=FALSE))

_A_r_g_u_m_e_n_t_s:

    data: A data frame

 varying: Names of sets of variables in the wide format that correspond
          to single variables in long format (`time-varying'). A list
          of vectors (or optionally a matrix for `direction="wide"').
          See below for more details and options

 v.names: Names of variables in the long format that correspond to
          multiple variables in the wide format.

 timevar: The variable in long format that differentiates multiple
          records from the same group/individual

   idvar: The variable in long format that identifies multiple records
          from the same group/individual. This variable may also be
          present in wide format

     ids: The values to use for a newly created `idvar' variable in
          long format

   times: The values to use for a newly created `timevar' variable in
          long format

    drop: A vector of names of variables to drop before reshaping

direction: Character string: `"wide"' to reshape to wide format,
          `"long"' to reshape to long format

fix.row.names: if `TRUE' and `direction="long"', create new row names
          in long format from the values of the id and time variables

   split: information for guessing the `varying', `v.names', and
          `times' arguments. See below for details

_D_e_t_a_i_l_s:

     The arguments to this function are described in terms of
     longitudinal data, as that is the application motivating the
     functions.  A `wide' longitudinal dataset will have one record for
     each individual with some time-constant variables that occupy
     single columns and some time-varying variables that occupy a
     column for each time point.  In `long' format there will be
     multiple records for each individual, with some variables being
     constant across these records and others varying across the
     records. A `long' format dataset also needs a `time' variable
     identifying which time point each record comes from and an `id'
     variable showing which records refer to the same person.

     If the data frame resulted from a previous `reshape' then the
     operation can be reversed by specifying just the `direction'
     argument. The other arguments are stored as attributes on the data
     frame.
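
     A minimal sketch of this round trip, using a small made-up data
     frame (the object names `long', `wide' and `back' are illustrative
     only and not part of the function):

```r
# Small long-format data: 3 subjects, each measured at 2 time points.
long <- data.frame(id   = rep(1:3, each = 2),
                   time = rep(1:2, times = 3),
                   x    = c(5, 6, 7, 8, 9, 10))

# Long to wide: one row per id, one x column per time point.
wide <- reshape(long, idvar = "id", timevar = "time",
                v.names = "x", direction = "wide")

# Reverse the operation by giving only `direction`; the remaining
# settings are taken from the attributes stored on `wide`.
back <- reshape(wide, direction = "long")
back
```

     The rows of `back' may come out in a different order from `long',
     and the row names will generally differ, but the same id, time
     and x values should be present.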

     If `direction="wide"' and no `varying' or `v.names' arguments are
     supplied, it is assumed that all variables except `idvar' and
     `timevar' are time-varying. They are all expanded into multiple
     variables in wide format.
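
     As an illustration of reshaping to wide format without `varying'
     or `v.names', the following sketch uses a made-up data frame in
     which both `x' and `y' are treated as time-varying (all object and
     column names here are assumptions chosen for the example):

```r
# Long-format data: 2 subjects, 2 visits, two measured variables.
long <- data.frame(id    = rep(1:2, each = 2),
                   visit = rep(c("pre", "post"), times = 2),
                   x     = c(1, 2, 3, 4),
                   y     = c(10, 20, 30, 40))

# No v.names or varying given: every column other than the id and
# time variables is expanded into one column per visit.
wide <- reshape(long, idvar = "id", timevar = "visit",
                direction = "wide")
names(wide)
```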

     If `direction="long"' the `varying' argument can be a vector of
     column names or column numbers (converted to column names). The
     function will attempt to guess the `v.names' and `times' from
     these names.  The default is variable names like `x.1', `x.2',
     where `split=list(regexp="\\.",include=FALSE)' specifies to
     split at the dot and drop it from the name. To have alphabetic
     names followed by numeric times use
     `split=list(regexp="[A-Za-z][0-9]",include=TRUE)'. This splits
     between the alphabetic and numeric parts of the name and does not
     drop the characters matched by the regular expression.
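
     A short sketch of name guessing when reshaping to long format
     with dot-separated column names. The data frame is made up for
     the example, and the `split' value is written out explicitly
     (with the backslash doubled, as R string syntax requires) purely
     for illustration:

```r
# Wide-format data with dot-separated names: x.1 and x.2.
wide <- data.frame(id  = 1:3,
                   x.1 = c(5, 7, 9),
                   x.2 = c(6, 8, 10))

# `varying` is a plain vector of column names; the variable name "x"
# and the times 1 and 2 are guessed by splitting each name at the dot.
long <- reshape(wide, direction = "long", idvar = "id",
                varying = c("x.1", "x.2"),
                split = list(regexp = "\\.", include = FALSE))
long
```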

_V_a_l_u_e:

     The reshaped data frame with added attributes to simplify
     reshaping back to the original form.

_S_e_e _A_l_s_o:

     `stack', `aperm'

_E_x_a_m_p_l_e_s:

     data(Indometh,package="nls")
     summary(Indometh)
     wide<-reshape(Indometh,v.names="conc",idvar="Subject",
                    timevar="time",direction="wide")
     wide

     reshape(wide, direction="long")
     reshape(wide, idvar="Subject",varying=list(names(wide)[2:12]),
               v.names="conc",direction="long")

     ## times need not be numeric
     df<-data.frame(id=rep(1:4,rep(2,4)),visit=I(rep(c("Before","After"),4)),
                   x=rnorm(4),y=runif(4))
     df
     reshape(df,timevar="visit",idvar="id",direction="wide")
     ## warns that y is really varying
     reshape(df,timevar="visit",idvar="id",direction="wide",v.names="x")  

     ##  unbalanced `long' data leads to NA fill in `wide' form
     df2<-df[1:7,]
     df2
     reshape(df2,timevar="visit",idvar="id",direction="wide")

     ## Alternative regular expressions for guessing names
     df3<-data.frame(id=1:4,age=c(40,50,60,50),dose1=c(1,2,1,2),
                         dose2=c(2,1,2,1),dose4=c(3,3,3,3))
     reshape(df3,direction="long",varying=3:5,
              split=list(regexp="[a-z][0-9]",include=TRUE))

     ## an example that isn't longitudinal data
     data(state)
     state.x77<-as.data.frame(state.x77)
     long<-reshape(state.x77,idvar="state",ids=row.names(state.x77),
            times=names(state.x77),timevar="Characteristic",
            varying=list(names(state.x77)),direction="long")

     reshape(long,direction="wide")

     reshape(long,direction="wide",new.row.names=unique(long$state))

