fanny                package:cluster                R Documentation

_F_u_z_z_y _A_n_a_l_y_s_i_s _C_l_u_s_t_e_r_i_n_g

_D_e_s_c_r_i_p_t_i_o_n:

     Computes a fuzzy clustering of the data into `k' clusters.

_U_s_a_g_e:

     fanny(x, k, diss = inherits(x, "dist"), metric = "euclidean", stand = FALSE)

_A_r_g_u_m_e_n_t_s:

       x: data matrix or data frame, or dissimilarity matrix, depending
          on the value of the `diss' argument.

          In case of a matrix or data frame, each row corresponds to an
          observation, and each column corresponds to a variable. All
          variables must be numeric. Missing values (NAs) are allowed.

          In case of a dissimilarity matrix, `x' is typically the
          output of `daisy' or `dist'.  Also a vector of length
          n*(n-1)/2 is allowed (where n is the number of observations),
          and will be interpreted in the same way as the output of the
          above-mentioned functions.  Missing values (NAs) are not
          allowed. 

       k: integer giving the desired number of clusters.  It is
          required that 0 < k < n/2 where n is the number of
          observations.

    diss: logical flag: if TRUE (default for `dist' or `dissimilarity'
          objects), then `x' is assumed to be a dissimilarity matrix. 
          If FALSE, then `x' is treated as a matrix of observations by
          variables. 

  metric: character string specifying the metric to be used for
          calculating dissimilarities between observations. The
          currently available options are "euclidean" and "manhattan".
          Euclidean distances are root sum-of-squares of differences,
          and manhattan distances are the sum of absolute differences.
          If `x' is already a dissimilarity matrix, then this argument
          will be ignored. 

   stand: logical; if true, the measurements in `x' are standardized
          before calculating the dissimilarities.  Measurements are
          standardized for each variable (column), by subtracting the
          variable's mean value and dividing by the variable's mean
          absolute deviation.  If `x' is already a dissimilarity
          matrix, then this argument will be ignored.
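
     The following R sketch contrasts the ways of supplying `x'. The
     data are made up for illustration (two well-separated groups) and
     the seed is arbitrary. In (1) `fanny' standardizes the columns and
     computes the dissimilarities itself. In (2) and (3) they are
     precomputed, so `diss' defaults to TRUE and `metric' and `stand'
     are not used. Call (2) is intended as the precomputed counterpart
     of (1), because `daisy' with `stand = TRUE' also subtracts each
     column mean and divides by the mean absolute deviation.

```r
library(cluster)

set.seed(42)  # arbitrary seed, only for reproducibility
## Made-up data: two groups of 10 observations on 2 numeric variables
x <- rbind(matrix(rnorm(20, mean = 0, sd = 0.5), ncol = 2),
           matrix(rnorm(20, mean = 5, sd = 0.5), ncol = 2))
n <- nrow(x)
k <- 2
stopifnot(k > 0, k < n / 2)  # documented requirement: 0 < k < n/2

## (1) Data matrix: fanny() standardizes the columns and computes
##     Manhattan dissimilarities internally
f1 <- fanny(x, k, metric = "manhattan", stand = TRUE)

## (2) Dissimilarity object from daisy() with the same settings;
##     diss = TRUE is inferred because the object inherits from "dist"
f2 <- fanny(daisy(x, metric = "manhattan", stand = TRUE), k)

## (3) A plain "dist" object is handled the same way (Euclidean here)
f3 <- fanny(dist(x), k)

## Each result holds an n x k membership matrix
sapply(list(f1, f2, f3), function(f) dim(f$membership))
```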

_D_e_t_a_i_l_s:

     In a fuzzy clustering, each observation is ``spread out'' over the
     various clusters. Denote by u(i,v) the membership of observation i
     to cluster v. The memberships are nonnegative, and for a fixed
     observation i they sum to 1. The particular method `fanny' stems
     from chapter 4 of Kaufman and Rousseeuw (1990).
     Compared to other fuzzy clustering methods, `fanny' has the
     following features: (a) it also accepts a dissimilarity matrix;
     (b) it is less sensitive to departures from the `spherical
     cluster' assumption; (c) it provides a novel graphical display,
     the silhouette plot (see `plot.partition').

     Fanny aims to minimize the objective function

   SUM_{v=1..k} [ (SUM_{i,j=1..n} u(i,v)^2 u(j,v)^2 d(i,j)) / (2 SUM_{j=1..n} u(j,v)^2) ]

     where n is the number of observations, k is the number of clusters
     and d(i,j) is the dissimilarity between observations i and j.
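
     The objective can be recomputed directly from the returned
     memberships. The R sketch below does this for made-up data
     (arbitrary seed), passing a `dist' object so that d(i,j) is
     exactly what `fanny' uses. It assumes, as in current versions of
     the package, that the first element of the `objective' component
     is the minimized value; the two numbers should then agree up to
     the convergence tolerance.

```r
library(cluster)

set.seed(1)  # arbitrary seed, only for reproducibility
x <- rbind(matrix(rnorm(20, mean = 0, sd = 0.5), ncol = 2),
           matrix(rnorm(20, mean = 5, sd = 0.5), ncol = 2))
d  <- dist(x)               # Euclidean dissimilarities d(i, j)
fx <- fanny(d, 2)

D  <- as.matrix(d)          # full symmetric n x n matrix with zero diagonal
U2 <- fx$membership^2       # u(i, v)^2 for all observations i and clusters v

## Per cluster v:
##   numerator   = SUM_{i,j} u(i,v)^2 u(j,v)^2 d(i,j)
##   denominator = 2 SUM_j u(j,v)^2
num <- colSums(U2 * (D %*% U2))
den <- 2 * colSums(U2)
obj <- sum(num / den)       # sum of the per-cluster ratios

c(recomputed = obj, reported = fx$objective[[1]])
```

     Writing the double sum as a matrix product, D %*% U2, avoids an
     explicit loop over all pairs (i, j).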

_V_a_l_u_e:

     an object of class `"fanny"' representing the clustering. See
     `fanny.object' for details.
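
     A quick look at a few components of the returned object. They are
     documented in `fanny.object'; only commonly present components are
     used here. The data are made up and the seed is arbitrary.

```r
library(cluster)

set.seed(1)  # arbitrary seed, only for reproducibility
x <- rbind(matrix(rnorm(20, mean = 0, sd = 0.5), ncol = 2),
           matrix(rnorm(20, mean = 5, sd = 0.5), ncol = 2))
fx <- fanny(x, 2)

class(fx)                # class vector; includes "fanny"
head(fx$membership)      # fuzzy memberships u(i, v)
table(fx$clustering)     # sizes of the nearest crisp clusters
fx$silinfo$avg.width     # average silhouette width, also reported by plot()
```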

_S_e_e _A_l_s_o:

     `agnes' for background and references; `fanny.object',
     `partition.object', `plot.partition', `daisy', `dist'.

_E_x_a_m_p_l_e_s:

     ## generate 25 objects, divided into two clusters, and 3 objects lying
     ## between those clusters.
     x <- rbind(cbind(rnorm(10, 0, 0.5), rnorm(10, 0, 0.5)),
                cbind(rnorm(15, 5, 0.5), rnorm(15, 5, 0.5)),
                cbind(rnorm( 3,3.5,0.5), rnorm( 3,3.5,0.5)))
     fannyx <- fanny(x, 2)
     fannyx
     summary(fannyx)
     plot(fannyx)

     data(ruspini)
     ## Plot similar to Figure 6 in Struyf et al. (1996)
     plot(fanny(ruspini, 5))

