findInterval              package:base              R Documentation

_F_i_n_d _I_n_t_e_r_v_a_l _N_u_m_b_e_r_s _o_r _I_n_d_i_c_e_s

_D_e_s_c_r_i_p_t_i_o_n:

     Find the indices of `x' in `vec', where `vec' must be sorted
     (non-decreasingly); i.e., if `i <- findInterval(x,v)', we have
     v[i[j]] <= x[j] < v[i[j] + 1] where v[0] := - Inf, v[N+1] := +
     Inf, and `N <- length(vec)'. At the two boundaries, the returned
     index may differ by 1, depending on the optional arguments
     `rightmost.closed' and `all.inside'.

_U_s_a_g_e:

     findInterval(x, vec, rightmost.closed = FALSE, all.inside = FALSE)

_A_r_g_u_m_e_n_t_s:

       x: numeric.

     vec: numeric, sorted (weakly) increasingly, of length `N', say.

rightmost.closed: logical; if true, the rightmost interval, `vec[N-1]
          .. vec[N]' is treated as closed, see below.

all.inside: logical; if true, the returned indices are coerced into
          {1,...,N-1}, i.e. 0 is mapped to 1 and N to N-1.

_D_e_t_a_i_l_s:

     The function `findInterval' finds the index of one vector `x' in
     another, `vec', where the latter must be non-decreasing.  Where
     this is trivial, equivalent to `apply( outer(x, vec, ">="), 1,
     sum)', as a matter of fact, the internal algorithm uses interval
     search ensuring O(n * log(N)) complexity where `n <- length(x)'
     (and `N <- length(vec)').  For (almost) sorted `x', it will be
     even faster, basically O(n).

     This is the same computation as for the empirical distribution
     function, and indeed, `findInterval(t, sort(X))' is identical to n
     * Fn(t; X[1],..,X[n]) where Fn is the empirical distribution
     function of X[1],..,X[n].

     When `rightmost.closed = TRUE', the result for `x[j] = vec[N]' ( =
     max(vec)), is `N - 1' as for all other values in the last
     interval.

_V_a_l_u_e:

     vector of length `length(x)' with values in `0:N' where `N <-
     length(vec)', or values coerced to `1:(N-1)' iff `all.inside =
     TRUE' (equivalently coercing all x values inside the intervals).

_A_u_t_h_o_r(_s):

     Martin Maechler

_S_e_e _A_l_s_o:

     `approx(*, method = "constant")' which is a generalization of
     `findInterval()', `ecdf' for computing the empirical distribution
     function which is (up to a factor of n) also basically the same as
     findInterval(.).

_E_x_a_m_p_l_e_s:

     N <- 100
     X <- sort(round(rt(N, df=2), 2))
     tt <- c(-100, seq(-2,2, len=201), +100)
     it <- findInterval(tt, X)
     tt[it < 1 | it >= N] # only first and last are outside range(X)

     ## See that this is N * Fn(.) :
     tt <- c(tt,X)
     eps <- 100 * .Machine$double.eps
     require(stepfun)
     stopifnot( it[c(1,203)] == c(0, 100),
               all.equal(N * ecdf(X)(tt),
                         findInterval(tt, X),  tol = eps),
               findInterval(tt,X) ==  apply( outer(tt, X, ">="), 1, sum)
      )

