In [3]:

    
#  %load_ext rmagic
%load_ext rpy2.ipython









    



The rpy2.ipython extension is already loaded. To reload it, use:
  %reload_ext rpy2.ipython



In [4]:

    
%%R

library(outliers)



In [5]:

    
%%R

grubbs.test









    





function (x, type = 10, opposite = FALSE, two.sided = FALSE) 
{
    if (sum(c(10, 11, 20) == type) == 0) 
        stop("Incorrect type")
    DNAME <- deparse(substitute(x))
    x <- sort(x[complete.cases(x)])
    n <- length(x)
    if (type == 11) {
        g <- (x[n] - x[1])/sd(x)
        u <- var(x[2:(n - 1)])/var(x) * (n - 3)/(n - 1)
        pval = 1 - pgrubbs(g, n, type = 11)
        method <- "Grubbs test for two opposite outliers"
        alt = paste(x[1], "and", x[n], "are outliers")
    }
    else if (type == 10) {
        if (xor(((x[n] - mean(x)) < (mean(x) - x[1])), opposite)) {
            alt = paste("lowest value", x[1], "is an outlier")
            o <- x[1]
            d <- x[2:n]
        }
        else {
            alt = paste("highest value", x[n], "is an outlier")
            o <- x[n]
            d <- x[1:(n - 1)]
        }
        g <- abs(o - mean(x))/sd(x)
        u <- var(d)/var(x) * (n - 2)/(n - 1)
        pval <- 1 - pgrubbs(g, n, type = 10)
        method <- "Grubbs test for one outlier"
    }
    else {
        if (xor(((x[n] - mean(x)) < (mean(x) - x[1])), opposite)) {
            alt = paste("lowest values", x[1], ",", x[2], "are outliers")
            u <- var(x[3:n])/var(x) * (n - 3)/(n - 1)
        }
        else {
            alt = paste("highest values", x[n - 1], ",", x[n], 
                "are outliers")
            u <- var(x[1:(n - 2)])/var(x) * (n - 3)/(n - 1)
        }
        g <- NULL
        pval <- pgrubbs(u, n, type = 20)
        method <- "Grubbs test for two outliers"
    }
    if (two.sided) {
        pval <- 2 * pval
        if (pval > 1) 
            pval <- 2 - pval
    }
    RVAL <- list(statistic = c(G = g, U = u), alternative = alt, 
        p.value = pval, method = method, data.name = DNAME)
    class(RVAL) <- "htest"
    return(RVAL)
}
<environment: namespace:outliers>



In [6]:

    
%%R

help(grubbs.test)









    



R Help on ‘grubbs.test’grubbs.test              package:outliers              R Documentation

_G_r_u_b_b_s _t_e_s_t_s _f_o_r _o_n_e _o_r _t_w_o _o_u_t_l_i_e_r_s _i_n _d_a_t_a _s_a_m_p_l_e

_D_e_s_c_r_i_p_t_i_o_n:

     Performs Grubbs' test for one outlier, two outliers on one tail,
     or two outliers on opposite tails, in small sample.

_U_s_a_g_e:

     grubbs.test(x, type = 10, opposite = FALSE, two.sided = FALSE)
     
_A_r_g_u_m_e_n_t_s:

       x: a numeric vector for data values.

opposite: a logical indicating whether you want to check not the value
          with largest difference from the mean, but opposite (lowest,
          if most suspicious is highest etc.)

    type: Integer value indicating test variant. 10 is a test for one
          outlier (side is detected automatically and can be reversed
          by ‘opposite’ parameter). 11 is a test for two outliers on
          opposite tails, 20 is test for two outliers in one tail.

two.sided: Logical value indicating if there is a need to treat this
          test as two-sided.

_D_e_t_a_i_l_s:

     The function can perform three tests given and discussed by Grubbs
     (1950).

     First test (10) is used to detect if the sample dataset contains
     one outlier, statistically different than the other values. Test
     is based by calculating score of this outlier G (outlier minus
     mean and divided by sd) and comparing it to appropriate critical
     values. Alternative method is calculating ratio of variances of
     two datasets - full dataset and dataset without outlier. The
     obtained value called U is bound with G by simple formula.

     Second test (11) is used to check if lowest and highest value are
     two outliers on opposite tails of sample. It is based on
     calculation of ratio of range to standard deviation of the sample.

     Third test (20) calculates ratio of variance of full sample and
     sample without two extreme observations. It is used to detect if
     dataset contains two outliers on the same tail.

     The p-values are calculated using ‘qgrubbs’ function.

_V_a_l_u_e:

statistic: the value statistic. For type 10 it is difference between
          outlier and the mean divided by standard deviation, and for
          type 20 it is sample range divided by standard deviation.
          Additional value U is ratio of sample variances with and
          withour suspicious outlier.  According to Grubbs (1950) these
          values for type 10 are bound by simple formula and only one
          of them can be used, but function gives both. For type 20 the
          G is the same as U.

 p.value: the p-value for the test.

alternative: a character string describing the alternative hypothesis.

  method: a character string indicating what type of test was
          performed.

data.name: name of the data argument.

_A_u_t_h_o_r(_s):

     Lukasz Komsta

_R_e_f_e_r_e_n_c_e_s:

     Grubbs, F.E. (1950). Sample Criteria for testing outlying
     observations. Ann. Math. Stat. 21, 1, 27-58.

_S_e_e _A_l_s_o:

     ‘dixon.test’, ‘chisq.out.test’

_E_x_a_m_p_l_e_s:

     set.seed(1234)
     x = rnorm(10)
     grubbs.test(x)
     grubbs.test(x,type=20)
     grubbs.test(x,type=11)



In [7]:

    
%%R 

set.seed(1234)
x = rnorm(10)
grubbs.test(x)









    





	Grubbs test for one outlier

data:  x
G = 1.9708, U = 0.5205, p-value = 0.1323
alternative hypothesis: lowest value -2.34569770262935 is an outlier



In [8]:

    
%%R
print(x)
plot(x)









    





 [1] -1.2070657  0.2774292  1.0844412 -2.3456977  0.4291247  0.5060559
 [7] -0.5747400 -0.5466319 -0.5644520 -0.8900378

type : int in (10, 11, 20) 10: tests for one outlier (side doesn't matter, can be reversed w/ 'opposite' parameter) 11: tests for two outliers on opposite tails 20: tests for two outliers on the same tail



In [9]:

    
%%R

find_one_outlier <- function (x, opposite = FALSE) 
{
    DNAME <- deparse(substitute(x))
    x <- sort(x[complete.cases(x)])
    n <- length(x)

    if (xor(((x[n] - mean(x)) < (mean(x) - x[1])), opposite)) {
        alt = paste("lowest value", x[1], "is an outlier")
        o <- x[1]
        d <- x[2:n]
    }
    else {
        alt = paste("highest value", x[n], "is an outlier")
        o <- x[n]
        d <- x[1:(n - 1)]
    }
    g <- abs(o - mean(x))/sd(x)
    u <- var(d)/var(x) * (n - 2)/(n - 1)
    pval <- 1 - pgrubbs(g, n, type = 10)
    method <- "Grubbs test for one outlier"

    
    RVAL <- list(statistic = c(G = g, U = u), alternative = alt, 
        p.value = pval, method = method, data.name = DNAME,
        outlier_value = o, outlier_rowname = g)
    class(RVAL) <- "htest"
    return(RVAL)
}



In [10]:

    
%%R 

set.seed(1234)
x = rnorm(10)
print(x)
plot(x)









    





 [1] -1.2070657  0.2774292  1.0844412 -2.3456977  0.4291247  0.5060559
 [7] -0.5747400 -0.5466319 -0.5644520 -0.8900378



In [11]:

    
%%R
result <- find_one_outlier(x)
print(result$outlier_value)
print(result$outlier_rowname)









    





[1] -2.345698
[1] 1.970842



In [12]:

    
%%R

mpitable <- read.csv('potatoes.tsv', sep='\t')



In [25]:

    
%%R

plot(mpitable[,7])



In [21]:

    
%%R
find_one_outlier(mpitable[,3])









    





	Grubbs test for one outlier

data:  mpitable[, 3]
G = 2.9248, U = 0.9090, p-value = 0.1353
alternative hypothesis: lowest value 4.376582008 is an outlier