Data Science Academy - R Fundamentos

Download: http://github.com/dsacademybr

These are a few simple examples of what can be done with the R language. The full course is available at:

http://www.datascienceacademy.com.br/pages/curso-r-fundamentos-para-analise-de-dados


In [2]:
# Print to the screen
print('R - Uma das principais ferramentas do Cientista de Dados')


[1] "R - Uma das principais ferramentas do Cientista de Dados"

In [5]:
# Show the working directory
getwd()


Out[5]:
'/opt/DSA/RFundamentos/JupyterNotebooks'

In [28]:
# Using the help system: ?topic opens the help page for a function
?rnorm


Out[28]:
Normal {stats}    R Documentation

The Normal Distribution

Description

Density, distribution function, quantile function and random generation for the normal distribution with mean equal to mean and standard deviation equal to sd.

Usage

dnorm(x, mean = 0, sd = 1, log = FALSE)
pnorm(q, mean = 0, sd = 1, lower.tail = TRUE, log.p = FALSE)
qnorm(p, mean = 0, sd = 1, lower.tail = TRUE, log.p = FALSE)
rnorm(n, mean = 0, sd = 1)

Arguments

x, q

vector of quantiles.

p

vector of probabilities.

n

number of observations. If length(n) > 1, the length is taken to be the number required.

mean

vector of means.

sd

vector of standard deviations.

log, log.p

logical; if TRUE, probabilities p are given as log(p).

lower.tail

logical; if TRUE (default), probabilities are P[X ≤ x] otherwise, P[X > x].

Details

If mean or sd are not specified they assume the default values of 0 and 1, respectively.

The normal distribution has density

f(x) = 1/(√(2 π) σ) e^-((x - μ)^2/(2 σ^2))

where μ is the mean of the distribution and σ the standard deviation.

Value

dnorm gives the density, pnorm gives the distribution function, qnorm gives the quantile function, and rnorm generates random deviates.

The length of the result is determined by n for rnorm, and is the maximum of the lengths of the numerical arguments for the other functions.

The numerical arguments other than n are recycled to the length of the result. Only the first elements of the logical arguments are used.

For sd = 0 this gives the limit as sd decreases to 0, a point mass at mu. sd < 0 is an error and returns NaN.

Source

For pnorm, based on

Cody, W. D. (1993) Algorithm 715: SPECFUN – A portable FORTRAN package of special function routines and test drivers. ACM Transactions on Mathematical Software 19, 22–32.

For qnorm, the code is a C translation of

Wichura, M. J. (1988) Algorithm AS 241: The percentage points of the normal distribution. Applied Statistics, 37, 477–484.

which provides precise results up to about 16 digits.

For rnorm, see RNG for how to select the algorithm and for references to the supplied methods.

References

Becker, R. A., Chambers, J. M. and Wilks, A. R. (1988) The New S Language. Wadsworth & Brooks/Cole.

Johnson, N. L., Kotz, S. and Balakrishnan, N. (1995) Continuous Univariate Distributions, volume 1, chapter 13. Wiley, New York.

See Also

Distributions for other standard distributions, including dlnorm for the Lognormal distribution.

Examples

require(graphics)

dnorm(0) == 1/sqrt(2*pi)
dnorm(1) == exp(-1/2)/sqrt(2*pi)
dnorm(1) == 1/sqrt(2*pi*exp(1))

## Using "log = TRUE" for an extended range :
par(mfrow = c(2,1))
plot(function(x) dnorm(x, log = TRUE), -60, 50,
     main = "log { Normal density }")
curve(log(dnorm(x)), add = TRUE, col = "red", lwd = 2)
mtext("dnorm(x, log=TRUE)", adj = 0)
mtext("log(dnorm(x))", col = "red", adj = 1)

plot(function(x) pnorm(x, log.p = TRUE), -50, 10,
     main = "log { Normal Cumulative }")
curve(log(pnorm(x)), add = TRUE, col = "red", lwd = 2)
mtext("pnorm(x, log=TRUE)", adj = 0)
mtext("log(pnorm(x))", col = "red", adj = 1)

## if you want the so-called 'error function'
erf <- function(x) 2 * pnorm(x * sqrt(2)) - 1
## (see Abramowitz and Stegun 29.2.29)
## and the so-called 'complementary error function'
erfc <- function(x) 2 * pnorm(x * sqrt(2), lower = FALSE)
## and the inverses
erfinv <- function (x) qnorm((1 + x)/2)/sqrt(2)
erfcinv <- function (x) qnorm(x/2, lower = FALSE)/sqrt(2)

[Package stats version 3.3.1 ]
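
As a small illustration of the four functions documented above, the sketch below draws reproducible samples with rnorm and checks that qnorm inverts pnorm. The seed, the sample size, the distribution parameters, and the variable names are arbitrary choices made for this example and are not part of the original notebook.

```r
# Fix the random seed so the draws are reproducible (42 is an arbitrary choice)
set.seed(42)

# Draw 1000 values from a normal distribution with mean 10 and sd 2
draws <- rnorm(1000, mean = 10, sd = 2)
length(draws)

# The sample mean and sd should be close to, but not exactly, 10 and 2
mean(draws)
sd(draws)

# pnorm gives P[X <= q]; for the standard normal, P[X <= 0] is one half
p <- pnorm(0)

# qnorm is the inverse of pnorm, so it maps the probability back to the quantile
q <- qnorm(pnorm(1.5))

# dnorm at 0 equals 1/sqrt(2*pi); all.equal compares with a numeric tolerance,
# which is safer than == for floating-point values
all.equal(dnorm(0), 1 / sqrt(2 * pi))
```

Calling set.seed before rnorm is what makes the result repeatable; without it, each run returns different values.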

In [6]:
# Using the help system: help(topic) is equivalent to ?topic
help(mean)


Out[6]:
mean {base}    R Documentation

Arithmetic Mean

Description

Generic function for the (trimmed) arithmetic mean.

Usage

mean(x, ...)

## Default S3 method:
mean(x, trim = 0, na.rm = FALSE, ...)

Arguments

x

An R object. Currently there are methods for numeric/logical vectors and date, date-time and time interval objects. Complex vectors are allowed for trim = 0, only.

trim

the fraction (0 to 0.5) of observations to be trimmed from each end of x before the mean is computed. Values of trim outside that range are taken as the nearest endpoint.

na.rm

a logical value indicating whether NA values should be stripped before the computation proceeds.

...

further arguments passed to or from other methods.

Value

If trim is zero (the default), the arithmetic mean of the values in x is computed, as a numeric or complex vector of length one. If x is not logical (coerced to numeric), numeric (including integer) or complex, NA_real_ is returned, with a warning.

If trim is non-zero, a symmetrically trimmed mean is computed with a fraction of trim observations deleted from each end before the mean is computed.

References

Becker, R. A., Chambers, J. M. and Wilks, A. R. (1988) The New S Language. Wadsworth & Brooks/Cole.

See Also

weighted.mean, mean.POSIXct, colMeans for row and column means.

Examples

x <- c(0:10, 50)
xm <- mean(x)
c(xm, mean(x, trim = 0.10))

[Package base version 3.3.1 ]
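
The trim and na.rm arguments described above can be seen in a short sketch. It reuses the vector from the help page example; the vector y and the result names are made up for illustration.

```r
# Same vector as in the help page example: 0 to 10 plus one outlier
x <- c(0:10, 50)

# Plain arithmetic mean, pulled upward by the outlier
plain_mean <- mean(x)

# Trimmed mean: with 12 values and trim = 0.10, one value is dropped
# from each end of the sorted vector before averaging
trimmed_mean <- mean(x, trim = 0.10)

# A vector with a missing value (hypothetical data)
y <- c(1, NA, 3)

# By default the NA propagates to the result
mean(y)

# na.rm = TRUE removes the NA before averaging
clean_mean <- mean(y, na.rm = TRUE)

c(plain_mean, trimmed_mean, clean_mean)
```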

In [8]:
# Using the help system: search the installed documentation for a term
help.search('randomForest')

In [3]:
# Create a plot
plot(1:30)



In [13]:
# Creating a numeric variable
obj1 = 367
obj1


Out[13]:
367

In [14]:
# Checking the variable's type
typeof(obj1)


Out[14]:
'double'
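
The result 'double' may be surprising for a whole number. The following sketch, with a hypothetical variable obj_int, shows how a plain numeric literal differs from an integer literal written with the L suffix.

```r
# A plain numeric literal is stored as a double, even without a decimal part
typeof(367)

# The L suffix asks for an integer instead
obj_int <- 367L
typeof(obj_int)

# Both count as numeric, and they compare as equal
is.numeric(367)
is.numeric(obj_int)
367 == obj_int

# class() reports "numeric" for doubles and "integer" for integers
class(367)
class(obj_int)
```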

In [20]:
# Creating a vector of strings
obj2 = c("segunda", "terça", "quarta")
obj2


Out[20]:
  1. 'segunda'
  2. 'terça'
  3. 'quarta'

In [21]:
# Printing one element of the vector
obj2[3]


Out[21]:
'quarta'
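
Indexing in R starts at 1, which is why obj2[3] returns the last of the three elements. A brief sketch of a few other indexing forms follows; the result names are invented for this example.

```r
# Same vector as in the cell above
obj2 <- c("segunda", "terça", "quarta")

# R indexes from 1, so the first element is obj2[1]
first <- obj2[1]

# A vector of positions selects several elements at once
first_and_last <- obj2[c(1, 3)]

# A negative index drops that position instead of selecting it
without_first <- obj2[-1]

# length() gives the number of elements; indexing past the end returns NA
n <- length(obj2)
beyond <- obj2[4]
```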

In [27]:
# Building a named matrix (values fill the matrix column by column)
matriz1 = matrix(c(100, 200, 300, 400), nrow = 2, ncol = 2,
                 dimnames = list(c("Linha 1", "Linha 2"), c("Coluna 1", "Coluna 2")))
matriz1


Out[27]:
        Coluna 1 Coluna 2
Linha 1      100      300
Linha 2      200      400
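
The output shows 100 and 200 in the first column because matrix fills column by column by default. The sketch below contrasts this with byrow = TRUE and reads single elements by name and by position. The names m_col, m_row, by_name, and by_pos are illustrative only.

```r
# Default behaviour: the values fill the matrix column by column
m_col <- matrix(c(100, 200, 300, 400), nrow = 2, ncol = 2,
                dimnames = list(c("Linha 1", "Linha 2"),
                                c("Coluna 1", "Coluna 2")))

# byrow = TRUE fills row by row instead, reusing the same row and column names
m_row <- matrix(c(100, 200, 300, 400), nrow = 2, ncol = 2, byrow = TRUE,
                dimnames = dimnames(m_col))

# Elements can be read by row and column name or by position
by_name <- m_col["Linha 1", "Coluna 2"]
by_pos <- m_col[1, 2]

# dim() returns the number of rows and columns
dim(m_col)
```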

In [25]:
# Creating a list
lista1 = list("J", 25, TRUE)
lista1


Out[25]:
  1. 'J'
  2. 25
  3. TRUE
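
Unlike a vector, a list can hold elements of different types. A minimal sketch of how to inspect and extract them is shown here; the named list and its element names are hypothetical additions.

```r
# Same list as in the cell above
lista1 <- list("J", 25, TRUE)

# Each element keeps its own type
types <- sapply(lista1, typeof)

# Double brackets extract the element itself
second <- lista1[[2]]

# Single brackets return a shorter list that still wraps the element
sub_list <- lista1[2]

# Elements can also be given names and then read with $ (names are hypothetical)
named <- list(initial = "J", age = 25, active = TRUE)
named$age
```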

In [30]:
# Creating vectors
pais = c("China", "Portugal", "Noruega", "Egito", "Brasil")
nome = c("Panda", "Zebra", "Girafa", "Elefante", "Jacaré")
altura = c(1.78, 1.72, 1.63, 1.59, 1.63)
codigo = c(5001, 2183, 4702, 7965, 8890)


# Creating a data frame from several vectors
pesquisa = data.frame(pais, nome, altura, codigo)
pesquisa


Out[30]:
      pais     nome altura codigo
1    China    Panda   1.78   5001
2 Portugal    Zebra   1.72   2183
3  Noruega   Girafa   1.63   4702
4    Egito Elefante   1.59   7965
5   Brasil   Jacaré   1.63   8890
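
Once the data frame exists, a few common operations can be sketched as follows. The threshold of 1.65 and the result names are arbitrary choices for this example. Note that in R versions before 4.0 (the help pages above come from 3.3.1) data.frame converts string columns to factors by default, so str may report pais and nome as factors there.

```r
# Rebuild the data frame from the cell above so the example is self-contained
pais <- c("China", "Portugal", "Noruega", "Egito", "Brasil")
nome <- c("Panda", "Zebra", "Girafa", "Elefante", "Jacaré")
altura <- c(1.78, 1.72, 1.63, 1.59, 1.63)
codigo <- c(5001, 2183, 4702, 7965, 8890)
pesquisa <- data.frame(pais, nome, altura, codigo)

# Number of rows and columns
dim(pesquisa)

# Compact summary of each column's type and first values
str(pesquisa)

# A single column is accessed with $ and behaves like a vector
mean_height <- mean(pesquisa$altura)

# A logical condition in the row position keeps only the matching rows
tall <- pesquisa[pesquisa$altura > 1.65, ]

# order() returns the row positions that sort the data frame by a column
by_height <- pesquisa[order(pesquisa$altura, decreasing = TRUE), ]
```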

End

Thank you - Data Science Academy - facebook.com/dsacademybr