# Data Science Academy - R Fundamentos

## These are a few simple examples of what can be done with the R language. The full course is available at:

``````

In [2]:

# Print to the screen
print('R - Uma das principais ferramentas do Cientista de Dados')

``````
``````

[1] "R - Uma das principais ferramentas do Cientista de Dados"

``````
``````

In [5]:

# Show the working directory
getwd()

``````
``````

Out[5]:

'/opt/DSA/RFundamentos/JupyterNotebooks'

``````
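Relative file paths are resolved against the working directory returned by `getwd()`. The sketch below builds a full path from it; the file name `dados.csv` is a hypothetical placeholder and the file does not need to exist.

```r
# Store the current working directory (an absolute path as a string)
wd <- getwd()

# Build a path inside the working directory.
# "dados.csv" is a made-up file name used only for illustration.
caminho <- file.path(wd, "dados.csv")

basename(caminho)   # "dados.csv"
dirname(caminho)    # same directory as wd
dir.exists(wd)      # TRUE
```

`file.path()` joins the pieces with the path separator, which is usually safer than pasting strings together by hand.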
``````

In [28]:

# Using help
?rnorm

``````
``````

Out[28]:

Normal {stats}    R Documentation

The Normal Distribution

Description

Density, distribution function, quantile function and random
generation for the normal distribution with mean equal to mean
and standard deviation equal to sd.

Usage

dnorm(x, mean = 0, sd = 1, log = FALSE)
pnorm(q, mean = 0, sd = 1, lower.tail = TRUE, log.p = FALSE)
qnorm(p, mean = 0, sd = 1, lower.tail = TRUE, log.p = FALSE)
rnorm(n, mean = 0, sd = 1)

Arguments

x, q

vector of quantiles.

p

vector of probabilities.

n

number of observations. If length(n) > 1, the length
is taken to be the number required.

mean

vector of means.

sd

vector of standard deviations.

log, log.p

logical; if TRUE, probabilities p are given as log(p).

lower.tail

logical; if TRUE (default), probabilities are
P[X ≤ x] otherwise, P[X > x].

Details

If mean or sd are not specified they assume the default
values of 0 and 1, respectively.

The normal distribution has density

f(x) = 1/(√(2 π) σ) e^-((x - μ)^2/(2 σ^2))

where μ is the mean of the distribution and
σ the standard deviation.

Value

dnorm gives the density,
pnorm gives the distribution function,
qnorm gives the quantile function, and
rnorm generates random deviates.

The length of the result is determined by n for
rnorm, and is the maximum of the lengths of the
numerical arguments for the other functions.

The numerical arguments other than n are recycled to the
length of the result.  Only the first elements of the logical
arguments are used.

For sd = 0 this gives the limit as sd decreases to 0, a
point mass at mu.
sd < 0 is an error and returns NaN.

Source

For pnorm, based on

Cody, W. D. (1993)
Algorithm 715: SPECFUN – A portable FORTRAN package of special
function routines and test drivers.
ACM Transactions on Mathematical Software 19, 22–32.

For qnorm, the code is a C translation of

Wichura, M. J. (1988)
Algorithm AS 241: The percentage points of the normal distribution.
Applied Statistics, 37, 477–484.

which provides precise results up to about 16 digits.

For rnorm, see RNG for how to select the algorithm and
for references to the supplied methods.

References

Becker, R. A., Chambers, J. M. and Wilks, A. R. (1988)
The New S Language.

Johnson, N. L., Kotz, S. and Balakrishnan, N. (1995)
Continuous Univariate Distributions, volume 1, chapter 13.
Wiley, New York.

See Also

Distributions for other standard distributions, including
dlnorm for the Lognormal distribution.

Examples

require(graphics)

dnorm(0) == 1/sqrt(2*pi)
dnorm(1) == exp(-1/2)/sqrt(2*pi)
dnorm(1) == 1/sqrt(2*pi*exp(1))

## Using "log = TRUE" for an extended range :
par(mfrow = c(2,1))
plot(function(x) dnorm(x, log = TRUE), -60, 50,
main = "log { Normal density }")
curve(log(dnorm(x)), add = TRUE, col = "red", lwd = 2)
mtext("log(dnorm(x))", col = "red", adj = 1)

plot(function(x) pnorm(x, log.p = TRUE), -50, 10,
main = "log { Normal Cumulative }")
curve(log(pnorm(x)), add = TRUE, col = "red", lwd = 2)
mtext("log(pnorm(x))", col = "red", adj = 1)

## if you want the so-called 'error function'
erf <- function(x) 2 * pnorm(x * sqrt(2)) - 1
## (see Abramowitz and Stegun 29.2.29)
## and the so-called 'complementary error function'
erfc <- function(x) 2 * pnorm(x * sqrt(2), lower = FALSE)
## and the inverses
erfinv <- function (x) qnorm((1 + x)/2)/sqrt(2)
erfcinv <- function (x) qnorm(x/2, lower = FALSE)/sqrt(2)

[Package stats version 3.3.1 ]

``````
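The help page above describes four related functions. As a minimal sketch of how they fit together, the example below draws a random sample with `rnorm` and checks two identities with `dnorm`, `pnorm` and `qnorm`. The seed, the sample size and the variable names `draws` and `p` are arbitrary choices made for this illustration.

```r
# Fix the seed so the random draws are reproducible
set.seed(42)

# Draw 1000 values from a normal distribution with mean 10 and sd 2
draws <- rnorm(1000, mean = 10, sd = 2)
length(draws)    # 1000
mean(draws)      # close to 10, but not exactly 10
sd(draws)        # close to 2, but not exactly 2

# Density of the standard normal at 0 equals 1 / sqrt(2 * pi)
all.equal(dnorm(0), 1 / sqrt(2 * pi))   # TRUE

# pnorm and qnorm are inverses of each other
p <- pnorm(1.96)                # P[X <= 1.96], roughly 0.975
all.equal(qnorm(p), 1.96)       # TRUE
```

`all.equal()` is used instead of `==` because it compares floating-point numbers within a small tolerance.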
``````

In [6]:

# Using help
help(mean)

``````
``````

Out[6]:

mean {base}    R Documentation

Arithmetic Mean

Description

Generic function for the (trimmed) arithmetic mean.

Usage

mean(x, ...)

## Default S3 method:
mean(x, trim = 0, na.rm = FALSE, ...)

Arguments

x

An R object.  Currently there are methods for
numeric/logical vectors and date,
date-time and time interval objects.  Complex vectors
are allowed for trim = 0, only.

trim

the fraction (0 to 0.5) of observations to be
trimmed from each end of x before the mean is computed.
Values of trim outside that range are taken as the nearest endpoint.

na.rm

a logical value indicating whether NA
values should be stripped before the computation proceeds.

...

further arguments passed to or from other methods.

Value

If trim is zero (the default), the arithmetic mean of the
values in x is computed, as a numeric or complex vector of
length one.  If x is not logical (coerced to numeric), numeric
(including integer) or complex, NA_real_ is returned, with a warning.

If trim is non-zero, a symmetrically trimmed mean is computed
with a fraction of trim observations deleted from each end
before the mean is computed.

References

Becker, R. A., Chambers, J. M. and Wilks, A. R. (1988)
The New S Language.

See Also

weighted.mean, mean.POSIXct,
colMeans for row and column means.

Examples

x <- c(0:10, 50)
xm <- mean(x)
c(xm, mean(x, trim = 0.10))

[Package base version 3.3.1 ]

``````
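The `trim` and `na.rm` arguments documented above are easy to confuse, so here is a small sketch of both. The vector `x` comes from the help page example; `y` is a made-up vector added only to show missing values.

```r
# Vector from the help page example: 0 to 10 plus one outlier
x <- c(0:10, 50)

mean(x)                # 8.75, pulled up by the outlier 50
mean(x, trim = 0.10)   # 5.5, drops one value from each end first

# Hypothetical vector with a missing value
y <- c(1, 2, NA, 4)

mean(y)                # NA, because NA propagates by default
mean(y, na.rm = TRUE)  # 2.333333, the mean of 1, 2 and 4
```

With 12 observations and `trim = 0.10`, one observation is removed from each end, so the trimmed mean is computed over the values 1 to 10.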
``````

In [8]:

# Using help
help.search('randomForest')

``````
``````

In [3]:

# Create plots
plot(1:30)

``````
``````

In [13]:

# Creating a numeric variable
obj1 = 367
obj1

``````
``````

Out[13]:

367

``````
``````

In [14]:

# Checking the variable's type
typeof(obj1)

``````
``````

Out[14]:

'double'

``````
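The result `'double'` may be surprising for a whole number. R stores numeric literals as double-precision values unless they carry the `L` suffix. The following sketch compares `typeof()` across a few basic values; the values are arbitrary examples.

```r
# A plain numeric literal is stored as a double
typeof(367)       # "double"

# The L suffix asks for an integer instead
typeof(367L)      # "integer"

# Text and logical values have their own types
typeof("367")     # "character"
typeof(TRUE)      # "logical"

# Both doubles and integers count as numeric
is.numeric(367)   # TRUE
is.numeric(367L)  # TRUE
```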
``````

In [20]:

# Creating a vector of strings
obj2 = c("segunda", "terça", "quarta")
obj2

``````
``````

Out[20]:

'segunda'
'terça'
'quarta'

``````
``````

In [21]:

# Printing one element of the vector
obj2[3]

``````
``````

Out[21]:

'quarta'

``````
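Vector indexing in R starts at 1, and the same square brackets accept several positions or negative positions. This is a short sketch that reuses the `obj2` vector from the cell above.

```r
# Same vector as in the cell above
obj2 <- c("segunda", "terça", "quarta")

obj2[1]         # "segunda" (the first element has index 1)
obj2[c(1, 3)]   # "segunda" "quarta"
obj2[-2]        # "segunda" "quarta" (everything except position 2)
length(obj2)    # 3
obj2[4]         # NA, because position 4 does not exist
```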
``````

In [27]:

# Creating a matrix (filled column by column) with row and column names
matriz1 = matrix(c(100, 200, 300, 400), nrow = 2, ncol = 2,
                 dimnames = list(c("Linha 1", "Linha 2"), c("Coluna 1", "Coluna 2")))
matriz1

``````
``````

Out[27]:

        Coluna 1 Coluna 2
Linha 1      100      300
Linha 2      200      400

``````
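The output shows that `matrix()` fills values column by column. The sketch below rebuilds the same matrix, reads cells by position and by name, and contrasts the default with `byrow = TRUE`. The name `by_row` is a hypothetical variable introduced for this illustration.

```r
# Same matrix as in the cell above
matriz1 <- matrix(c(100, 200, 300, 400), nrow = 2, ncol = 2,
                  dimnames = list(c("Linha 1", "Linha 2"),
                                  c("Coluna 1", "Coluna 2")))

dim(matriz1)                     # 2 2 (rows, columns)
matriz1[1, 2]                    # 300 (row 1, column 2)
matriz1["Linha 2", "Coluna 1"]   # 200 (lookup by names)
matriz1[, "Coluna 2"]            # whole column: 300 400

# With byrow = TRUE the values fill one row at a time instead
by_row <- matrix(c(100, 200, 300, 400), nrow = 2, byrow = TRUE)
by_row[1, 2]                     # 200
```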
``````

In [25]:

# Creating a list
lista1 = list("J", 25, TRUE)
lista1

``````
``````

Out[25]:

'J'
25
TRUE

``````
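Unlike a vector, a list can hold elements of different types. The sketch below shows the difference between single and double brackets on the list from the cell above. The named list `named` and its element names are hypothetical, added only to show access with `$`.

```r
# Same list as in the cell above
lista1 <- list("J", 25, TRUE)

# Single brackets return a smaller list
class(lista1[2])        # "list"

# Double brackets return the element itself
class(lista1[[2]])      # "numeric"
lista1[[2]] + 1         # 26

# Each element keeps its own type
sapply(lista1, typeof)  # "character" "double" "logical"

# Hypothetical named version of the same list
named <- list(initial = "J", age = 25, active = TRUE)
named$age               # 25
```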
``````

In [30]:

# Creating vectors
pais = c("China", "Portugal", "Noruega", "Egito", "Brasil")
nome = c("Panda", "Zebra", "Girafa", "Elefante", "Jacaré")
altura = c(1.78, 1.72, 1.63, 1.59, 1.63)
codigo = c(5001, 2183, 4702, 7965, 8890)

# Creating a data frame from several vectors
pesquisa = data.frame(pais, nome, altura, codigo)
pesquisa

``````
``````

Out[30]:

      pais     nome altura codigo
1    China    Panda   1.78   5001
2 Portugal    Zebra   1.72   2183
3  Noruega   Girafa   1.63   4702
4    Egito Elefante   1.59   7965
5   Brasil   Jacaré   1.63   8890

``````
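Once the data frame exists, columns can be read with `$` and rows can be filtered with a logical condition. The sketch below rebuilds `pesquisa` from the same vectors. Passing `stringsAsFactors = FALSE` is an assumption added here so that text columns stay as character vectors in any R version (the default changed in R 4.0.0), and `tall` is a hypothetical variable name.

```r
# Same vectors as in the cell above
pais   <- c("China", "Portugal", "Noruega", "Egito", "Brasil")
nome   <- c("Panda", "Zebra", "Girafa", "Elefante", "Jacaré")
altura <- c(1.78, 1.72, 1.63, 1.59, 1.63)
codigo <- c(5001, 2183, 4702, 7965, 8890)

# stringsAsFactors = FALSE keeps text columns as plain strings
pesquisa <- data.frame(pais, nome, altura, codigo,
                       stringsAsFactors = FALSE)

nrow(pesquisa)            # 5
ncol(pesquisa)            # 4
mean(pesquisa$altura)     # 1.67

# Keep only the rows where altura is greater than 1.7
tall <- pesquisa[pesquisa$altura > 1.7, ]
tall$pais                 # "China" "Portugal"
```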