Data Science Academy - R Fundamentos

Download: http://github.com/dsacademybr

These are a few simple examples of what can be done with the R language. The full course is available at:

http://www.datascienceacademy.com.br/pages/curso-r-fundamentos-para-analise-de-dados



In [2]:

    
# Print to the screen
print('R - Uma das principais ferramentas do Cientista de Dados')









    



[1] "R - Uma das principais ferramentas do Cientista de Dados"
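As the output above shows, `print()` displays the vector index `[1]` and the surrounding quotes. A minimal sketch of two base R alternatives, `paste()` and `cat()`, follows; the variable names `tool` and `msg` are made up for this illustration.

```r
# Hypothetical variables, used only for this illustration
tool <- "R"
msg <- paste(tool, "- Uma das principais ferramentas do Cientista de Dados")

print(msg)       # prints with the [1] index and quotes
cat(msg, "\n")   # prints the raw text, without index or quotes
```

`cat()` is usually the better fit for messages meant for a reader, while `print()` is what the console uses to display a value.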



In [5]:

    
# Show the working directory
getwd()









    Out[5]:




'/opt/DSA/RFundamentos/JupyterNotebooks'
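The working directory can also be changed with `setwd()`. The sketch below assumes nothing about the machine it runs on: it switches to the session's temporary directory (`tempdir()`) and then restores the original location. The names `old_dir` and `new_dir` are illustrative only.

```r
# Remember the current directory so it can be restored afterwards
old_dir <- getwd()

# tempdir() always exists for the current R session
setwd(tempdir())
new_dir <- getwd()
new_dir

# List files relative to the new working directory
list.files()

# Restore the original working directory
setwd(old_dir)
```

Saving and restoring the directory keeps later relative paths in the notebook pointing where they did before.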



In [28]:

    
# Using the help system
?rnorm









    Out[28]:





Normal {stats} R Documentation

The Normal Distribution

Description

Density, distribution function, quantile function and random
generation for the normal distribution with mean equal to mean
and standard deviation equal to sd.



Usage

dnorm(x, mean = 0, sd = 1, log = FALSE)
pnorm(q, mean = 0, sd = 1, lower.tail = TRUE, log.p = FALSE)
qnorm(p, mean = 0, sd = 1, lower.tail = TRUE, log.p = FALSE)
rnorm(n, mean = 0, sd = 1)



Arguments


x, q

vector of quantiles.

p

vector of probabilities.

n

number of observations. If length(n) > 1, the length
is taken to be the number required.

mean

vector of means.

sd

vector of standard deviations.

log, log.p

logical; if TRUE, probabilities p are given as log(p).

lower.tail

logical; if TRUE (default), probabilities are
P[X ≤ x] otherwise, P[X > x].




Details

If mean or sd are not specified they assume the default
values of 0 and 1, respectively.

The normal distribution has density


    f(x) = 1/(√(2 π) σ) e^-((x - μ)^2/(2 σ^2))
  

where μ is the mean of the distribution and
σ the standard deviation.



Value

dnorm gives the density,
pnorm gives the distribution function,
qnorm gives the quantile function, and
rnorm generates random deviates.

The length of the result is determined by n for
rnorm, and is the maximum of the lengths of the
numerical arguments for the other functions.  

The numerical arguments other than n are recycled to the
length of the result.  Only the first elements of the logical
arguments are used.

For sd = 0 this gives the limit as sd decreases to 0, a
point mass at mu.
sd < 0 is an error and returns NaN.



Source

For pnorm, based on

Cody, W. D. (1993)
Algorithm 715: SPECFUN – A portable FORTRAN package of special
function routines and test drivers.
ACM Transactions on Mathematical Software 19, 22–32.

For qnorm, the code is a C translation of

Wichura, M. J. (1988)
Algorithm AS 241: The percentage points of the normal distribution.
Applied Statistics, 37, 477–484.

which provides precise results up to about 16 digits.

For rnorm, see RNG for how to select the algorithm and
for references to the supplied methods.



References

Becker, R. A., Chambers, J. M. and Wilks, A. R. (1988)
The New S Language.
Wadsworth & Brooks/Cole.

Johnson, N. L., Kotz, S. and Balakrishnan, N. (1995)
Continuous Univariate Distributions, volume 1, chapter 13.
Wiley, New York.



See Also

Distributions for other standard distributions, including
dlnorm for the Lognormal distribution.



Examples

require(graphics)

dnorm(0) == 1/sqrt(2*pi)
dnorm(1) == exp(-1/2)/sqrt(2*pi)
dnorm(1) == 1/sqrt(2*pi*exp(1))

## Using "log = TRUE" for an extended range :
par(mfrow = c(2,1))
plot(function(x) dnorm(x, log = TRUE), -60, 50,
     main = "log { Normal density }")
curve(log(dnorm(x)), add = TRUE, col = "red", lwd = 2)
mtext("dnorm(x, log=TRUE)", adj = 0)
mtext("log(dnorm(x))", col = "red", adj = 1)

plot(function(x) pnorm(x, log.p = TRUE), -50, 10,
     main = "log { Normal Cumulative }")
curve(log(pnorm(x)), add = TRUE, col = "red", lwd = 2)
mtext("pnorm(x, log=TRUE)", adj = 0)
mtext("log(pnorm(x))", col = "red", adj = 1)

## if you want the so-called 'error function'
erf <- function(x) 2 * pnorm(x * sqrt(2)) - 1
## (see Abramowitz and Stegun 29.2.29)
## and the so-called 'complementary error function'
erfc <- function(x) 2 * pnorm(x * sqrt(2), lower = FALSE)
## and the inverses
erfinv <- function (x) qnorm((1 + x)/2)/sqrt(2)
erfcinv <- function (x) qnorm(x/2, lower = FALSE)/sqrt(2)


[Package stats version 3.3.1 ]
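To connect the help page to practice, here is a short sketch that calls each of the four functions once. The seed, sample size, mean and standard deviation are arbitrary choices made for this illustration, and `draws` is a made-up name.

```r
set.seed(42)                            # arbitrary seed, only for reproducibility

# rnorm: random generation
draws <- rnorm(1000, mean = 10, sd = 2)
mean(draws)                             # should land close to 10
sd(draws)                               # should land close to 2

# dnorm: density of the standard normal at 0, i.e. 1 / sqrt(2 * pi)
dnorm(0)

# pnorm: distribution function, P[X <= 1.96] for the standard normal
pnorm(1.96)

# qnorm: quantile function, the inverse of pnorm
qnorm(0.975)
pnorm(qnorm(0.975))                     # round trip returns 0.975
```

The prefixes `d`, `p`, `q` and `r` follow the same pattern for the other distributions listed under "See Also".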



In [6]:

    
# Using the help system
help(mean)









    Out[6]:





mean {base} R Documentation

Arithmetic Mean

Description

Generic function for the (trimmed) arithmetic mean.



Usage

mean(x, ...)

## Default S3 method:
mean(x, trim = 0, na.rm = FALSE, ...)



Arguments


x

An R object.  Currently there are methods for
numeric/logical vectors and date,
date-time and time interval objects.  Complex vectors
are allowed for trim = 0, only.

trim

the fraction (0 to 0.5) of observations to be
trimmed from each end of x before the mean is computed.
Values of trim outside that range are taken as the nearest endpoint.


na.rm

a logical value indicating whether NA
values should be stripped before the computation proceeds.

...

further arguments passed to or from other methods.




Value

If trim is zero (the default), the arithmetic mean of the
values in x is computed, as a numeric or complex vector of
length one.  If x is not logical (coerced to numeric), numeric
(including integer) or complex, NA_real_ is returned, with a warning.

If trim is non-zero, a symmetrically trimmed mean is computed
with a fraction of trim observations deleted from each end
before the mean is computed.



References

Becker, R. A., Chambers, J. M. and Wilks, A. R. (1988)
The New S Language.
Wadsworth & Brooks/Cole.



See Also

weighted.mean, mean.POSIXct,
colMeans for row and column means.



Examples

x <- c(0:10, 50)
xm <- mean(x)
c(xm, mean(x, trim = 0.10))


[Package base version 3.3.1 ]
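The `trim` and `na.rm` arguments described above are easier to follow with the arithmetic written out. This sketch reuses the vector from the help page's Examples section; the vector `y` with a missing value is a hypothetical addition.

```r
x <- c(0:10, 50)          # same vector as in the help example

mean(x)                   # 105 / 12 = 8.75, pulled up by the outlier 50
mean(x, trim = 0.10)      # drops floor(12 * 0.10) = 1 value per end: mean(1:10) = 5.5

y <- c(x, NA)             # hypothetical vector with a missing value
mean(y)                   # NA, because na.rm defaults to FALSE
mean(y, na.rm = TRUE)     # 8.75 again once the NA is stripped
```

Trimming removes the smallest and largest observations before averaging, which is why the single value 50 no longer dominates the result.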



In [8]:

    
# Searching the help system
help.search('randomForest')
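`help.search()` only looks through the documentation of installed packages, so it finds nothing for `randomForest` unless that package is installed. A few other base R lookups are sketched below, chosen so that they run without opening a help viewer; `found` is a made-up name.

```r
# apropos() matches object names on the search path against a regular expression
found <- apropos("^rnorm$")
found

# exists() reports whether a name is currently defined
exists("rnorm")

# args() shows a function's formal arguments without opening the help page
args(rnorm)

# Not run here, because they open help viewers or depend on installed packages:
# ??randomForest         # shorthand for help.search("randomForest")
# example(mean)          # runs the Examples section of a help page
```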



In [3]:

    
# Create plots
plot(1:30)
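The same call accepts arguments for titles, axis labels and point style. The sketch below is one possible variation; it writes to a temporary PDF file (an assumption made here so that it also runs without a screen), and the labels and colour are arbitrary.

```r
# Write the plot to a temporary PDF so the sketch also runs without a screen
out_file <- tempfile(fileext = ".pdf")   # hypothetical output location
pdf(out_file)

plot(1:30,
     main = "Index plot of 1:30",
     xlab = "Index",
     ylab = "Value",
     pch  = 19,          # solid circles
     col  = "steelblue")

dev.off()                # close the device so the file is flushed to disk
```

Inside a notebook the `pdf()` and `dev.off()` lines can be dropped, and the plot is then drawn inline as in the cell above.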



In [13]:

    
# Creating a numeric variable
obj1 = 367
obj1









    Out[13]:




367



In [14]:

    
# Checking the variable's type
typeof(obj1)









    Out[14]:




'double'
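The result is `'double'` rather than `'integer'` because R stores plain numeric literals as double-precision values. A minimal sketch of the difference follows; `obj_int` is a hypothetical name introduced for the comparison.

```r
obj1 <- 367        # plain numeric literals are stored as double
obj_int <- 367L    # hypothetical name; the L suffix requests an integer

typeof(obj1)       # "double"
typeof(obj_int)    # "integer"

# Both count as numeric, and they compare equal in value
is.numeric(obj1)
is.numeric(obj_int)
obj1 == obj_int

# Explicit conversion between the two storage types
typeof(as.integer(obj1))
```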



In [20]:

    
# Creating a vector of strings
obj2 = c("segunda", "terça", "quarta")
obj2









    Out[20]:





	'segunda'
	'terça'
	'quarta'



In [21]:

    
# Printing an element of the vector
obj2[3]









    Out[21]:




'quarta'
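Vector positions in R start at 1, which is why index 3 returns the last of the three strings. The sketch below shows a few other common indexing forms on the same vector.

```r
obj2 <- c("segunda", "terça", "quarta")

obj2[1]              # first element: R indexing starts at 1
obj2[c(1, 3)]        # several positions at once
obj2[-2]             # a negative index drops the second element
obj2[length(obj2)]   # last element
obj2[5]              # out-of-range positions give NA, not an error
```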



In [27]:

    
# Building a named matrix
matriz1 = matrix(c(100, 200, 300, 400), nrow = 2, ncol = 2,
                 dimnames = list(c("Linha 1", "Linha 2"), c("Coluna 1", "Coluna 2")))
matriz1









    Out[27]:





        Coluna 1 Coluna 2
Linha 1      100      300
Linha 2      200      400
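The output shows that `matrix()` fills values column by column: 100 and 200 go into the first column, 300 and 400 into the second. The sketch below rebuilds the matrix (with column labels written without surrounding spaces) and shows access by name, plus a row-wise fill; `matriz2` is a hypothetical name.

```r
matriz1 <- matrix(c(100, 200, 300, 400), nrow = 2, ncol = 2,
                  dimnames = list(c("Linha 1", "Linha 2"),
                                  c("Coluna 1", "Coluna 2")))

matriz1["Linha 1", "Coluna 2"]   # 300: values fill column by column
matriz1[2, ]                     # whole second row as a named vector
dim(matriz1)                     # 2 2

# byrow = TRUE fills row by row instead (matriz2 is a hypothetical name)
matriz2 <- matrix(c(100, 200, 300, 400), nrow = 2, byrow = TRUE)
matriz2[1, 2]                    # 200
```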



In [25]:

    
# Creating a list
lista1 = list("J", 25, TRUE)
lista1









    Out[25]:





	'J'
	25
	TRUE
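Unlike a vector, a list can hold elements of different types, here a string, a number and a logical. A short sketch of how to reach those elements follows; the named list `lista2` and its field names are hypothetical.

```r
lista1 <- list("J", 25, TRUE)

lista1[[2]]          # 25: double brackets extract the element itself
lista1[2]            # single brackets return a list of length one
class(lista1[[2]])   # "numeric"
class(lista1[2])     # "list"

# Each element keeps its own type, unlike in an atomic vector
sapply(lista1, class)

# Hypothetical named version, so elements can be reached with $
lista2 <- list(inicial = "J", idade = 25, ativo = TRUE)
lista2$idade
```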



In [30]:

    
# Creating vectors
pais = c("China", "Portugal", "Noruega", "Egito", "Brasil")
nome = c("Panda", "Zebra", "Girafa", "Elefante", "Jacaré")
altura = c(1.78, 1.72, 1.63, 1.59, 1.63)
codigo = c(5001, 2183, 4702, 7965, 8890)


# Creating a data frame from several vectors
pesquisa = data.frame(pais, nome, altura, codigo)
pesquisa









    Out[30]:





      pais     nome altura codigo
1    China    Panda   1.78   5001
2 Portugal    Zebra   1.72   2183
3  Noruega   Girafa   1.63   4702
4    Egito Elefante   1.59   7965
5   Brasil   Jacaré   1.63   8890
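Once the data frame exists, its columns can be inspected and filtered. The sketch below rebuilds `pesquisa` from the same vectors and shows a few common operations; the 1.70 threshold and the name `altos` are arbitrary choices for this illustration.

```r
pais   <- c("China", "Portugal", "Noruega", "Egito", "Brasil")
nome   <- c("Panda", "Zebra", "Girafa", "Elefante", "Jacaré")
altura <- c(1.78, 1.72, 1.63, 1.59, 1.63)
codigo <- c(5001, 2183, 4702, 7965, 8890)
pesquisa <- data.frame(pais, nome, altura, codigo)

str(pesquisa)                    # column types and a preview of the values
dim(pesquisa)                    # 5 rows, 4 columns
pesquisa$altura                  # one column as a plain vector
mean(pesquisa$altura)            # 8.35 / 5 = 1.67

# Rows where altura is above 1.70 (an arbitrary threshold for this sketch)
altos <- pesquisa[pesquisa$altura > 1.70, c("pais", "altura")]
altos
```

The row filter goes before the comma and the column selection after it, the same `[row, column]` order used for matrices.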

End

Thank you - Data Science Academy - facebook.com/dsacademybr
