Datasets


In [1]:
df <- read.csv('desemprego.csv')

In [2]:
str(df)


'data.frame':	121 obs. of  2 variables:
 $ Periodo   : num  2007 2007 2007 2007 2007 ...
 $ Desemprego: num  10.4 10.5 10 10 9.3 9.3 9.1 9.6 9.8 9.8 ...

In [3]:
summary(df)


    Periodo       Desemprego   
 Min.   :2007   Min.   : 6.90  
 1st Qu.:2010   1st Qu.: 8.50  
 Median :2012   Median : 9.40  
 Mean   :2012   Mean   :10.01  
 3rd Qu.:2015   3rd Qu.:10.70  
 Max.   :2017   Max.   :15.90  

In [4]:
head(df)


Periodo  Desemprego
2007.08        10.4
2007.09        10.5
2007.10        10.0
2007.11        10.0
2007.12         9.3
2008.01         9.3

In [5]:
tail(df)


     Periodo  Desemprego
116  2017.03        15.2
117  2017.04        15.5
118  2017.05        15.9
119  2017.06        15.6
120  2017.07        15.1
121  2017.08        14.5
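
The `Periodo` column is read as a plain number in a year.month layout, so `2007.10` is stored as `2007.1`. A minimal sketch of one way to split it into year and month and build a `Date`, assuming the two decimal digits always encode the month; the vector `periodo` and the names `ano`, `mes` and `data` are illustrative and not part of the original notebook:

```r
# Illustrative values in the same year.month layout as the Periodo column.
periodo <- c(2007.08, 2007.10, 2007.12, 2008.01)

ano <- as.integer(floor(periodo))                # year part
mes <- as.integer(round((periodo - ano) * 100))  # month part; round() absorbs floating-point error

# First day of each month as a Date (using day 1 is an arbitrary choice).
data <- as.Date(sprintf("%d-%02d-01", ano, mes))
print(data)
```

With a real `Date` column, helpers such as `format(data, "%m")` work the same way they do for `df4$Nascimento` further down.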

Unconventional datasets


In [7]:
df2 <- read.csv('dataset-nao-convencional.csv',dec = ',', sep = ";")

In [11]:
str(df2)


'data.frame':	29 obs. of  3 variables:
 $ Nome   : Factor w/ 29 levels "Antônia","Antônio",..: 12 4 8 14 23 22 18 13 21 9 ...
 $ Pesos  : int  74 61 61 68 70 73 67 67 65 57 ...
 $ Alturas: num  1.73 1.61 1.61 1.67 1.69 1.75 1.67 1.67 1.63 1.57 ...

In [8]:
head(df2)


Nome      Pesos  Alturas
Fulano       74     1.73
Beltrano     61     1.61
Cicrano      61     1.61
João         68     1.67
Pedro        70     1.69
Paulo        73     1.75

In [9]:
df3 <- read.csv('dataset-nao-convencional.csv',dec = ',', sep = ";", 
      colClasses= c('character','integer','numeric'))

In [10]:
str(df3)


'data.frame':	29 obs. of  3 variables:
 $ Nome   : chr  "Fulano" "Beltrano" "Cicrano" "João" ...
 $ Pesos  : int  74 61 61 68 70 73 67 67 65 57 ...
 $ Alturas: num  1.73 1.61 1.61 1.67 1.69 1.75 1.67 1.67 1.63 1.57 ...
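
For files that use `;` as the separator and `,` as the decimal mark, base R also provides `read.csv2()`, whose defaults match that layout. A small sketch, assuming a file shaped like the one above; the inline `texto` sample and the name `amostra` are illustrative, not taken from the original data file:

```r
# Inline sample in the same layout (";" separator, "," decimal mark); values are illustrative.
texto <- "Nome;Pesos;Alturas
Fulano;74;1,73
Beltrano;61;1,61
Cicrano;61;1,61"

# read.csv2() defaults to sep = ";" and dec = ",".
# stringsAsFactors = FALSE keeps Nome as character instead of factor.
amostra <- read.csv2(text = texto, stringsAsFactors = FALSE)
str(amostra)
```

Passing `colClasses`, as in the cell above, remains the more explicit option when every column type is known in advance.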

In [12]:
df4 <- read.csv('datas.csv',colClasses=c('Date','integer', 'numeric'))

In [13]:
print(as.numeric(format(df4$Nascimento, "%m")))


[1] 3 1 5 1 2

In [14]:
df6 <- read.csv('datas-horas.csv', colClasses = c('character', 'numeric', 'numeric'))
str(df6)


'data.frame':	3 obs. of  3 variables:
 $ Nascimento: chr  "5-3-2005 17:35:00" "1-5-2005 12:10:05" "10-6-2005 23:01:12"
 $ Pesos     : num  5 3.5 4.1
 $ Alturas   : num  0.5 0.48 0.51

In [15]:
df6$Nascimento <- strptime(df6$Nascimento,format='%d-%m-%Y %H:%M:%S')
str(df6)


'data.frame':	3 obs. of  3 variables:
 $ Nascimento: POSIXlt, format: "2005-03-05 17:35:00" "2005-05-01 12:10:05" ...
 $ Pesos     : num  5 3.5 4.1
 $ Alturas   : num  0.5 0.48 0.51
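
`strptime()` returns a `POSIXlt` object, which is a list of date-time components. For a data frame column, `POSIXct` (a single number per timestamp) is often more convenient. A hedged sketch of the same parse with `as.POSIXct()`; the `tz = "UTC"` argument is an assumption, since the source does not say which time zone the timestamps are in:

```r
# Same strings as the Nascimento column of df6.
nascimento <- c("5-3-2005 17:35:00", "1-5-2005 12:10:05", "10-6-2005 23:01:12")

# tz = "UTC" is an assumption made for this sketch.
nascimento_ct <- as.POSIXct(nascimento, format = "%d-%m-%Y %H:%M:%S", tz = "UTC")

class(nascimento_ct)
# Extract the month as a number, as done earlier for df4$Nascimento.
as.numeric(format(nascimento_ct, "%m"))
```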

In [16]:
valores <- c(1.0,5.2,,6.7,3.2,4.1)


Error in c(1, 5.2, , 6.7, 3.2, 4.1): argument 3 is empty

In [18]:
valores <- c(1.0,5.2,NA,6.7,3.2,4.1)
valores


  1. 1
  2. 5.2
  3. NA
  4. 6.7
  5. 3.2
  6. 4.1

In [20]:
valores <- c(1.0,5.2,NaN,6.7,3.2,4.1)
valores
mean(valores)


  1. 1
  2. 5.2
  3. NaN
  4. 6.7
  5. 3.2
  6. 4.1
NaN
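
The mean comes out as `NaN` because a single `NaN` (or `NA`) propagates through the arithmetic. A minimal sketch of the usual workaround, the `na.rm` argument of `mean()`, which drops both `NA` and `NaN` before averaging; the name `media` is illustrative:

```r
valores <- c(1.0, 5.2, NaN, 6.7, 3.2, 4.1)

# na.rm = TRUE removes NA and NaN entries before computing the mean.
media <- mean(valores, na.rm = TRUE)
media
```

This gives the same result as the `NULL` case that follows, where the missing element is never stored in the vector at all.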

In [21]:
valores <- c(1.0,5.2,NULL,6.7,3.2,4.1)
valores
mean(valores)
length(valores)


  1. 1
  2. 5.2
  3. 6.7
  4. 3.2
  5. 4.1
4.04
5

In [23]:
valores <- c(1.0,5.2,NA,6.7,3.2,4.1)
is.na(valores)
is.na(valores[3])


  1. FALSE
  2. FALSE
  3. TRUE
  4. FALSE
  5. FALSE
  6. FALSE
TRUE

In [24]:
valores <- c(1.0,5.2,NaN,6.7,3.2,4.1)
is.na(valores)


  1. FALSE
  2. FALSE
  3. TRUE
  4. FALSE
  5. FALSE
  6. FALSE
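
Since `is.na()` returns `TRUE` for `NaN` as well as for `NA`, it cannot tell the two apart. A short sketch using `is.nan()`, which is `TRUE` only for `NaN`; the vector `misto` is illustrative:

```r
misto <- c(1.0, NA, NaN)

is.na(misto)   # TRUE for both NA and NaN
is.nan(misto)  # TRUE only for NaN

# Positions that are NA but not NaN.
so_na <- is.na(misto) & !is.nan(misto)
so_na
```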

In [26]:
# Inf is numeric!

valores <- c(1.0,5.2,Inf,6.7,3.2,4.1)
is.na(valores)
is.infinite(valores)


  1. FALSE
  2. FALSE
  3. FALSE
  4. FALSE
  5. FALSE
  6. FALSE
  1. FALSE
  2. FALSE
  3. TRUE
  4. FALSE
  5. FALSE
  6. FALSE
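
When the goal is to keep only ordinary numbers, `is.finite()` covers all the special cases at once: it is `FALSE` for `NA`, `NaN`, `Inf` and `-Inf`. A small sketch; the vector `especiais` and the name `finitos` are illustrative:

```r
especiais <- c(1.0, NA, NaN, Inf, -Inf, 4.1)

is.finite(especiais)

# Keep only the ordinary (finite, non-missing) values.
finitos <- especiais[is.finite(especiais)]
finitos
```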

In [27]:
valores <- c(1.0,5.2,NULL,6.7,3.2,4.1)
is.na(valores)


  1. FALSE
  2. FALSE
  3. FALSE
  4. FALSE
  5. FALSE

In [28]:
library(tidyr)

In [29]:
library(dplyr)


Attaching package: ‘dplyr’

The following objects are masked from ‘package:stats’:

    filter, lag

The following objects are masked from ‘package:base’:

    intersect, setdiff, setequal, union

