In [7]:
# Install the `readr` package
devtools::install_github("tidyverse/readr")


Downloading GitHub repo tidyverse/readr@master
from URL https://api.github.com/repos/tidyverse/readr/zipball/master
Installing readr
Installing BH
'/usr/lib/R/bin/R' --no-site-file --no-environ --no-save --no-restore --quiet  \
  CMD INSTALL '/tmp/Rtmpwi2iUG/devtools2cce121ca058/BH'  \
  --library='/home/duyetdev/R/x86_64-pc-linux-gnu-library/3.3'  \
  --install-tests 

Installing R6
'/usr/lib/R/bin/R' --no-site-file --no-environ --no-save --no-restore --quiet  \
  CMD INSTALL '/tmp/Rtmpwi2iUG/devtools2cce75f672ce/R6'  \
  --library='/home/duyetdev/R/x86_64-pc-linux-gnu-library/3.3'  \
  --install-tests 

Installing Rcpp
'/usr/lib/R/bin/R' --no-site-file --no-environ --no-save --no-restore --quiet  \
  CMD INSTALL '/tmp/Rtmpwi2iUG/devtools2cce21e72fd4/Rcpp'  \
  --library='/home/duyetdev/R/x86_64-pc-linux-gnu-library/3.3'  \
  --install-tests 

'/usr/lib/R/bin/R' --no-site-file --no-environ --no-save --no-restore --quiet  \
  CMD INSTALL '/tmp/Rtmpwi2iUG/devtools2cce723a220d/tidyverse-readr-70b2a3c'  \
  --library='/home/duyetdev/R/x86_64-pc-linux-gnu-library/3.3'  \
  --install-tests 


In [11]:
# Load the readr package
library(readr)

read_csv

Reads a comma-separated values (CSV) file.


In [13]:
# Import potatoes.csv with read_csv(): potatoes
potatoes <- read_csv('potatoes.csv')


Parsed with column specification:
cols(
  area = col_integer(),
  temp = col_integer(),
  size = col_integer(),
  storage = col_integer(),
  method = col_integer(),
  texture = col_double(),
  flavor = col_double(),
  moistness = col_double()
)

In [14]:
head(potatoes)


area  temp  size  storage  method  texture  flavor  moistness
   1     1     1        1       1      2.9     3.2        3.0
   1     1     1        1       2      2.3     2.5        2.6
   1     1     1        1       3      2.5     2.8        2.8
   1     1     1        1       4      2.1     2.9        2.4
   1     1     1        1       5      1.9     2.8        2.2
   1     1     1        2       1      1.8     3.0        1.7
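The call above can be tried without the original data file. The following is a minimal sketch that assumes potatoes.csv starts with a header line followed by comma-separated values; the temporary file and the name sample_potatoes are hypothetical and exist only for illustration.

```r
library(readr)

# Hypothetical sample data: three rows copied from the output above,
# written to a temporary file together with an assumed header line
csv_path <- tempfile(fileext = ".csv")
writeLines(c(
  "area,temp,size,storage,method,texture,flavor,moistness",
  "1,1,1,1,1,2.9,3.2,3.0",
  "1,1,1,1,2,2.3,2.5,2.6",
  "1,1,1,1,3,2.5,2.8,2.8"
), csv_path)

# read_csv() takes the first line as column names and guesses each column type
sample_potatoes <- read_csv(csv_path)

dim(sample_potatoes)    # 3 rows, 8 columns
names(sample_potatoes)  # names come from the header line
```

The guessed types are reported in the "Parsed with column specification" message, which is a quick way to check that the file was read as intended.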

read_tsv

Reads a TSV file, in which columns are separated by the tab character \t.

The example reads potatoes.txt, which has no header row and the following contents:

1   1   1   1   1   2.9 3.2 3.0
1   1   1   1   2   2.3 2.5 2.6
1   1   1   1   3   2.5 2.8 2.8
1   1   1   1   4   2.1 2.9 2.4
1   1   1   1   5   1.9 2.8 2.2
1   1   1   2   1   1.8 3.0 1.7
1   1   1   2   2   2.6 3.1 2.4
...

In [15]:
# Column names
properties <- c("area", "temp", "size", "storage", "method",
                "texture", "flavor", "moistness")
# read_tsv
potatoes <- read_tsv('potatoes.txt', col_names=properties)

# head of dataframe
head(potatoes)


Parsed with column specification:
cols(
  area = col_integer(),
  temp = col_integer(),
  size = col_integer(),
  storage = col_integer(),
  method = col_integer(),
  texture = col_double(),
  flavor = col_double(),
  moistness = col_double()
)
area  temp  size  storage  method  texture  flavor  moistness
   1     1     1        1       1      2.9     3.2        3.0
   1     1     1        1       2      2.3     2.5        2.6
   1     1     1        1       3      2.5     2.8        2.8
   1     1     1        1       4      2.1     2.9        2.4
   1     1     1        1       5      1.9     2.8        2.2
   1     1     1        2       1      1.8     3.0        1.7
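Because the file has no header row, the col_names argument matters. The sketch below compares col_names = FALSE, which lets readr generate names, with a character vector of names. It uses a hypothetical two-row temporary file instead of potatoes.txt; the variable names are for illustration only.

```r
library(readr)

# Hypothetical sample data: two tab-separated rows without a header line
tsv_path <- tempfile(fileext = ".txt")
writeLines(c(
  "1\t1\t1\t1\t1\t2.9\t3.2\t3.0",
  "1\t1\t1\t1\t2\t2.3\t2.5\t2.6"
), tsv_path)

properties <- c("area", "temp", "size", "storage", "method",
                "texture", "flavor", "moistness")

# col_names = FALSE: every line is data, names are generated (X1, X2, ...)
generic <- read_tsv(tsv_path, col_names = FALSE)
names(generic)

# col_names = <character vector>: every line is data, names are supplied
named <- read_tsv(tsv_path, col_names = properties)
names(named)
```

With the default col_names = TRUE, the first data line would be consumed as the header, so one observation would be lost.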

read_delim

Similar to read.delim() or read.table() from the utils package. The delim argument sets the character that separates the columns.


In [16]:
properties <- c("area", "temp", "size", "storage", "method",
                "texture", "flavor", "moistness")
potatoes <- read_delim('potatoes.txt', col_names=properties, delim='\t')

# head of dataframe 
head(potatoes)


Parsed with column specification:
cols(
  area = col_integer(),
  temp = col_integer(),
  size = col_integer(),
  storage = col_integer(),
  method = col_integer(),
  texture = col_double(),
  flavor = col_double(),
  moistness = col_double()
)
area  temp  size  storage  method  texture  flavor  moistness
   1     1     1        1       1      2.9     3.2        3.0
   1     1     1        1       2      2.3     2.5        2.6
   1     1     1        1       3      2.5     2.8        2.8
   1     1     1        1       4      2.1     2.9        2.4
   1     1     1        1       5      1.9     2.8        2.2
   1     1     1        2       1      1.8     3.0        1.7
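To illustrate the delim argument, here is a small sketch that reads the same two rows once from a tab-separated file and once from a semicolon-separated file. Both files are hypothetical temporary files created for the example, not part of the original data.

```r
library(readr)

properties <- c("area", "temp", "size", "storage", "method",
                "texture", "flavor", "moistness")

# Hypothetical sample files: identical rows, different separators
tab_path <- tempfile(fileext = ".txt")
writeLines(c("1\t1\t1\t1\t1\t2.9\t3.2\t3.0",
             "1\t1\t1\t1\t2\t2.3\t2.5\t2.6"), tab_path)

semi_path <- tempfile(fileext = ".txt")
writeLines(c("1;1;1;1;1;2.9;3.2;3.0",
             "1;1;1;1;2;2.3;2.5;2.6"), semi_path)

# Only the delim value changes between the two calls
from_tab  <- read_delim(tab_path,  delim = "\t", col_names = properties)
from_semi <- read_delim(semi_path, delim = ";",  col_names = properties)

# The parsed values are the same regardless of the separator used on disk
identical(from_tab$texture, from_semi$texture)
```

read_csv and read_tsv can be thought of as read_delim with the delimiter fixed to a comma and a tab respectively.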

skip and n_max

skip skips the first n lines of the file. n_max sets the maximum number of rows to read.


In [17]:
# Skip the first 5 lines, then read the next 10 (lines 6 to 15)
potatoes_fragment <- read_tsv("potatoes.txt", skip = 5, n_max = 10, col_names = properties)

potatoes_fragment


Parsed with column specification:
cols(
  area = col_integer(),
  temp = col_integer(),
  size = col_integer(),
  storage = col_integer(),
  method = col_integer(),
  texture = col_double(),
  flavor = col_double(),
  moistness = col_double()
)
area  temp  size  storage  method  texture  flavor  moistness
   1     1     1        2       1      1.8     3.0        1.7
   1     1     1        2       2      2.6     3.1        2.4
   1     1     1        2       3      3.0     3.0        2.9
   1     1     1        2       4      2.2     3.2        2.5
   1     1     1        2       5      2.0     2.8        1.9
   1     1     1        3       1      1.8     2.6        1.5
   1     1     1        3       2      2.0     2.8        1.9
   1     1     1        3       3      2.6     2.6        2.6
   1     1     1        3       4      2.1     3.2        2.1
   1     1     1        3       5      2.5     3.0        2.1
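The interaction of skip and n_max is easier to follow on a file whose rows are numbered. The sketch below uses a hypothetical eight-line temporary file with the made-up columns id and value; it is an illustration, not part of the potatoes data.

```r
library(readr)

# Hypothetical sample data: eight tab-separated lines, ids 1 to 8, no header
numbered_path <- tempfile(fileext = ".txt")
writeLines(paste(1:8, 10 * (1:8), sep = "\t"), numbered_path)

# Skip the first 2 lines, then read at most 3 rows: lines 3, 4 and 5
fragment <- read_tsv(numbered_path, skip = 2, n_max = 3,
                     col_names = c("id", "value"))

fragment$id     # 3 4 5
nrow(fragment)  # 3
```

Since skip also skips a header line if the file has one, passing col_names explicitly, as done here and in the cell above, keeps the column names intact.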

col_types

col_types specifies the type each column is parsed as. With the default, col_types = NULL, readr guesses the most suitable type for each column.

In the compact string form, each character stands for one column: c = character, d = double, i = integer, l = logical, and _ skips the column.


In [18]:
# Column names
properties <- c("area", "temp", "size", "storage", "method",
                "texture", "flavor", "moistness")

# Import all data with explicit column types: five integer columns, then three double columns
potatoes_char <- read_tsv("potatoes.txt", col_types = "iiiiiddd", col_names = properties)

In [20]:
str(potatoes_char)


Classes ‘tbl_df’, ‘tbl’ and 'data.frame':	160 obs. of  8 variables:
 $ area     : int  1 1 1 1 1 1 1 1 1 1 ...
 $ temp     : int  1 1 1 1 1 1 1 1 1 1 ...
 $ size     : int  1 1 1 1 1 1 1 1 1 1 ...
 $ storage  : int  1 1 1 1 1 2 2 2 2 2 ...
 $ method   : int  1 2 3 4 5 1 2 3 4 5 ...
 $ texture  : num  2.9 2.3 2.5 2.1 1.9 1.8 2.6 3 2.2 2 ...
 $ flavor   : num  3.2 2.5 2.8 2.9 2.8 3 3.1 3 3.2 2.8 ...
 $ moistness: num  3 2.6 2.8 2.4 2.2 1.7 2.4 2.9 2.5 1.9 ...
 - attr(*, "spec")=List of 2
  ..$ cols   :List of 8
  .. ..$ area     : list()
  .. .. ..- attr(*, "class")= chr  "collector_integer" "collector"
  .. ..$ temp     : list()
  .. .. ..- attr(*, "class")= chr  "collector_integer" "collector"
  .. ..$ size     : list()
  .. .. ..- attr(*, "class")= chr  "collector_integer" "collector"
  .. ..$ storage  : list()
  .. .. ..- attr(*, "class")= chr  "collector_integer" "collector"
  .. ..$ method   : list()
  .. .. ..- attr(*, "class")= chr  "collector_integer" "collector"
  .. ..$ texture  : list()
  .. .. ..- attr(*, "class")= chr  "collector_double" "collector"
  .. ..$ flavor   : list()
  .. .. ..- attr(*, "class")= chr  "collector_double" "collector"
  .. ..$ moistness: list()
  .. .. ..- attr(*, "class")= chr  "collector_double" "collector"
  ..$ default: list()
  .. ..- attr(*, "class")= chr  "collector_guess" "collector"
  ..- attr(*, "class")= chr "col_spec"
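Other col_types values follow the same pattern. The sketch below shows three variants on a hypothetical two-row temporary file: forcing every column to character with the compact string, doing the same through cols() with a default collector, and skipping columns with _. The variable names are for illustration only.

```r
library(readr)

properties <- c("area", "temp", "size", "storage", "method",
                "texture", "flavor", "moistness")

# Hypothetical sample data: two tab-separated rows without a header line
types_path <- tempfile(fileext = ".txt")
writeLines(c("1\t1\t1\t1\t1\t2.9\t3.2\t3.0",
             "1\t1\t1\t1\t2\t2.3\t2.5\t2.6"), types_path)

# Compact string: one "c" per column forces all eight columns to character
all_char <- read_tsv(types_path, col_names = properties,
                     col_types = "cccccccc")

# cols() form: a default collector applies to every column not listed
all_char_cols <- read_tsv(types_path, col_names = properties,
                          col_types = cols(.default = col_character()))

# "_" drops a column: keep area and method as integer, skip temp, size, storage
subset_cols <- read_tsv(types_path, col_names = properties,
                        col_types = "i___iddd")

sapply(all_char, class)
names(subset_cols)
```

Spelling out col_types also makes the result independent of readr's guessing, which, depending on the installed version, may read whole-number columns as double rather than integer.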
