Loading and Saving Data

to/from csv, hdf5, jld, mat and xlsx files.

Load Packages and Extra Functions

The packages are loaded in the respective sections below. This allows you to run parts of this notebook without having to install all packages.

The data files created by this notebook are written to and loaded from the subfolder "Results".


In [1]:
using Dates
include("printmat.jl")

if !isdir("Results")
    error("create the subfolder Results before running this program")
end

Loading a csv File

The csv ("comma-separated values") format provides a simple and robust method for moving data, and it can be read by most software.

For instance, for reading a data file delimited by comma (,) and where the first line of the file contains variable names, then use the following

(x,header) = readdlm(FileName,',',header=true)

Alternatively, use

x = readdlm(FileName,',',skipstart=1)

to disregard the first line.

Extra arguments control the type of data (Float64, Int, etc), suppression of comment lines and more.

If you need more powerful write/read routines, try the CSV package.


In [2]:
using DelimitedFiles

(x,header) = readdlm("Data/CsvFile.csv",',',header=true)  #read csv file

println("header of csv file:")
printmat(header)
println("x:")
printmat(x)


header of csv file:
         X         Y         Z

x:
     1.100     1.200     1.300
     2.100     2.200     2.300

Saving a csv File

To write csv data, the simplest approach is to create the matrix you want to save and then run

writedlm(FileName,matrix)

Alternatively, to write several matrices to the file (without having to first combine them), use

fh = open(Filename, "w")
    writedlm(fh,matrix1,',')
    writedlm(fh,matrix2,',')
close(fh)

In [3]:
x   = [1.1 1.2 1.3;
       2.1 2.2 2.3]
header = ["X" "Y" "Z"]

xx = Any[header; x]                         #to save 
writedlm("Results/NewCsvFile.csv",xx,',')  #write csv file

printblue("NewCsvFile.csv has been created in the subfolder Results. Check it out.")


NewCsvFile.csv has been created in the subfolder Results. Check it out.

Loading csv with Dates and Missing Values (extra)

The next cells show how to load a csv files with dates (for instance, 15/01/1979) and some missing values.

The code

  1. reads the csv file
  2. converts x2[:,1] to Date,
  3. finds all elements in x = x2[:,2:end] that are not numbers and converts them to NaN (use with Float64) or missing (use when data is not Float64).

In [4]:
"""
Change elements with missing data (' ') to either NaN or missing. 
`x` is the input matrix, `Typ` is the type of the output (Float64, Int, etc) and 
`missval` is your choice of either `NaN` or `missing`
"""
function readdlmFix(x,Typ=Float64,missVal=NaN)
    y = replace(z->!isa(z,Number) ? missVal : z,x)
    ismissing(missVal) && (Typ = Union{Missing,Typ})   #allow missing
    y = convert.(Typ,y)
  return y
end


Out[4]:
readdlmFix

In [5]:
x2 = readdlm("Data/CsvFileWithDates.csv",',',skipstart=1)
dN = Date.(x2[:,1],"d/m/y")     #to Date, "d/m/y" is the date format in the file
x  = x2[:,2:end]                #the data, but Any[] since missing data

println("dates and data (first 4 obs):")
printmat(Any[dN[1:4] x[1:4,:]])

x = readdlmFix(x)

println("after fix of missing data (first 4 obs):")
printmat(Any[dN[1:4] x[1:4,:]])


dates and data (first 4 obs):
1979-01-02    96.730          
1979-01-03               9.310
1979-01-04    98.580     9.310
1979-01-05    99.130     9.340

after fix of missing data (first 4 obs):
1979-01-02    96.730       NaN
1979-01-03       NaN     9.310
1979-01-04    98.580     9.310
1979-01-05    99.130     9.340

Loading and Saving hdf5

hdf5 files are used in many computer languages. They can store different types of data: integers, floats, strings (but not Julia Dates).

The basic syntax of the HDF5 package is

fh = h5open(FileName,"r")   #open for reading
    (x,y) = read(fh,"x","y")
close(fh)

fh = h5open(FileName,"w")   #open for writing
    write(fh,"x",x)
    write(fh,"y",y)
close(fh)

To save dates, save either a matrix [y m d] (see eg. Dates.month(date)) or a date value (see eg. Dates.value(date)).

The HDFVIEW program allows you to look at the contents of a hdf5 file. (It is not needed here.)


In [6]:
using HDF5

fh = h5open("Data/H5File.h5","r")                     #open for reading
     println("\nVariables in h5 file: ",names(fh))
    (x,B,ymd) = read(fh,"x","B","ymd")                #load some of the data
close(fh)

dN = Date.(ymd[:,1],ymd[:,2],ymd[:,3])                #reconstructing dates

println("\ndates and x from h5 file is")
printmat(Any[dN x])


Variables in h5 file: ["B", "C", "x", "ymd"]

dates and x from h5 file is
2019-05-14     1.100     1.200     1.300
2019-05-15     2.100     2.200     2.300


In [7]:
x   = [1.1 1.2 1.3;
       2.1 2.2 2.3]
ymd = [2019 5 14;
       2019 5 15]
B   = 1
C   = "Nice cat"

fh = h5open("Results/NewH5File.h5","w")    #open file for writing   
    write(fh,"x",x)
    write(fh,"ymd",ymd)
    write(fh,"B",B)
    write(fh,"C",C)
close(fh)                                  #close file

println("NewH5File.h5 has been created in the subfolder Results")


NewH5File.h5 has been created in the subfolder Results

Loading and Saving jld

jld files can store very different types of data: integers, floats, strings, dictionaries, etc. It is a dialect of hdf5, designed to save different Julia objects (including Dates).

The basic syntax of the JLD package is

(A,B) = load(FileName,"A","B")        #load some data 
xx = load("Data/NewJldFile.jld")      #load all data into a Dict()
save(FileName,"A",A,"B",B)            #save data

(It also possible to use the same syntax as for HDF5, except that we use jldopen instead of h5open.)

The JLD2 package is an an interesting alternative. It uses the same syntax and is often faster. However, it is (as of June 2020) not actively maintained, so its future is unclear.


In [8]:
using JLD                                        #for JLD2: using FileIO, JLD2

xx = load("Data/JldFile.jld")                    #load entire file
xx_keys =  filter(i->!startswith(i,"_creator"),keys(xx)) #filter out extra key
println("The variables are: ",xx_keys)           #list contents of the file 
 
(x,d) = load("Data/JldFile.jld","x","d")         #read some of the data

println("\ndates and x from jld file is")
printmat(Any[dN x])


The variables are: Set(["B", "C", "x", "d"])

dates and x from jld file is
2019-05-14     1.100     1.200     1.300
2019-05-15     2.100     2.200     2.300


In [9]:
x   = [1.1 1.2 1.3;
       2.1 2.2 2.3]
d   = [Date(2019,5,14);                                #Julia dates
       Date(2019,5,15)]
B   = 1
C   = "Nice cat"

save("Results/NewJldFile.jld","x",x,"d",d,"B",B,"C",C)       #write jld file

println("NewJldFile.jld has been created in the subfolder Results")


NewJldFile.jld has been created in the subfolder Results

Saving and Loading Matlab mat files (extra)

The MAT package allows you to load/save (Matlab) mat files (which is a dialect of HDF5).


In [10]:
using MAT

function DateMLtoDate(dNum)         #Matlab datenum to Julia date
  dNum   = round.(Int,dNum) .- 366
  dNTime = Dates.rata2datetime.(dNum)
  dN     = Date.(dNTime)
  return dN
end


fh = matopen("Data/MatFile.mat")
    println("\nVariables in mat file: ",names(fh))
    (x,dM) = read(fh,"x","dM")
close(fh) 

d = DateMLtoDate(dM)                #Matlab datenum to Julia date

println("\ndates and x from mat file is")
printmat(Any[d x])


Variables in mat file: ["B", "C", "dM", "x"]

dates and x from mat file is
2019-05-14     1.100     1.200     1.300
2019-05-15     2.100     2.200     2.300


In [11]:
x   = [1.1 1.2 1.3;
       2.1 2.2 2.3]
d   = [Date(2019,5,14);                         #Julia dates
       Date(2019,5,15)]
B   = 1
C   = "Nice cat"

dM  = Dates.value.(d) .+ 366.0  #Julia Date to Matlab's datenum(), Float64

fh = matopen("Results/NewMatFile.mat","w")
    write(fh,"x",x)             #write one variable at a time
    write(fh,"B",B)
    write(fh,"dM",dM)
    write(fh,"C",C)
close(fh)

println("\nNewMatFile.mat has been created in the subfolder Results")


NewMatFile.mat has been created in the subfolder Results

Loading an xls File

The XLSX package allows you to read and write xls (and xlsx) files.

As an alternative, you can use ExcelReaders, which requires python and python's xlrd libarary. For instance, this would work

using ExcelReaders
data1 = readxl("Data/XlsFile.xlsx","Data!B2:C11")
x1    = convert.(Float64,data1)            
printmat(x1)

In [12]:
using XLSX

data1 = XLSX.readxlsx("Data/XlsFile.xlsx")   #reading the entire file
x1    = data1["Data!B2:C11"]                 #extracting a part of the sheet "Data"
x1    = convert.(Float64,x1)                 #converting from Any to Float64

println("part of the xlsx file:")
printmat(x1)


part of the xlsx file:
    16.660  -999.990
    16.850  -999.990
    16.930  -999.990
    16.980  -999.990
    17.080  -999.990
    17.030     7.000
    17.090     8.000
    16.760  -999.990
    16.670  -999.990
    16.720  -999.990

Creating Variables from Variable Names (extra)

Suppose you have

  1. a matrix x with data
  2. a list of the variable names for each column of x, for instance, from header in a CSV file or saved as a vector of strings in a hdf5/mat/jld file.

If there are few variables, then you can manually create each of them from the loaded matrix. This becomes tedious when there are many variables.

However, it is easy to create a Dict() which can be used to easily refer to the variable names, for instance, D[:X] to get variable X.


In [13]:
(x,header) = readdlm("Data/CsvFile.csv",',',header=true)
n = size(x,2)

(X,Y,Z) = [x[:,i] for i=1:n]                         #manually creating X,Y,Z

D = Dict([(Symbol(header[i]),x[:,i]) for i=1:n])     #Creating D with :X,:Y,:Z

println("x[:,1], X and D[:X]")
printmat([x[:,1] X D[:X]])


x[:,1], X and D[:X]
     1.100     1.100     1.100
     2.100     2.100     2.100


In [ ]: