In [1]:
using Dates
include("printmat.jl")
if !isdir("Results")
error("create the subfolder Results before running this program")
end
The csv ("comma-separated values") format provides a simple and robust method for moving data, and it can be read by most software.
For instance, for reading a data file delimited by comma (,) and where the first line of the file contains variable names, then use the following
(x,header) = readdlm(FileName,',',header=true)
Alternatively, use
x = readdlm(FileName,',',skipstart=1)
to disregard the first line.
Extra arguments control the type of data (Float64, Int, etc), suppression of comment lines and more.
If you need more powerful write/read routines, try the CSV package.
In [2]:
using DelimitedFiles
(x,header) = readdlm("Data/CsvFile.csv",',',header=true) #read csv file
println("header of csv file:")
printmat(header)
println("x:")
printmat(x)
To write csv data, the simplest approach is to create the matrix you want to save and then run
writedlm(FileName,matrix)
Alternatively, to write several matrices to the file (without having to first combine them), use
fh = open(Filename, "w")
writedlm(fh,matrix1,',')
writedlm(fh,matrix2,',')
close(fh)
In [3]:
x = [1.1 1.2 1.3;
2.1 2.2 2.3]
header = ["X" "Y" "Z"]
xx = Any[header; x] #to save
writedlm("Results/NewCsvFile.csv",xx,',') #write csv file
printblue("NewCsvFile.csv has been created in the subfolder Results. Check it out.")
The next cells show how to load a csv files with dates (for instance, 15/01/1979) and some missing values.
The code
x2[:,1] to Date,x = x2[:,2:end] that are not numbers and converts them to NaN (use with Float64) or missing (use when data is not Float64).
In [4]:
"""
Change elements with missing data (' ') to either NaN or missing.
`x` is the input matrix, `Typ` is the type of the output (Float64, Int, etc) and
`missval` is your choice of either `NaN` or `missing`
"""
function readdlmFix(x,Typ=Float64,missVal=NaN)
y = replace(z->!isa(z,Number) ? missVal : z,x)
ismissing(missVal) && (Typ = Union{Missing,Typ}) #allow missing
y = convert.(Typ,y)
return y
end
Out[4]:
In [5]:
x2 = readdlm("Data/CsvFileWithDates.csv",',',skipstart=1)
dN = Date.(x2[:,1],"d/m/y") #to Date, "d/m/y" is the date format in the file
x = x2[:,2:end] #the data, but Any[] since missing data
println("dates and data (first 4 obs):")
printmat(Any[dN[1:4] x[1:4,:]])
x = readdlmFix(x)
println("after fix of missing data (first 4 obs):")
printmat(Any[dN[1:4] x[1:4,:]])
hdf5 files are used in many computer languages. They can store different types of data: integers, floats, strings (but not Julia Dates).
The basic syntax of the HDF5 package is
fh = h5open(FileName,"r") #open for reading
(x,y) = read(fh,"x","y")
close(fh)
fh = h5open(FileName,"w") #open for writing
write(fh,"x",x)
write(fh,"y",y)
close(fh)
To save dates, save either a matrix [y m d] (see eg. Dates.month(date)) or a date value (see eg. Dates.value(date)).
The HDFVIEW program allows you to look at the contents of a hdf5 file. (It is not needed here.)
In [6]:
using HDF5
fh = h5open("Data/H5File.h5","r") #open for reading
println("\nVariables in h5 file: ",names(fh))
(x,B,ymd) = read(fh,"x","B","ymd") #load some of the data
close(fh)
dN = Date.(ymd[:,1],ymd[:,2],ymd[:,3]) #reconstructing dates
println("\ndates and x from h5 file is")
printmat(Any[dN x])
In [7]:
x = [1.1 1.2 1.3;
2.1 2.2 2.3]
ymd = [2019 5 14;
2019 5 15]
B = 1
C = "Nice cat"
fh = h5open("Results/NewH5File.h5","w") #open file for writing
write(fh,"x",x)
write(fh,"ymd",ymd)
write(fh,"B",B)
write(fh,"C",C)
close(fh) #close file
println("NewH5File.h5 has been created in the subfolder Results")
jld files can store very different types of data: integers, floats, strings, dictionaries, etc. It is a dialect of hdf5, designed to save different Julia objects (including Dates).
The basic syntax of the JLD package is
(A,B) = load(FileName,"A","B") #load some data
xx = load("Data/NewJldFile.jld") #load all data into a Dict()
save(FileName,"A",A,"B",B) #save data
(It also possible to use the same syntax as for HDF5, except that we use jldopen instead of h5open.)
The JLD2 package is an an interesting alternative. It uses the same syntax and is often faster. However, it is (as of June 2020) not actively maintained, so its future is unclear.
In [8]:
using JLD #for JLD2: using FileIO, JLD2
xx = load("Data/JldFile.jld") #load entire file
xx_keys = filter(i->!startswith(i,"_creator"),keys(xx)) #filter out extra key
println("The variables are: ",xx_keys) #list contents of the file
(x,d) = load("Data/JldFile.jld","x","d") #read some of the data
println("\ndates and x from jld file is")
printmat(Any[dN x])
In [9]:
x = [1.1 1.2 1.3;
2.1 2.2 2.3]
d = [Date(2019,5,14); #Julia dates
Date(2019,5,15)]
B = 1
C = "Nice cat"
save("Results/NewJldFile.jld","x",x,"d",d,"B",B,"C",C) #write jld file
println("NewJldFile.jld has been created in the subfolder Results")
The MAT package allows you to load/save (Matlab) mat files (which is a dialect of HDF5).
In [10]:
using MAT
function DateMLtoDate(dNum) #Matlab datenum to Julia date
dNum = round.(Int,dNum) .- 366
dNTime = Dates.rata2datetime.(dNum)
dN = Date.(dNTime)
return dN
end
fh = matopen("Data/MatFile.mat")
println("\nVariables in mat file: ",names(fh))
(x,dM) = read(fh,"x","dM")
close(fh)
d = DateMLtoDate(dM) #Matlab datenum to Julia date
println("\ndates and x from mat file is")
printmat(Any[d x])
In [11]:
x = [1.1 1.2 1.3;
2.1 2.2 2.3]
d = [Date(2019,5,14); #Julia dates
Date(2019,5,15)]
B = 1
C = "Nice cat"
dM = Dates.value.(d) .+ 366.0 #Julia Date to Matlab's datenum(), Float64
fh = matopen("Results/NewMatFile.mat","w")
write(fh,"x",x) #write one variable at a time
write(fh,"B",B)
write(fh,"dM",dM)
write(fh,"C",C)
close(fh)
println("\nNewMatFile.mat has been created in the subfolder Results")
The XLSX package allows you to read and write xls (and xlsx) files.
As an alternative, you can use ExcelReaders, which requires python and python's xlrd libarary. For instance, this would work
using ExcelReaders
data1 = readxl("Data/XlsFile.xlsx","Data!B2:C11")
x1 = convert.(Float64,data1)
printmat(x1)
In [12]:
using XLSX
data1 = XLSX.readxlsx("Data/XlsFile.xlsx") #reading the entire file
x1 = data1["Data!B2:C11"] #extracting a part of the sheet "Data"
x1 = convert.(Float64,x1) #converting from Any to Float64
println("part of the xlsx file:")
printmat(x1)
Suppose you have
x with datax, for instance, from header in a CSV file or saved as a vector of strings in a hdf5/mat/jld file.If there are few variables, then you can manually create each of them from the loaded matrix. This becomes tedious when there are many variables.
However, it is easy to create a Dict() which can be used to easily refer to the variable names, for instance, D[:X] to get variable X.
In [13]:
(x,header) = readdlm("Data/CsvFile.csv",',',header=true)
n = size(x,2)
(X,Y,Z) = [x[:,i] for i=1:n] #manually creating X,Y,Z
D = Dict([(Symbol(header[i]),x[:,i]) for i=1:n]) #Creating D with :X,:Y,:Z
println("x[:,1], X and D[:X]")
printmat([x[:,1] X D[:X]])
In [ ]: