In [1]:
using Dates
include("printmat.jl")
A NaN (Not-a-Number) can be used to indicate that a float "number" is missing or otherwise strange. For data types other than floats, you may want to use missing (see below) instead.
Most computations involving NaNs give NaN as the result.
NaNs are often used to represent missing data, but this works only with floating-point numbers (e.g. 2.0), not with integers (e.g. 2).
In [2]:
println(2.0 + NaN)
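The cell below is a minimal sketch of two NaN properties mentioned above: a NaN is not equal to itself (so test with isnan rather than ==), and an integer array cannot hold a NaN. It uses only Base functions; the variable names v, w and threw are just for illustration.

```julia
x = NaN
println(x == x)          # false: a NaN is not equal to anything, not even itself
println(isnan(x))        # true: isnan is the reliable test

v = [1, 2, 3]            # an integer vector cannot store a NaN
threw = try
    v[1] = NaN           # tries to convert NaN to Int
    false
catch err
    err isa InexactError
end
println("assigning NaN to an Int vector failed: ", threw)

w = float(v)             # convert to floats first, then NaN is allowed
w[1] = NaN
println(w)
```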
In [3]:
data = [1.0 -999.99;
2.0 12.0;
3.0 13.0]
z = replace(data,-999.99=>NaN) #replace -999.99 by NaN
println("z: ")
printmat(z)
In [4]:
if any(isnan.(z))                #check if any NaNs
    println("z has some NaNs")
end
println("\nThe sum of each column: ")
printmat(sum(z,dims=1))
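Since a single NaN turns the whole column sum into NaN, it is sometimes useful to sum only the valid entries. The cell below is one possible sketch, not the only approach: it treats each NaN as 0.0 inside the sum and separately counts the non-NaN observations per column. The names colsums and nobs are illustrative.

```julia
z = [1.0 NaN;
     2.0 12.0;
     3.0 13.0]

colsums = sum(x -> isnan(x) ? 0.0 : x, z, dims=1)   #sum, counting each NaN as 0.0
nobs    = sum(.!isnan.(z), dims=1)                  #number of non-NaN values per column

println("column sums, skipping NaNs: ", colsums)
println("non-NaN observations per column: ", nobs)
```

Whether skipping NaNs in this way is appropriate depends on why the data is missing.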
It is a common procedure in statistics to throw out all cases with NaNs. For instance, if z[t,:] is the data for period $t$ and it contains one or more NaN values, then it is common to throw out that entire row.
This is a reasonable approach if it can be argued that the data is missing at random, that is, for reasons unrelated to the subject of the investigation. It is much less reasonable if, for instance, the returns for all poorly performing mutual funds are listed as "missing" and you want to study which fund characteristics drive performance.
The code below shows a simple way to do this.
In [5]:
vb = any(isnan.(z),dims=2) #indicates rows with NaNs
z2 = z[.!vec(vb),:] #keep only rows without NaNs
println("z2: a new matrix where all rows with any NaNs have been pruned:")
printmat(z2)
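If the same pruning is needed in several places, it can be wrapped in a small function. The sketch below is one way to do that; drop_nan_rows is a hypothetical helper name (not part of any package), and it also returns the indices of the kept rows so that other arrays can be aligned with the pruned data.

```julia
#hypothetical helper: returns the rows of x without NaNs and their row indices
function drop_nan_rows(x::AbstractMatrix)
    keep = [!any(isnan, row) for row in eachrow(x)]   #true for rows without NaNs
    return x[keep, :], findall(keep)
end

z = [1.0 NaN;
     2.0 12.0;
     3.0 13.0]

z2, kept = drop_nan_rows(z)
println("kept rows: ", kept)
println("pruned matrix: ", z2)
```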
missing can be used to indicate missing values for most types (not just floats).
Similarly to NaNs, computations involving missing (for instance, 1+missing) result in missing.
In contrast to NaNs, working with missing sometimes involves converting a traditional array to an array that can include missing (or the other way around). The Missings package has helper routines for this.
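The cell below is a short sketch of how missing propagates, using only functions from Base (no package needed). It also shows skipmissing and coalesce, two common ways to compute with a vector that contains missing values. The vector x is made up for illustration.

```julia
println(1 + missing)                 #missing: arithmetic propagates missing
println(missing == missing)          #missing: even == propagates, so use ismissing
println(ismissing(missing))          #true
println(isequal(missing, missing))   #true

x = [1, missing, 3]
println(sum(x))                      #missing
println(sum(skipmissing(x)))         #4: sum over the non-missing elements
println(coalesce.(x, 0))             #[1, 0, 3]: replace each missing by 0
```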
In [6]:
using Missings
In [7]:
data = [1 -999;
2 12;
3 13]
z = allowmissing(data) #convert to an array that can include missing
z = replace(z,-999=>missing) #replace -999 by missing
println("z: ")
printmat(z)
In [8]:
if any(ismissing.(z))            #check if any missings
    println("z has some missings")
end
In [9]:
vc = any(ismissing.(z),dims=2)
z2 = z[.!vec(vc),:] #keep only rows without missings
println("z2: a new matrix where all rows with any missings have been pruned:")
printmat(z2)
Once z2 no longer contains any missing values (although its element type still allows them), you can typically use it like any other array. However, if you (for some reason) need to work with a traditional array, then convert z2 (see below).
In [10]:
println("The type of z2 is ", typeof(z2))
z3 = disallowmissing(z2) #convert to traditional array,
#same as convert.(Int,z2)
println("\nThe type of z3 is ", typeof(z3))
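For comparison, the same round trip can be sketched with Base's convert only, without the Missings package. This is an illustration under that assumption rather than a replacement for allowmissing and disallowmissing; the names za, zb, zc and failed are made up for the example. It also shows that converting back fails while a missing is still present.

```julia
data = [1 2;
        3 4]                                      #an ordinary integer matrix

za = convert(Matrix{Union{Missing,Int}}, data)    #can now hold missing
za[1,1] = missing

zb = za[2:2,:]                                    #no missings, but the element type still allows them
zc = convert(Matrix{Int}, zb)                     #back to a traditional array

println("element type of zb: ", eltype(zb))
println("element type of zc: ", eltype(zc))

failed = try                                      #za still has a missing, so this conversion throws
    convert(Matrix{Int}, za)
    false
catch
    true
end
println("converting za back to Int failed: ", failed)
```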