This notebook illustrates the level of control and flexibility available in Julia functions. The task is to evaluate the logistic function $(-\infty, \infty)\rightarrow(0,1)$
\begin{equation}
x \rightarrow \frac{1}{1 + e^{-x}}
\end{equation}
and its inverse, the logit or "log-odds" function $(0,1)\rightarrow(-\infty, \infty)$
\begin{equation}
p \rightarrow \log\left(\frac{p}{1-p}\right)
\end{equation}
on an array of numeric values.
The first priority is to evaluate these functions accurately and robustly. This usually means watching for edge cases (e.g. very large positive or negative $x$ or values of $p$ that are close to zero or to one).
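For instance, here is a hedged sketch of my own (not a cell from the original notebook; the names naive_logit and naive_logistic are made up for this illustration) showing what the textbook formulas do at the extremes:
In [ ]:
# Illustration only: direct transcriptions of the formulas, evaluated at extreme arguments
naive_logistic(x) = 1 / (1 + exp(-x))   # exp(-x) overflows to Inf for large negative x
naive_logit(p) = log(p / (1 - p))       # p exactly 0 or 1 gives -Inf or Inf

naive_logistic(-745.0)   # exp(745.0) overflows, so the result underflows to 0.0
naive_logistic(745.0)    # rounds to exactly 1.0; all resolution near one is lost
naive_logit(1.0e-300)    # finite, about -690.8
naive_logit(1.0)         # Inf, because 1.0 lies outside the open interval (0, 1)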
The second priority is to evaluate them quickly and flexibly. When fitting a logistic regression model to very large data sets, these functions may be evaluated hundreds of times on arrays with millions of elements.
In "vectorized" languages, such as R or Matlab/Octave and, to some extent, Python, the obvious choice is to work on vectors. In fact, the language often hides the fact that vectorization is occuring.
The RCall package for Julia starts an embedded R process and provides for two-way communication with it. In a Jupyter notebook like this, a Julia string prefixed with R is evaluated in the R process. The string delimiters are " or """. In the second case the string can span multiple lines and can contain " characters.
In [1]:
using RCall
In [2]:
R" logit <- function(p) log(p / (1 - p)) ";
Create a vector of 100,000,000 random values between 0 and 1 on which to evaluate logit. This is done in Julia after setting the random number seed, to allow for reproducibility.
In [2]:
srand(1234321)
pvals = rand(100_000_000);
Copy the vector to the R process under the same name.
In [4]:
@rput pvals;
In [5]:
R""" print(system.time(xvals <- logit(pvals))) """;
The first few values of pvals and xvals are
In [6]:
R" print(str(list(pvals=pvals, xvals=xvals))) ";
Similarly, a vectorized logistic function can be defined as
In [7]:
R"""
logistic <- function(x) 1 / (1 + exp(-x))
print(system.time(pvalsnew <- logistic(xvals)))
""";
In [8]:
R" all(pvals == pvalsnew) "; # check for 'round trip' identity
Out[8]:
The problem with the "round trip" check is that floating point arithmetic is not exact; numbers are represented with finite precision. pvalsnew is close to pvals but not exactly equal.
In [9]:
R" print(str(list(pvals=pvals, xvals=xvals, pvn=pvalsnew))) ";
R has an all.equal function that compares floating point values using a tolerance on the differences.
In [10]:
R" print(all.equal(pvals, pvalsnew)) ";
Vectorized languages are a wonderful environment when you begin programming because all the messy loop-related baggage is eliminated, at the expense of some overhead. The evaluation of logistic(xvals) is done in 5 stages:
1. Allocate t1, of 100,000,000 doubles, and loop over x writing -x into t1.
2. Allocate t2, of 100,000,000 doubles, and loop over t1 writing exp(t1) into t2.
3. Allocate t3, of 100,000,000 doubles, and loop over t2 writing 1 + t2 into t3.
4. Allocate result, of 100,000,000 doubles, and loop over t3 writing 1 / t3 into result.
5. Return result.

Because R allows for missing data in any vector, the scalar arithmetic is more complicated than just looping over the vector. Every scalar addition in, e.g. 1 + t2, has a check on both addends to see if they are NA. Furthermore, the "recycling rule" that cycles over the 1 operand while looping over the indices of t2 is further logic implemented inside the loop.
Notice that there are 3 temporary vectors allocated and the result must also be allocated. This storage must later be "garbage collected".
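To make the allocations concrete, here is a sketch of my own (not a cell from the original notebook) spelling out the same staged evaluation in Julia, one temporary per stage:
In [ ]:
# Sketch: the fully vectorized evaluation with each temporary made explicit
function staged_logistic(x::Vector{Float64})
    t1 = -x            # allocate t1 and loop over x writing -x
    t2 = exp.(t1)      # allocate t2 and loop over t1 writing exp(t1)
    t3 = 1 .+ t2       # allocate t3 and loop over t2 writing 1 + t2
    result = 1 ./ t3   # allocate result and loop over t3 writing 1 / t3
    result             # t1, t2 and t3 are now garbage to be collected
end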
The operations could be performed in exactly the same way in Julia. Currently some Julia arithmetic and math functions are vectorized and some aren't. In future releases vectorization will need to be explicitly stated by appending a . to a function name or prepending a . to an operator.
In [ ]:
vlogistic(x::Vector) = 1 ./ (1 .+ exp.(-x));
vlogit(p::Vector) = log.(p ./ (1 .- p))
xvals = vlogit(pvals)
show(xvals[1:5])
Check for approximate equality
In [ ]:
pvals ≈ vlogistic(xvals)
The timings show that the Julia code is faster than the R functions but it still allocates a considerable amount of storage and uses time in garbage collection (gc).
In [ ]:
@time vlogit(pvals);
In [ ]:
@time vlogistic(xvals);
However, there is no need to allocate the intermediate vectors when the function operates on one value at a time.
In [3]:
logit(p) = log(p / (one(p) - p));
logistic(x) = inv(one(x) + exp(-x));
sxvals = logit.(pvals);
show(sxvals[1:5])
In [4]:
@time logit.(pvals);
This type of evaluation is in the "functional programming" style, where simple functions are mapped over arrays. Julia also allows the result to be written into a pre-allocated array with map!, e.g.
In [6]:
@time map!(logit, sxvals, pvals);
In this case there isn't much of a saving in time but there is a saving in the amount of storage allocated. This becomes important when, e.g., fitting a generalized linear model or a generalized linear mixed model.
In [8]:
pvalsnew = similar(sxvals); @time map!(logistic, pvalsnew, sxvals);
Pre-allocating pvalsnew with similar helps with the amount of allocation but doesn't actually run substantially faster.
There is a way to make the evaluation of the log and exp functions slightly faster, which is to use the @fastmath macro in the definitions of the scalar functions.
In [9]:
@fastmath flogit(p) = log(p / (one(p) - p))
function flogistic(x)
    expmx = @fastmath(exp(-x))
    inv(one(expmx) + expmx)
end
Out[9]:
In [11]:
@time map!(flogit, sxvals, pvals);
In [12]:
@time map!(flogistic, pvalsnew, sxvals);
Some multithreading capability is available in v0.5.0 of Julia. Later versions will enhance these capabilities. Before starting this notebook I set the environment variable JULIA_NUM_THREADS=4, as this is running on a 4-core processor.
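As a quick check (my addition, not one of the original cells), the number of threads actually available can be queried from within the session:
In [ ]:
Threads.nthreads()   # should report 4 if JULIA_NUM_THREADS=4 took effect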
It is easiest to use multiple threads on a simple loop. Define a function that overwrites the values in one array with the logit or logistic of the values in another array. By convention, a ! is appended to the name of such a mutating function, which modifies one or more of its arguments.
In [13]:
function logit!(dest, src)
    length(dest) == length(src) || throw(DimensionMismatch())
    # split the iterations across the available threads
    Threads.@threads for i in eachindex(dest, src)
        @inbounds dest[i] = flogit(src[i])
    end
    dest
end
Out[13]:
In [14]:
@time logit!(sxvals, pvals);
The scaling is very good, with 4 threads being
In [ ]:
1.66/0.571
times as fast as a single thread.