The aim here is to identify the factors that have a significant impact on the performance of roaring bitmap operations. We focus on a single operation: the union of two roaring bitmaps.
We consider the following (boolean) factors:

- copy_on_write: whether copy-on-write is enabled for the bitmaps.
- run_containers: whether run_optimize is called before computing the union. This function triggers a check on all the containers to see if a container conversion is required (from bitset/array to run or from run to bitset/array). Note that run containers may still be used even without this function call.
- amalgamation: whether the library is built from the amalgamated (single-file) sources.
- avx_enabled: whether the AVX optimizations of the library are enabled.

Other factors, notably the size of the bitmaps and the compiler optimization level (true if the libraries are compiled with the -O3 option, false if they are compiled with the -O0 option), have also been considered in a previous experiment. They are set to True in the following: we mostly care about large bitmaps (they are obviously the ones that take the longest time), and the code will in any case be compiled with the -O3 option when used in production.

All results of this section are in one of the files results/broadwell_preliminary_results.csv or results/skylake_preliminary_results.csv, depending on the machine used to produce them. They have been generated with the following command:
export LD_LIBRARY_PATH="`pwd`/build/:$LD_LIBRARY_PATH"
./scripts/preliminary_runner.py -n 512 results.csv
It runs 512 experiments. For each experiment, every factor is randomly set to true or false with equal probability.
In [1]:
library(ggplot2)
suppressWarnings(suppressMessages(library(FrF2))) # FrF2 outputs a bunch of useless messages...
In [2]:
all_results_broadwell <- read.csv("results/broadwell_preliminary_results.csv")
all_results_skylake <- read.csv("results/skylake_preliminary_results.csv")
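Since every factor was randomly set to True or False with equal probability by the runner, a quick sanity check (a sketch, reusing the column names that appear in the analysis below) is to count the levels per factor; with 512 runs each column should be close to a 256/256 split:
# Count True/False occurrences for each randomized factor.
sapply(all_results_broadwell[, c("copy_on_write", "run_containers", "amalgamation",
                                 "avx_enabled", "dense1", "dense2")], table)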
In [3]:
# Fit an ANOVA on the subset of results matching the given densities, with all
# main effects and two-way interactions of the four factors.
get_aov <- function(results, dense1, dense2) {
    subresults <- results[results["dense1"] == dense1 & results["dense2"] == dense2, ]
    return(aov(time ~ (copy_on_write + run_containers + amalgamation + avx_enabled)^2, data = subresults))
}
# Plot the main effects (MEPlot) and the two-way interactions (IAPlot).
plot_aov <- function(aov_results) {
    MEPlot(aov_results, abbrev = 4, select = c(1, 2, 3, 4), response = "time")
    IAPlot(aov_results, abbrev = 4, show.alias = FALSE, select = c(1, 2, 3, 4))
}
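In the model formula, (a + b + c + d)^2 is R shorthand for all main effects plus all two-way interactions; expanding it makes the fitted terms explicit (a small illustration, independent of the data):
# 4 main effects and 6 pairwise interactions, 10 terms in total.
attr(terms(time ~ (copy_on_write + run_containers + amalgamation + avx_enabled)^2),
     "term.labels")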
In [4]:
dense_broadwell_aov <- get_aov(all_results_broadwell, "True", "True")
summary(dense_broadwell_aov)
In [5]:
plot_aov(dense_broadwell_aov)
In [6]:
dense_skylake_aov <- get_aov(all_results_skylake, "True", "True")
summary(dense_skylake_aov)
In [7]:
plot_aov(dense_skylake_aov)
It is clear that the factor run_containers has a very large positive impact when set to True, on both the Broadwell and the Skylake machines. In this case, one should definitely call the function run_optimize before computing the union.
The other factors have a negligible impact.
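As a rough illustration of the effect size (a sketch on the Broadwell results, using the same columns as above), the mean times with and without run_optimize can be compared directly:
# Mean union time by run_containers, restricted to the dense/dense case.
dense_results <- all_results_broadwell[all_results_broadwell$dense1 == "True" &
                                           all_results_broadwell$dense2 == "True", ]
aggregate(time ~ run_containers, data = dense_results, FUN = mean)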
In [8]:
sparse_broadwell_aov <- get_aov(all_results_broadwell, "False", "False")
summary(sparse_broadwell_aov)
In [9]:
plot_aov(sparse_broadwell_aov)
In [10]:
sparse_skylake_aov <- get_aov(all_results_skylake, "False", "False")
summary(sparse_skylake_aov)
In [11]:
plot_aov(sparse_skylake_aov)
For both machines, the factor avx_enabled has a very large positive impact. In this case, one should therefore not disable the AVX optimizations of the library.
We also see that amalgamation has a noticeable negative impact. However, this impact is larger when AVX is disabled, so if we choose not to disable AVX, using amalgamation or not should not matter much.
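The interaction can be eyeballed with a 2x2 table of mean times (a sketch on the Broadwell sparse/sparse subset, reusing the column names above):
# Mean union time per combination of avx_enabled and amalgamation; according
# to the plots, the amalgamation effect should shrink when AVX is enabled.
sparse_results <- all_results_broadwell[all_results_broadwell$dense1 == "False" &
                                            all_results_broadwell$dense2 == "False", ]
with(sparse_results, tapply(time, list(avx_enabled = avx_enabled,
                                       amalgamation = amalgamation), mean))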
In [12]:
# Fit the same ANOVA on the mixed case: exactly one of the two bitmaps is dense
# (at least one "True" and at least one "False" among dense1/dense2).
get_onedense_aov <- function(results) {
    subresults <- results[(results["dense1"] == "True" | results["dense2"] == "True") &
                          (results["dense1"] == "False" | results["dense2"] == "False"), ]
    return(aov(time ~ (copy_on_write + run_containers + amalgamation + avx_enabled)^2, data = subresults))
}
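An equivalent, arguably clearer way to express the same filter is with xor() (a sketch with a hypothetical helper name, not the function used in the cells below):
# Keep the rows where exactly one of the two bitmaps is dense.
get_onedense_aov_xor <- function(results) {
    subresults <- results[xor(results$dense1 == "True", results$dense2 == "True"), ]
    aov(time ~ (copy_on_write + run_containers + amalgamation + avx_enabled)^2, data = subresults)
}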
In [13]:
onedense_broadwell_aov <- get_onedense_aov(all_results_broadwell)
summary(onedense_broadwell_aov)
In [14]:
plot_aov(onedense_broadwell_aov)
In [15]:
onedense_skylake_aov <- get_onedense_aov(all_results_skylake)
summary(onedense_skylake_aov)
In [16]:
plot_aov(onedense_skylake_aov)
The factors run_containers and copy_on_write are the most impactful: they both greatly improve performance when set to True, and they have a strong interaction.
In addition, amalgamation and avx_enabled both have a significant positive impact on the Skylake machine but not on the Broadwell machine.
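The run_containers/copy_on_write interaction can also be inspected from the data stored in the fitted model (a sketch; model.frame() returns the subset that was used to fit the ANOVA):
# Mean union time for each combination of run_containers and copy_on_write
# in the mixed (one dense, one sparse) case.
with(model.frame(onedense_broadwell_aov),
     tapply(time, list(run_containers = run_containers,
                       copy_on_write = copy_on_write), mean))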
We have seen that the factors with the highest impact are not always the same: it depends on the nature of the datasets. This is expected, since the container types change accordingly: for dense datasets, bitset and run containers are more likely to be used, whereas for sparse datasets array containers will be used.
For dense datasets, calling the function run_optimize yields the largest performance improvement.
For sparse datasets, the largest improvement is obtained by compiling the library with AVX instructions enabled (which is the case by default).
When one of the datasets is sparse and the other is dense, calling run_optimize and enabling copy-on-write yield the highest gain.
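As a rough cross-check of these recommendations (a sketch over the raw measurements, with a hypothetical helper name, not part of the ANOVA above), one can list, for each density scenario, the factor combination with the lowest mean union time:
# For every (dense1, dense2) scenario, report the factor combination that
# achieved the lowest mean time on the Broadwell results.
best_config <- function(results) {
    means <- aggregate(time ~ dense1 + dense2 + copy_on_write + run_containers +
                           amalgamation + avx_enabled, data = results, FUN = mean)
    do.call(rbind, lapply(split(means, list(means$dense1, means$dense2), drop = TRUE),
                          function(d) d[which.min(d$time), ]))
}
best_config(all_results_broadwell)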