Preliminary analysis

The aim here is to find the different factors that have a significant impact on the performances of roaring bitmaps operations. We only focus on one operation, the union between two roaring bitmaps.

We consider the following (boolean) factors:

dense1 (resp. dense2): true if the elements of the first (resp. second) roaring bitmap are sampled in an interval of size 3N/2, false if they are sampled in an interval of size 2^6N (N being the number of elements).
copy_on_write: true if the copy on write optimization is used for the two roaring bitmaps (if the optimization is used and the result has a container with the same content than one of the operands' containers, then it uses a reference to this container and no copy is done).
run_containers: true if the function run_optimize is used before computing the union. This function triggers a check on all the containers to see if a container conversion is required (from bitset/array to run or from run to bitset/array). Note that run containers may still be used even without this function call.
amalgamation: true if the library is amalgamated before the compilation. This means that all the .c (resp. .h) files are concatenated into a single roaring.c file (resp. roaring.h). This may help the compiler to perform better optimizations.
gcc_optimization: true if the library as well as the test code is compiled with -O3 option, false it they are compiled with -O0 option.
avx_enabled: true if AVX instructions are used (vector instructions, e.g. to perform in parallel an addition on eight 32 bits integers).

These other factors have also been considered in a previous experiment. They are set to True in the following.

large1 (resp. large2): true if the first (resp. second) roaring bitmap has (approximately) 2^20 elements, false if it has (approximately) 2^4 elements.
gcc_optimization: true if the library as well as the test code is compiled with -O3 option, false it they are compiled with -O0 option. The reason for that is that we mostly care about large bitmaps (they are obviously the ones that take the longer time), and the code will anyway be compiled with -O3 option when used in production.

All results of this section are in one of the files results/broadwell_preliminary_results.csv or results/skylake_preliminary_results.csv, depending on the machine used to get them. They have been generated with the following command:

export LD_LIBRARY_PATH="`pwd`/build/:$LD_LIBRARY_PATH"
./scripts/preliminary_runner.py -n 512 results.csv

It runs 512 experiments. For each experiment, every factor is randomly set to true or false with equal probability.



In [1]:

    
library(ggplot2)
suppressWarnings(suppressMessages(library(FrF2))) # FrF2 outputs a bunch of useless messages...



In [2]:

    
all_results_broadwell <- read.csv("results/broadwell_preliminary_results.csv")
all_results_skylake <- read.csv("results/skylake_preliminary_results.csv")



In [3]:

    
get_aov <- function(results, dense1, dense2) {
    subresults <- results[results["dense1"]==dense1 & results["dense2"]==dense2,]
    return(aov(time~(copy_on_write+run_containers+amalgamation+avx_enabled)^2, data=subresults))
}

plot_aov <- function(aov_results) {
    MEPlot(aov_results, abbrev=4, select=c(1, 2, 3, 4), response="time")
    IAPlot(aov_results, abbrev=4, show.alias=FALSE, select=c(1, 2, 3, 4))
}

Dense datasets

Broadwell



In [4]:

    
dense_broadwell_aov <- get_aov(all_results_broadwell, "True", "True")
summary(dense_broadwell_aov)









    





                              Df    Sum Sq   Mean Sq   F value  Pr(>F)    
copy_on_write                  1 0.0000000 0.0000000     1.098 0.29677    
run_containers                 1 0.0007391 0.0007391 20163.484 < 2e-16 ***
amalgamation                   1 0.0000000 0.0000000     0.488 0.48641    
avx_enabled                    1 0.0000001 0.0000001     2.881 0.09227 .  
copy_on_write:run_containers   1 0.0000000 0.0000000     1.100 0.29647    
copy_on_write:amalgamation     1 0.0000000 0.0000000     1.325 0.25204    
copy_on_write:avx_enabled      1 0.0000000 0.0000000     0.593 0.44282    
run_containers:amalgamation    1 0.0000000 0.0000000     0.524 0.47050    
run_containers:avx_enabled     1 0.0000001 0.0000001     2.737 0.10071    
amalgamation:avx_enabled       1 0.0000004 0.0000004    10.541 0.00152 ** 
Residuals                    117 0.0000043 0.0000000                      
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1



In [5]:

    
plot_aov(dense_broadwell_aov)

Skylake



In [6]:

    
dense_skylake_aov <- get_aov(all_results_skylake, "True", "True")
summary(dense_skylake_aov)









    





                              Df    Sum Sq   Mean Sq   F value   Pr(>F)    
copy_on_write                  1 0.0000000 0.0000000 2.760e-01 0.600489    
run_containers                 1 0.0006675 0.0006675 9.221e+06  < 2e-16 ***
amalgamation                   1 0.0000000 0.0000000 9.914e+01  < 2e-16 ***
avx_enabled                    1 0.0000000 0.0000000 1.200e-01 0.729784    
copy_on_write:run_containers   1 0.0000000 0.0000000 5.670e-01 0.452936    
copy_on_write:amalgamation     1 0.0000000 0.0000000 1.810e-01 0.671479    
copy_on_write:avx_enabled      1 0.0000000 0.0000000 4.280e-01 0.514300    
run_containers:amalgamation    1 0.0000000 0.0000000 1.013e+02  < 2e-16 ***
run_containers:avx_enabled     1 0.0000000 0.0000000 1.286e+01 0.000491 ***
amalgamation:avx_enabled       1 0.0000000 0.0000000 1.660e+00 0.200130    
Residuals                    117 0.0000000 0.0000000                       
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1



In [7]:

    
plot_aov(dense_skylake_aov)

It is clear that the factor run_containers have a very large positive impact when set to True, for both the Broadwell and the Skylake machines. In this case, one should definitely call the function run_optimize before computing the union.

Other factors have a negligible impact.

Sparse datasets

Broadwell



In [8]:

    
sparse_broadwell_aov <- get_aov(all_results_broadwell, "False", "False")
summary(sparse_broadwell_aov)









    





                              Df    Sum Sq   Mean Sq   F value   Pr(>F)    
copy_on_write                  1 0.0000001 0.0000001     4.318   0.0399 *  
run_containers                 1 0.0000000 0.0000000     0.110   0.7402    
amalgamation                   1 0.0000005 0.0000005    22.224 6.75e-06 ***
avx_enabled                    1 0.0012555 0.0012555 51614.313  < 2e-16 ***
copy_on_write:run_containers   1 0.0000000 0.0000000     1.370   0.2441    
copy_on_write:amalgamation     1 0.0000001 0.0000001     2.588   0.1104    
copy_on_write:avx_enabled      1 0.0000000 0.0000000     0.607   0.4375    
run_containers:amalgamation    1 0.0000001 0.0000001     3.929   0.0498 *  
run_containers:avx_enabled     1 0.0000001 0.0000001     2.102   0.1498    
amalgamation:avx_enabled       1 0.0000011 0.0000011    46.147 4.85e-10 ***
Residuals                    117 0.0000028 0.0000000                       
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1



In [9]:

    
plot_aov(sparse_broadwell_aov)

Skylake



In [10]:

    
sparse_skylake_aov <- get_aov(all_results_skylake, "False", "False")
summary(sparse_skylake_aov)









    





                              Df    Sum Sq   Mean Sq   F value Pr(>F)    
copy_on_write                  1 0.0000000 0.0000000 1.100e-02  0.916    
run_containers                 1 0.0000000 0.0000000 5.100e-02  0.821    
amalgamation                   1 0.0000002 0.0000002 3.786e+02 <2e-16 ***
avx_enabled                    1 0.0013742 0.0013742 2.748e+06 <2e-16 ***
copy_on_write:run_containers   1 0.0000000 0.0000000 1.633e+00  0.204    
copy_on_write:amalgamation     1 0.0000000 0.0000000 4.630e-01  0.498    
copy_on_write:avx_enabled      1 0.0000000 0.0000000 1.050e-01  0.746    
run_containers:amalgamation    1 0.0000000 0.0000000 7.620e-01  0.385    
run_containers:avx_enabled     1 0.0000000 0.0000000 6.880e-01  0.409    
amalgamation:avx_enabled       1 0.0000001 0.0000001 2.526e+02 <2e-16 ***
Residuals                    117 0.0000001 0.0000000                     
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1



In [11]:

    
plot_aov(sparse_skylake_aov)

For both machines, the factor AVX_enabled has a very large positive impact. In this case, one should therefore not disable the AVX optimizations of the library.

We also see that amalgamation has a noticeable negative impact. However, this impact is larger when AVX is disabled. So if we chose not to disable it, using amalgamation or not should not matter.

Large and one-dense datasets

Here, one and only one of the datasets is dense.



In [12]:

    
get_onedense_aov <- function(results) {
    subresults <- results[(results["dense1"]=="True" | results["dense2"]=="True") & (results["dense1"]=="False" | results["dense2"]=="False"),]
    return(aov(time~(copy_on_write+run_containers+amalgamation+avx_enabled)^2, data=subresults))
}



In [13]:

    
onedense_broadwell_aov <- get_onedense_aov(all_results_broadwell)
summary(onedense_broadwell_aov)









    





                              Df    Sum Sq   Mean Sq   F value Pr(>F)    
copy_on_write                  1 0.0000315 0.0000315  1128.132 <2e-16 ***
run_containers                 1 0.0003498 0.0003498 12542.562 <2e-16 ***
amalgamation                   1 0.0000000 0.0000000     0.162 0.6873    
avx_enabled                    1 0.0000000 0.0000000     0.005 0.9430    
copy_on_write:run_containers   1 0.0000002 0.0000002     6.060 0.0145 *  
copy_on_write:amalgamation     1 0.0000000 0.0000000     0.569 0.4515    
copy_on_write:avx_enabled      1 0.0000000 0.0000000     0.421 0.5169    
run_containers:amalgamation    1 0.0000000 0.0000000     0.177 0.6747    
run_containers:avx_enabled     1 0.0000000 0.0000000     0.167 0.6834    
amalgamation:avx_enabled       1 0.0000023 0.0000023    84.025 <2e-16 ***
Residuals                    245 0.0000068 0.0000000                     
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1



In [14]:

    
plot_aov(onedense_broadwell_aov)



In [15]:

    
onedense_skylake_aov <- get_onedense_aov(all_results_skylake)
summary(onedense_skylake_aov)









    





                              Df    Sum Sq   Mean Sq   F value   Pr(>F)    
copy_on_write                  1 1.306e-05 1.306e-05  2987.590  < 2e-16 ***
run_containers                 1 2.443e-04 2.443e-04 55908.618  < 2e-16 ***
amalgamation                   1 3.000e-08 3.000e-08     7.541  0.00648 ** 
avx_enabled                    1 8.000e-08 8.000e-08    18.650 2.28e-05 ***
copy_on_write:run_containers   1 1.200e-07 1.200e-07    27.052 4.18e-07 ***
copy_on_write:amalgamation     1 0.000e+00 0.000e+00     0.004  0.95024    
copy_on_write:avx_enabled      1 0.000e+00 0.000e+00     0.016  0.89837    
run_containers:amalgamation    1 4.000e-08 4.000e-08     8.011  0.00504 ** 
run_containers:avx_enabled     1 4.000e-08 4.000e-08     9.316  0.00252 ** 
amalgamation:avx_enabled       1 3.000e-08 3.000e-08     7.945  0.00522 ** 
Residuals                    245 1.070e-06 0.000e+00                       
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1



In [16]:

    
plot_aov(onedense_skylake_aov)

The factors run_containers and copy_on_write are the most impactfull: they both greatly improve the performances when set to True. Also, these two factors have an high interaction.

Also, amalgamation and avx_enabled both have a significant positive impact with the Skylake machine but not with the Broadwell machine.

Conclusion

We have seen that the factors having the highest impact are not always the same: it depends on the nature of the datasets. This is expected, since the container will change accordingly. For dense datasets, bitset and run containers are more likely to be used, whereas for sparse datasets array containers will be used.

For dense datasets, calling the function run_optimize will get the best improvement in performances.

For sparse datasets, the best improvement can be obtained by compiling the library with AVX instructions enabled (this is the case by default).

When one of the datasets is sparse and the other is dense, calling run_optimize and enabling copy on write will get the highest gain.