Preliminary analysis

The aim here is to find the different factors that have a significant impact on the performances of roaring bitmaps operations. We only focus on one operation, the union between two roaring bitmaps.

We consider the following (boolean) factors:

  • large1 (resp. large2): true if the first (resp. second) roaring bitmap has (approximately) 2^20 elements, false if it has (approximately) 2^4 elements.
  • dense1 (resp. dense2): true if the elements of the first (resp. second) roaring bitmap are sampled in an interval of size 3N/2, false if they are sampled in an interval of size 2^6N (N being the number of elements).
  • copy_on_write: true if the copy on write optimization is used for the two roaring bitmaps (if the optimization is used and the result has a container with the same content than one of the operands' containers, then it uses a reference to this container and no copy is done).
  • run_containers: true if the function run_optimize is used before computing the union. This function triggers a check on all the containers to see if a container conversion is required (from bitset/array to run or from run to bitset/array). Note that run containers may still be used even without this function call.
  • amalgamation: true if the library is amalgamated before the compilation. This means that all the .c (resp. .h) files are concatenated into a single roaring.c file (resp. roaring.h). This may help the compiler to perform better optimizations.
  • gcc_optimization: true if the library as well as the test code is compiled with -O3 option, false it they are compiled with -O0 option.
  • avx_enabled: true if AVX instructions are used (vector instructions, e.g. to perform in parallel an addition on eight 32 bits integers).

All results of this section are in the file results/preliminary_results.csv. They have been generated with the following command:

export LD_LIBRARY_PATH="`pwd`/build/:$LD_LIBRARY_PATH"
./scripts/preliminary_runner.py -n 1024 results/preliminary_results.csv

It runs 1024 experiments. For each experiment, every factor is randomly set to true or false with equal probability.


In [1]:
library(ggplot2)
suppressWarnings(suppressMessages(library(FrF2))) # FrF2 outputs a bunch of useless messages...

First try: considering all variables at once


In [2]:
all_results <- read.csv("results/old_preliminary_results.csv")

In [3]:
all_results_aov <- aov(time~(large1+large2+dense1+dense2+copy_on_write+run_containers+amalgamation+gcc_optimization+avx_enabled)^2, data=all_results)
summary(all_results_aov)


                                 Df   Sum Sq   Mean Sq F value   Pr(>F)    
large1                            1 0.000737 0.0007369 137.730  < 2e-16 ***
large2                            1 0.000738 0.0007382 137.965  < 2e-16 ***
dense1                            1 0.000672 0.0006724 125.677  < 2e-16 ***
dense2                            1 0.000668 0.0006683 124.896  < 2e-16 ***
copy_on_write                     1 0.000020 0.0000204   3.813   0.0511 .  
run_containers                    1 0.000000 0.0000001   0.011   0.9148    
amalgamation                      1 0.000002 0.0000016   0.296   0.5864    
gcc_optimization                  1 0.000224 0.0002244  41.948 1.48e-10 ***
avx_enabled                       1 0.000002 0.0000023   0.423   0.5154    
large1:large2                     1 0.000561 0.0005611 104.872  < 2e-16 ***
large1:dense1                     1 0.000674 0.0006736 125.887  < 2e-16 ***
large1:dense2                     1 0.000524 0.0005243  97.985  < 2e-16 ***
large1:copy_on_write              1 0.000002 0.0000019   0.347   0.5559    
large1:run_containers             1 0.000000 0.0000000   0.005   0.9428    
large1:amalgamation               1 0.000002 0.0000017   0.312   0.5768    
large1:gcc_optimization           1 0.000222 0.0002218  41.450 1.89e-10 ***
large1:avx_enabled                1 0.000002 0.0000023   0.434   0.5101    
large2:dense1                     1 0.000528 0.0005276  98.599  < 2e-16 ***
large2:dense2                     1 0.000670 0.0006702 125.269  < 2e-16 ***
large2:copy_on_write              1 0.000002 0.0000019   0.354   0.5520    
large2:run_containers             1 0.000000 0.0000000   0.007   0.9322    
large2:amalgamation               1 0.000002 0.0000017   0.312   0.5769    
large2:gcc_optimization           1 0.000221 0.0002214  41.370 1.97e-10 ***
large2:avx_enabled                1 0.000002 0.0000023   0.427   0.5134    
dense1:dense2                     1 0.000523 0.0005226  97.673  < 2e-16 ***
dense1:copy_on_write              1 0.000001 0.0000015   0.277   0.5985    
dense1:run_containers             1 0.000000 0.0000000   0.004   0.9489    
dense1:amalgamation               1 0.000001 0.0000015   0.279   0.5973    
dense1:gcc_optimization           1 0.000211 0.0002112  39.470 5.01e-10 ***
dense1:avx_enabled                1 0.000002 0.0000022   0.414   0.5200    
dense2:copy_on_write              1 0.000001 0.0000013   0.240   0.6240    
dense2:run_containers             1 0.000000 0.0000000   0.005   0.9432    
dense2:amalgamation               1 0.000002 0.0000015   0.285   0.5932    
dense2:gcc_optimization           1 0.000211 0.0002107  39.388 5.21e-10 ***
dense2:avx_enabled                1 0.000002 0.0000022   0.410   0.5220    
copy_on_write:run_containers      1 0.000000 0.0000000   0.009   0.9240    
copy_on_write:amalgamation        1 0.000000 0.0000001   0.010   0.9215    
copy_on_write:gcc_optimization    1 0.000000 0.0000000   0.000   0.9847    
copy_on_write:avx_enabled         1 0.000000 0.0000000   0.005   0.9428    
run_containers:amalgamation       1 0.000000 0.0000001   0.011   0.9173    
run_containers:gcc_optimization   1 0.000000 0.0000000   0.007   0.9349    
run_containers:avx_enabled        1 0.000000 0.0000000   0.002   0.9617    
amalgamation:gcc_optimization     1 0.000003 0.0000033   0.610   0.4351    
amalgamation:avx_enabled          1 0.000002 0.0000017   0.309   0.5784    
gcc_optimization:avx_enabled      1 0.000002 0.0000024   0.452   0.5018    
Residuals                       978 0.005233 0.0000054                     
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

In this first AOV, we observe that the sizes of the sets as well as their densities have a huge impact on the performance of the union. The use of the GCC optimization is also very significant.

The other variables however do not seem to have any impact.

However, we should check if the hypothesis required to perform an AOV are fulfilled: let us draw the plots.


In [4]:
layout(matrix(c(1,2,3,4),2,2))
plot(all_results_aov)


The plots are terrible. The two first ones show that the residuals are not uniformly distributed. They are more spread for large fit values than for small ones, which is a sign of non-constant variance. Also, they are not distributed around a flat line centred on zero, but rather draw a curvilinear plot: our model is not well suited.

Finally, the Q-Q plot shows that the residuals do not follow a Normal distribution.


In [5]:
print(var(all_results[all_results["large1"] == "True" & all_results["large2"] == "True",]$time))
print(var(all_results[all_results["large1"] == "False" & all_results["large2"] == "False",]$time))
print(var(all_results[all_results["dense1"] == "True" & all_results["dense2"] == "True",]$time))
print(var(all_results[all_results["dense1"] == "False" & all_results["dense2"] == "False",]$time))


[1] 4.150348e-05
[1] 2.514512e-11
[1] 5.044169e-09
[1] 4.212937e-05

This confirm the heteroscedasticity: the variance of the results is several orders of magnitude higher for large and non-dense datasets than for non-large and dense ones.

We will therefore apply the AOV by fixing these variables to a given value.


In [6]:
MEPlot(all_results_aov, abbrev=4, select=c(1, 2, 3, 4, 5, 6, 7, 8, 9), response="time")



In [7]:
MEPlot(all_results_aov, abbrev=4, select=c(1, 2, 3, 4), response="time")



In [8]:
MEPlot(all_results_aov, abbrev=4, select=c(5, 6, 7, 8, 9), response="time")



In [9]:
IAPlot(all_results_aov, abbrev=4, show.alias=FALSE, select=c(5, 6, 7, 8, 9))


Large and dense datasets


In [10]:
large_dense <- all_results[all_results["large1"]=="True" & all_results["large2"]=="True" & all_results["dense1"]=="True" & all_results["dense2"]=="True",]
large_dense_aov <- aov(time~(copy_on_write+run_containers+amalgamation+gcc_optimization+avx_enabled)^2, data=large_dense)
summary(large_dense_aov)


                                Df    Sum Sq   Mean Sq  F value   Pr(>F)    
copy_on_write                    1 0.000e+00 0.000e+00    0.123 0.727169    
run_containers                   1 3.000e-10 3.000e-10    1.040 0.312825    
amalgamation                     1 1.000e-09 1.000e-09    4.020 0.050620 .  
gcc_optimization                 1 3.207e-07 3.207e-07 1274.649  < 2e-16 ***
avx_enabled                      1 4.900e-09 4.900e-09   19.547  5.6e-05 ***
copy_on_write:run_containers     1 0.000e+00 0.000e+00    0.198 0.658179    
copy_on_write:amalgamation       1 2.000e-10 2.000e-10    0.844 0.362934    
copy_on_write:gcc_optimization   1 0.000e+00 0.000e+00    0.072 0.789461    
copy_on_write:avx_enabled        1 1.000e-10 1.000e-10    0.455 0.503307    
run_containers:amalgamation      1 0.000e+00 0.000e+00    0.166 0.685319    
run_containers:gcc_optimization  1 1.000e-10 1.000e-10    0.529 0.470744    
run_containers:avx_enabled       1 0.000e+00 0.000e+00    0.019 0.891192    
amalgamation:gcc_optimization    1 4.400e-09 4.400e-09   17.509 0.000121 ***
amalgamation:avx_enabled         1 3.100e-09 3.100e-09   12.226 0.001026 ** 
gcc_optimization:avx_enabled     1 2.300e-09 2.300e-09    9.276 0.003765 ** 
Residuals                       48 1.210e-08 3.000e-10                      
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

In [11]:
layout(matrix(c(1,2,3,4),2,2))
plot(large_dense_aov)


The plots look more “conform” here. There is nearly no pattern for the residuals vs fitted values: the points are uniformly distributed around the x axis. The QQ-plot is also almost a line.

The summary shows that, for large and dense datasets, the factors with a significant impact are gcc_optimization and avx_enabled. Also, the impact of the amalgamation factor seems to be correlated with gcc_optimization and avx_enabled.


In [12]:
MEPlot(large_dense_aov, abbrev=4, select=c(1, 2, 3, 4, 5), response="time")



In [13]:
IAPlot(large_dense_aov, abbrev=4, show.alias=FALSE, select=c(1, 2, 3, 4, 5))


AVX strange thing

It seems that enabling AVX may lead to worse performances. It happens when GCC optimization and amalgamation are disabled, for large and dense roaring bitmaps.

Sample code:

import numpy
from utils import *
def f(avx):
    init_directory(BUILD_DIR)
    compile_exec(amalgamation=False, gcc_optimization=False, avx_enabled=avx)
    t = run(size1=2**20, universe1=2**21, size2=2**20, universe2=2**21,
            copy_on_write=False, run_containers=True)
    os.chdir('..')
    return t
avx_true = [f(True) for _ in range(5)]
avx_false = [f(False) for _ in range(5)]
print(numpy.mean(avx_true))
print(numpy.mean(avx_false))

Prints:

0.000248622
0.00019045

When GCC optimization and amalgamation are enabled, the performances with or without AVX seem to be similar (which is also disappointing).


Large and non-dense datasets


In [14]:
large_nondense <- all_results[all_results["large1"]=="True" & all_results["large2"]=="True" & all_results["dense1"]=="False" & all_results["dense2"]=="False",]
large_nondense_aov <- aov(time~(copy_on_write+run_containers+amalgamation+gcc_optimization+avx_enabled)^2, data=large_nondense)
summary(large_nondense_aov)


                                Df   Sum Sq  Mean Sq  F value   Pr(>F)    
copy_on_write                    1 0.000000 0.000000    0.227 0.635585    
run_containers                   1 0.000000 0.000000    0.214 0.645797    
amalgamation                     1 0.000024 0.000024   13.576 0.000582 ***
gcc_optimization                 1 0.003431 0.003431 1966.385  < 2e-16 ***
avx_enabled                      1 0.000035 0.000035   19.809 5.08e-05 ***
copy_on_write:run_containers     1 0.000001 0.000001    0.589 0.446396    
copy_on_write:amalgamation       1 0.000001 0.000001    0.440 0.510295    
copy_on_write:gcc_optimization   1 0.000000 0.000000    0.093 0.761424    
copy_on_write:avx_enabled        1 0.000000 0.000000    0.231 0.633038    
run_containers:amalgamation      1 0.000001 0.000001    0.569 0.454379    
run_containers:gcc_optimization  1 0.000000 0.000000    0.207 0.651143    
run_containers:avx_enabled       1 0.000000 0.000000    0.101 0.751779    
amalgamation:gcc_optimization    1 0.000055 0.000055   31.395 1.01e-06 ***
amalgamation:avx_enabled         1 0.000024 0.000024   13.694 0.000554 ***
gcc_optimization:avx_enabled     1 0.000040 0.000040   22.914 1.66e-05 ***
Residuals                       48 0.000084 0.000002                      
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

In [15]:
layout(matrix(c(1,2,3,4),2,2))
plot(large_nondense_aov)


The plots are here better than the first one, but not perfect. The points are still not uniformly distributed on the fitted values vs residuals plot, and the QQ-plot is not a straight line.

The summary shows that again, the factors gcc_optimization, avx_enabled and amalgamation have a significant impact on the performances.


In [16]:
MEPlot(large_nondense_aov, abbrev=4, select=c(1, 2, 3, 4, 5), response="time")



In [17]:
IAPlot(large_nondense_aov, abbrev=4, show.alias=FALSE, select=c(1, 2, 3, 4, 5))


Effect and interaction plots are less surprising here. We see that enabling AVX has a positive impact on the performances (but the main performance gain comes with gcc optimizations, as expected).

Large and one-dense datasets

Here, one and only one of the datasets is dense.


In [18]:
large_onedense <- all_results[all_results["large1"]=="True" & all_results["large2"]=="True" & (all_results["dense1"]=="True" | all_results["dense2"]=="True") & (all_results["dense1"]=="False" | all_results["dense2"]=="False"),]
large_onedense_aov <- aov(time~(copy_on_write+run_containers+amalgamation+gcc_optimization+avx_enabled)^2, data=large_onedense)
summary(large_onedense_aov)


                                 Df    Sum Sq   Mean Sq  F value   Pr(>F)    
copy_on_write                     1 1.862e-05 1.862e-05 9412.393  < 2e-16 ***
run_containers                    1 6.000e-09 6.000e-09    3.107 0.080701 .  
amalgamation                      1 4.600e-08 4.600e-08   23.073 4.87e-06 ***
gcc_optimization                  1 9.100e-08 9.100e-08   46.214 5.41e-10 ***
avx_enabled                       1 3.300e-08 3.300e-08   16.781 7.96e-05 ***
copy_on_write:run_containers      1 1.000e-09 1.000e-09    0.618 0.433297    
copy_on_write:amalgamation        1 0.000e+00 0.000e+00    0.161 0.689252    
copy_on_write:gcc_optimization    1 7.000e-09 7.000e-09    3.599 0.060405 .  
copy_on_write:avx_enabled         1 0.000e+00 0.000e+00    0.206 0.650873    
run_containers:amalgamation       1 4.000e-09 4.000e-09    2.091 0.150958    
run_containers:gcc_optimization   1 1.000e-09 1.000e-09    0.672 0.414150    
run_containers:avx_enabled        1 0.000e+00 0.000e+00    0.034 0.852994    
amalgamation:gcc_optimization     1 2.800e-08 2.800e-08   14.382 0.000242 ***
amalgamation:avx_enabled          1 5.100e-08 5.100e-08   25.673 1.61e-06 ***
gcc_optimization:avx_enabled      1 2.100e-08 2.100e-08   10.405 0.001648 ** 
Residuals                       112 2.220e-07 2.000e-09                      
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

In [19]:
layout(matrix(c(1,2,3,4),2,2))
plot(large_onedense_aov)


Again, the plots are not great, but they are still better than for the general case.

With the summary, we see that for this case, the factor copy_on_write is by far the most impactful (F value several orders of magnitude greater than the other significant factors). The other important factors are amalgamation, gcc_optimization and avx_enabled. The factor run_container has also a small impact.


In [20]:
MEPlot(large_onedense_aov, abbrev=4, select=c(1, 2, 3, 4, 5), response="time")



In [21]:
IAPlot(large_onedense_aov, abbrev=4, show.alias=FALSE, select=c(1, 2, 3, 4, 5))


The effect plot confirms that the main performance gain comes from the use of copy on write. This was expected in this case, the non dense roaring bitmap has its elements spread across much more containers than the dense roaring bitmaps. A large number of containers are not modified, therefore there is no need to copy them.

Non-large datasets


In [22]:
nonlarge <- all_results[all_results["large1"]=="False" & all_results["large2"]=="False",]
nonlarge_aov <- aov(time~(copy_on_write+run_containers+amalgamation+gcc_optimization+avx_enabled)^2, data=nonlarge)
summary(nonlarge_aov)


                                 Df    Sum Sq   Mean Sq F value  Pr(>F)   
copy_on_write                     1 3.100e-11 3.103e-11   1.322 0.25139   
run_containers                    1 3.700e-11 3.693e-11   1.573 0.21097   
amalgamation                      1 1.860e-10 1.856e-10   7.905 0.00534 **
gcc_optimization                  1 1.620e-10 1.619e-10   6.899 0.00918 **
avx_enabled                       1 1.000e-11 9.990e-12   0.426 0.51481   
copy_on_write:run_containers      1 1.600e-11 1.643e-11   0.700 0.40368   
copy_on_write:amalgamation        1 1.700e-11 1.714e-11   0.730 0.39362   
copy_on_write:gcc_optimization    1 1.000e-12 7.500e-13   0.032 0.85835   
copy_on_write:avx_enabled         1 1.800e-11 1.792e-11   0.763 0.38315   
run_containers:amalgamation       1 4.000e-12 4.360e-12   0.186 0.66706   
run_containers:gcc_optimization   1 2.400e-11 2.388e-11   1.017 0.31417   
run_containers:avx_enabled        1 5.900e-11 5.870e-11   2.501 0.11511   
amalgamation:gcc_optimization     1 1.400e-11 1.385e-11   0.590 0.44314   
amalgamation:avx_enabled          1 1.310e-10 1.309e-10   5.577 0.01900 * 
gcc_optimization:avx_enabled      1 6.900e-11 6.871e-11   2.927 0.08839 . 
Residuals                       240 5.634e-09 2.347e-11                   
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

In [23]:
layout(matrix(c(1,2,3,4),2,2))
plot(nonlarge_aov)


The plots are great, except for two outliers. The points are uniformly distributed around the x axis on the two first plots, and the QQ-plot is a straight line.

The summary shows that the factors gcc_optimization and avx_enabled have a significant impact on the performances.


In [24]:
MEPlot(nonlarge_aov, abbrev=4, select=c(1, 2, 3, 4, 5), response="time")


Only amalgamation and GCC optimization have a noticeable impact on the performances (other optimizations have a low F-value, according to the above summary). Strangely, amalgamation seem to have a negative impact here.

Conclusion

A first AOV on all the factors showed that our data does not fulfill the hypothesis: the variance of the time is not constant.

To overcome this, we fixed the “problematic” variables, i.e. the sizes and the densities of the roaring bitmaps.

Without surprise, the factor gcc_optimization has always a very significant positive impact on the performances.

For large roaring bitmaps, the factor avx_enabled is also significant, but not always in a positive way: using AVX decreases the performances for dense datasets when no gcc optimization is used. No explanation has been found yet. The factor amalgamation is important if at least one of the roaring bitmaps is non dense, and the factor copy_on_write is very important if one and only one of them is dense.

For small roaring bitmaps, the only important factors are gcc_optimization and amalgamation, but the later has a negative impact.