Filtering and Ranking Security Bug Reports

This application builds models that predict security bug reports. Included here are bug reports from five different projects, code, experiments, and results. This document provides guidance on how to use the application on the command line. We start with some information about the bug reports and then proceed as follows:

  • generation of data matrices for experiments;
  • calculation of tf-idf as a measure of the popularity of security-related keywords;
  • prediction of bug reports as security or non-security reports; and
  • ranking of bug reports.

Users can clone the farsec repo and run the examples in order, using the commands listed under "Run:" in each section. The user must also create an output directory in the same location as farsec-v1.jar. All data and results will be written to the output directory.
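The output directory step can also be scripted. The following is a minimal sketch, assuming the jar path is known; `ensure_output_dir` is a hypothetical helper (not part of farsec), and `wicket-data1` is the directory name used by the examples in this guide.

```python
from pathlib import Path


def ensure_output_dir(jar_path, name):
    """Create directory `name` in the same location as the jar; return its path.

    Hypothetical helper for illustration only; it is not part of farsec.
    """
    out_dir = Path(jar_path).resolve().parent / name
    out_dir.mkdir(parents=True, exist_ok=True)  # no error if it already exists
    return out_dir
```

For example, `ensure_output_dir("farsec-v1.jar", "wicket-data1")` creates `wicket-data1` next to a jar located in the current working directory.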

About Bug Reports

Bug Report Sources

Characteristics About Bug Reports

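| Project  | Domain                                                                      | Start ID | End ID | Start ID Date | End ID Date | BRs   | SBRs | SBRs (%) |
|----------|-----------------------------------------------------------------------------|----------|--------|---------------|-------------|-------|------|----------|
| Chromium | Web browser called Chrome.                                                  | 2        | 46313  | Aug 30 2008   | Jun 11 2010 | 44885 | 191  | 0.4      |
| Wicket   | Component-based web application framework for the Java programming language. | 12       | 5753   | Oct 20 2006   | Nov 9 2014  | 1000  | 10   | 1        |
| Ambari   | Hadoop management web UI backed by its RESTful APIs.                        | 12       | 6793   | Sep 26 2011   | Aug 8 2014  | 1000  | 29   | 3        |
| Camel    | A rule-based routing and mediation engine.                                  | 72       | 6767   | Jul 8 2007    | Sep 18 2013 | 1000  | 32   | 3        |
| Derby    | A relational database management system.                                    | 5        | 6742   | Sep 28 2004   | Sep 17 2014 | 1000  | 88   | 9        |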
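The SBRs (%) column is the share of security bug reports among all bug reports for a project. As a quick check, the sketch below recomputes it from the BRs and SBRs counts in the table above; `sbr_percentage` is a name chosen here for illustration. The table appears to round the four smaller projects to whole numbers.

```python
# (BRs, SBRs) per project, copied from the table above.
COUNTS = {
    "Chromium": (44885, 191),
    "Wicket": (1000, 10),
    "Ambari": (1000, 29),
    "Camel": (1000, 32),
    "Derby": (1000, 88),
}


def sbr_percentage(brs, sbrs):
    """Percentage of bug reports that are security bug reports."""
    return 100.0 * sbrs / brs


for project, (brs, sbrs) in COUNTS.items():
    print(f"{project}: {sbr_percentage(brs, sbrs):.1f}%")
# Prints 0.4%, 1.0%, 2.9%, 3.2% and 8.8% for the five projects, in order.
```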

Bug Reports

For each project, data is partitioned into train and test sets. These are further partitioned into security bug reports (SBRs) and non-security bug reports (NSBRs). Each bug report is converted into a file and stored in the appropriate directory (e.g., -sbr-new, -nsbr-new, -sbr-old and -nsbr-old). An example of the result is shown in /resources/data1/wicket/
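To inspect such a layout, one can count the report files in each partition. This is a sketch under assumptions: it matches subdirectories by the four name suffixes listed above and treats every file inside as one bug report. `count_reports` is a hypothetical helper, and the full directory names used by farsec are not assumed here.

```python
from pathlib import Path

# Directory-name suffixes listed in the text above.
SUFFIXES = ("-sbr-new", "-nsbr-new", "-sbr-old", "-nsbr-old")


def count_reports(project_dir):
    """Map each partition suffix to the number of files found under it."""
    counts = {suffix: 0 for suffix in SUFFIXES}
    for sub in Path(project_dir).iterdir():
        if not sub.is_dir():
            continue
        for suffix in SUFFIXES:
            if sub.name.endswith(suffix):
                # One file per bug report (assumption based on the text).
                counts[suffix] += sum(1 for f in sub.iterdir() if f.is_file())
    return counts
```

Calling `count_reports` on a project directory such as the wicket example returns a dictionary keyed by suffix, which makes the imbalance between SBRs and NSBRs easy to see.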

Data Matrices

Generate Data Matrices for Experiments

Given bug reports for each project, we find security-related keywords. We use the keywords to generate train and test sets from the bug reports. We generate these matrices for two types of prediction experiments. The first is within-project prediction (WPP); the second is transfer project prediction (TPP).
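As an illustration of what such a data matrix can look like, the sketch below builds one row per bug report, with one column per keyword and a final label column. It assumes raw keyword counts as feature values and a simple lowercase tokenizer; how farsec selects keywords and weights them is not described here, and the keywords and reports shown are made up.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_matrix(reports, keywords):
    """reports: list of (text, is_security). Returns keyword counts + label."""
    rows = []
    for text, is_security in reports:
        tokens = tokenize(text)
        counts = [tokens.count(k) for k in keywords]
        rows.append(counts + [1 if is_security else 0])
    return rows


keywords = ["password", "overflow", "crash"]  # hypothetical keywords
reports = [  # hypothetical bug report texts
    ("Buffer overflow when password is too long", True),
    ("Page crash on resize, crash on reload", False),
]
matrix = build_matrix(reports, keywords)
print(matrix)  # [[1, 1, 0, 1], [0, 0, 2, 0]]
```

In a WPP setting the train and test rows would come from the same project; in a TPP setting the train rows would come from the source project and the test rows from the target project.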

Example: WPP for wicket

Options:

  • -o output directory name (mkdir)
  • -p project name
  • -n number of keywords/features

Run:

  • java -jar farsec-v1.jar -o wicket-data1 -k data1 -p wicket -n 100 --wpp

Example: TPP for wicket (target) with ambari (source)

Options:

  • -o output directory name (mkdir)
  • -c source project name
  • -p project name
  • -n number of keywords/features

Run:

  • java -jar farsec-v1.jar -o wicket-data1 -k data1 -c ambari -p wicket -n 100 --tpp
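When several projects need to be processed, the WPP and TPP commands can be assembled programmatically. The sketch below only reuses the flags and flag order of the two Run commands above; the value of `-k` is passed through unchanged because its meaning is not documented here. `farsec_args` and `run_if_available` are hypothetical helpers.

```python
import subprocess
from pathlib import Path


def farsec_args(out_dir, data, project, n_keywords, mode, source=None):
    """Build the argument list for a --wpp or --tpp run (flags as shown above)."""
    args = ["java", "-jar", "farsec-v1.jar", "-o", out_dir, "-k", data]
    if source is not None:
        args += ["-c", source]  # source project, used for TPP
    args += ["-p", project, "-n", str(n_keywords), f"--{mode}"]
    return args


def run_if_available(args):
    """Run the command if the jar exists; otherwise print it as a dry run."""
    if Path(args[2]).is_file():
        return subprocess.run(args, check=True)
    print("jar not found; would run:", " ".join(args))
    return None


wpp = farsec_args("wicket-data1", "data1", "wicket", 100, "wpp")
tpp = farsec_args("wicket-data1", "data1", "wicket", 100, "tpp", source="ambari")
```

Passing the arguments as a list avoids shell quoting issues, and `check=True` makes a failed run raise an exception instead of being silently ignored.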

Generate Data Matrices for Experiments with Filtering

Filter the WPP and TPP training data sets. The filtered sets are denoted WPPx and TPPx, respectively.

Example: WPPx and TPPx for wicket (target) with ambari (source)

Options:

  • -o output directory name (mkdir)
  • -c source project name
  • -p project name
  • -n number of keywords/features

Run:

  • java -jar farsec-v1.jar -o wicket-data1 -c ambari -p wicket --wppx
  • java -jar farsec-v1.jar -o wicket-data1 -c ambari -p wicket --tppx

Calculate Tf-idf

Generate tf-idf Files

We use tf-idf as a proxy for the popularity of specific keywords present in different sources. These sources are security bug reports and non-security bug reports, before and after filtering. The result is saved in a CSV file.
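For readers unfamiliar with tf-idf, here is a small worked example. It assumes the common textbook definition, term frequency multiplied by log(N / document frequency), and treats each source as one document; the exact weighting used by farsec is not stated here, and the token lists are made up.

```python
import math
from collections import Counter


def tf_idf(term, doc_tokens, corpus):
    """tf-idf of `term` in one document, relative to a corpus of token lists."""
    if not doc_tokens:
        return 0.0
    tf = Counter(doc_tokens)[term] / len(doc_tokens)
    df = sum(1 for doc in corpus if term in doc)  # documents containing term
    if df == 0:
        return 0.0
    return tf * math.log(len(corpus) / df)


# Hypothetical sources: one token list for SBRs, one for NSBRs.
sbr = ["password", "leak", "password", "crash"]
nsbr = ["crash", "layout", "button", "crash"]
corpus = [sbr, nsbr]

print(tf_idf("password", sbr, corpus))  # 0.5 * ln(2), about 0.347
print(tf_idf("crash", sbr, corpus))     # 0.0, since "crash" occurs in both
```

A keyword that is frequent in one source but absent from the other scores high, while a keyword common to every source scores zero.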

Example for wicket

Options:

  • -o output directory name (mkdir)
  • -p project name
  • -n number of keywords/features

Run:

  • java -jar farsec-v1.jar -o wicket-data1 -p wicket -n 100 --tfidf

Make Predictions

We build prediction models using the following machine learning algorithms:

  • Logistic Regression
  • Naive Bayes
  • Random Forest
  • K-Nearest Neighbor
  • Multilayer Perceptron

Bug reports in the test set are predicted as security or non-security.

Options:

  • -o output directory name (mkdir)
  • -c source project name
  • -p project name
  • -n number of keywords/features
  • -s experiment type (wpp, wppx, tpp or tppx)

Run:

  • java -jar farsec-v1.jar -o wicket-data1 -c ambari -p wicket -s wpp --predict
  • java -jar farsec-v1.jar -o wicket-data1 -c ambari -p wicket -s wppx --predict
  • java -jar farsec-v1.jar -o wicket-data1 -c ambari -p wicket -s tpp --predict
  • java -jar farsec-v1.jar -o wicket-data1 -c ambari -p wicket -s tppx --predict
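To make the prediction step concrete, the following from-scratch sketch shows one of the listed algorithms, Naive Bayes, applied to keyword-count rows. It is an illustration only, not farsec's implementation: it assumes a multinomial model with add-one smoothing, requires both classes to appear in the training data, and uses made-up counts.

```python
import math


def train_nb(rows, labels):
    """Fit multinomial Naive Bayes with add-one smoothing.

    rows: keyword-count vectors; labels: 1 = security, 0 = non-security.
    """
    n_features = len(rows[0])
    model = {}
    for cls in (0, 1):
        cls_rows = [r for r, y in zip(rows, labels) if y == cls]
        totals = [sum(r[j] for r in cls_rows) for j in range(n_features)]
        denom = sum(totals) + n_features  # add-one smoothing
        model[cls] = {
            "log_prior": math.log(len(cls_rows) / len(rows)),
            "log_like": [math.log((t + 1) / denom) for t in totals],
        }
    return model


def predict_nb(model, row):
    """Return the class with the highest log-posterior score."""
    scores = {
        cls: p["log_prior"] + sum(c * ll for c, ll in zip(row, p["log_like"]))
        for cls, p in model.items()
    }
    return max(scores, key=scores.get)


# Hypothetical training data: three keyword counts per bug report.
train_rows = [[2, 1, 0], [1, 2, 0], [0, 0, 3], [0, 1, 2]]
train_labels = [1, 1, 0, 0]
model = train_nb(train_rows, train_labels)

print(predict_nb(model, [1, 1, 0]))  # 1 (security)
print(predict_nb(model, [0, 0, 2]))  # 0 (non-security)
```

Because security bug reports are rare in the projects described earlier, the class prior in a realistic training set would be far from the balanced 50/50 split used in this toy example.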
