This application builds prediction models that identify security bug reports. Included here are bug reports from five different projects, code, experiments, and results. This document provides guidance on using the application from the command line. We start with some information about the bug reports and then proceed as follows:
Users can clone the farsec repo and run the examples in order, using the commands in bold. The user must also create an output directory in the same location as farsec-v1.jar. All data and results will be sent to the output directory.
Project | Domain | Start ID | End ID | Start ID Date | End ID Date | BRs | SBRs | SBRs(%) |
---|---|---|---|---|---|---|---|---|
Chromium | Web browser called Chrome. | 2 | 46313 | Aug 30 2008 | Jun 11 2010 | 44885 | 191 | 0.4 |
Wicket | Component-based web application framework for the Java programming language. | 12 | 5753 | Oct 20 2006 | Nov 9 2014 | 1000 | 10 | 1 |
Ambari | Hadoop management web UI backed by its RESTful APIs. | 12 | 6793 | Sep 26 2011 | Aug 8 2014 | 1000 | 29 | 3 |
Camel | A rule-based routing and mediation engine. | 72 | 6767 | Jul 8 2007 | Sep 18 2013 | 1000 | 32 | 3 |
Derby | A relational database management system. | 5 | 6742 | Sep 28 2004 | Sep 17 2014 | 1000 | 88 | 9 |
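
The SBRs(%) column appears to be the share of security bug reports (SBRs) among all bug reports (BRs), rounded. The following short sketch recomputes it from the BRs and SBRs columns; the function name `sbr_percentage` is hypothetical and not part of the application.

```python
# (project, total bug reports, security bug reports), copied from the table above
projects = [
    ("Chromium", 44885, 191),
    ("Wicket", 1000, 10),
    ("Ambari", 1000, 29),
    ("Camel", 1000, 32),
    ("Derby", 1000, 88),
]


def sbr_percentage(total, security):
    """Share of security bug reports, as a percentage of all bug reports."""
    return 100.0 * security / total


percentages = {name: sbr_percentage(brs, sbrs) for name, brs, sbrs in projects}
for name, pct in percentages.items():
    print(f"{name}: {pct:.1f}%")
```

The low percentages show how imbalanced the data is: in every project, security bug reports are a small minority.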
For each project, the data is partitioned into train and test sets. These are further partitioned into security bug reports (SBRs) and non-security bug reports (NSBRs). Each bug report is converted into a file and stored in the appropriate directory (e.g., -sbr-new, -nsbr-new, -sbr-old and -nsbr-old). An example of the result is shown in /resources/data1/wicket/
Given the bug reports for each project, we find security-related keywords. We use the keywords to generate train and test matrices from the bug reports. We generate these matrices for two types of prediction experiments: the first is within-project prediction (WPP); the second is transfer project prediction (TPP).
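
To make the idea of a keyword matrix concrete, here is a minimal Python sketch. It is not the application's code: it assumes the keywords are simply the n most frequent terms in the security bug reports, whereas the tool may rank terms differently. The helper names (`tokenize`, `top_keywords`, `build_matrix`) and the sample reports are hypothetical.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase a bug report and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def top_keywords(sbr_texts, n):
    """Return the n most frequent terms across the security bug reports.

    Assumption: raw frequency stands in for whatever ranking the tool uses.
    """
    counts = Counter()
    for text in sbr_texts:
        counts.update(tokenize(text))
    return [term for term, _ in counts.most_common(n)]


def build_matrix(reports, keywords):
    """Turn (text, is_security) pairs into rows of keyword counts plus a label."""
    rows = []
    for text, is_security in reports:
        counts = Counter(tokenize(text))
        rows.append([counts[k] for k in keywords] + [int(is_security)])
    return rows


# Made-up training reports: (text, is_security)
train = [
    ("XSS attack in login form allows script injection", True),
    ("Password leak through XSS attack", True),
    ("Button label is misaligned in login form", False),
]

keywords = top_keywords([text for text, is_sec in train if is_sec], n=2)
matrix = build_matrix(train, keywords)
```

Each row of `matrix` holds one bug report: one count per keyword, followed by a 1 (security) or 0 (non-security) label. In a transfer setting, the same keyword list would be applied to the bug reports of another project.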
Options:
-o output directory name (mkdir)
-p project name
-n number of keywords/features
Run:
Options:
Run:
Filter WPP and TPP train data sets. These are denoted as WPPx and TPPx respectively.
Options:
Run:
We use tf-idf as a proxy for the popularity of specific keywords in different sources. These sources are security bug reports and non-security bug reports, before and after filtering. The result is saved in a CSV file.
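
As an illustration of the tf-idf score itself, the sketch below uses one common textbook variant: term frequency is the term count divided by the document length, and inverse document frequency is log(N / df). The exact weighting used by the application is not stated here, so treat this as an assumption; the function name `tf_idf` and the sample tokens are hypothetical.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Compute a tf-idf score per term for each tokenized document.

    tf  = term count / document length
    idf = log(number of documents / number of documents containing the term)
    """
    n_docs = len(documents)
    doc_freq = Counter()
    for doc in documents:
        doc_freq.update(set(doc))  # count each term once per document

    scores = []
    for doc in documents:
        counts = Counter(doc)
        scores.append(
            {
                term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
                for term, count in counts.items()
            }
        )
    return scores


# Each "document" stands in for one source, e.g. all SBR text or all NSBR text.
docs = [
    ["xss", "attack", "login"],
    ["login", "button", "label"],
]
scores = tf_idf(docs)
```

A term that appears in every source, such as `login` here, scores 0, while a term found in only one source scores higher. This is why the score can serve as a rough measure of how characteristic a keyword is of a given source.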
Options:
-o output directory name (mkdir)
-p project name
-n number of keywords/features
Run:
We build prediction models using the following machine learning algorithms:
Each bug report in the test set is classified as security or non-security.
Options:
Run: