Introduction

Let's start our exploration of how CVE Details create types by examining this sample table of one of CVE Details 13 types. Specifically, this table refers to all entries labeled SQL Injection in 2016.


In [4]:
#using panda 
import pandas as pd
import csv
import numpy as np
import math

#We specify the cwe_id column type is str, otherwise pandas will infer it as a float adding a misleading decimal.
cved_df = pd.read_csv("cve_details_sql_injection_2016.csv",dtype={'cwe_id': str})
cved_df['month'] = [d[5:7] for d in cved_df['published_date']]
cved_df
#np.unique(cved_df['vulnerability_type'])


Out[4]:
cve_id cwe_id n_exploits vulnerability_type published_date updated_date description score gained_access_level access complexity authentication conf integ avail month
0 CVE-2016-1000217 89 NaN Sql 2016-10-06 2016-12-22 Zotpress plugin for WordPress SQLi in zp_get_a... 7.5 None Remote Low Not required Partial Partial Partial 10
1 CVE-2016-1000125 89 NaN Sql 2016-10-06 2017-01-19 Unauthenticated SQL Injection in Huge-IT Catal... 7.5 None Remote Low Not required Partial Partial Partial 10
2 CVE-2016-1000124 89 NaN Sql 2016-10-06 2016-11-28 Unauthenticated SQL Injection in Huge-IT Portf... 7.5 None Remote Low Not required Partial Partial Partial 10
3 CVE-2016-1000123 89 NaN Sql 2016-10-06 2016-12-22 Unauthenticated SQL Injection in Huge-IT Video... 7.5 None Remote Low Not required Partial Partial Partial 10
4 CVE-2016-1000122 89 NaN Sql XSS 2016-10-27 2016-12-22 XSS and SQLi in Huge IT Joomla Slider v1.0.9 e... 6.5 None Remote Low Single system Partial Partial Partial 10
5 CVE-2016-1000120 89 NaN Sql XSS 2016-10-27 2016-12-22 SQLi and XSS in Huge IT catalog extension v1.0... 6.5 None Remote Low Single system Partial Partial Partial 10
6 CVE-2016-1000116 79 NaN Sql XSS 2016-10-21 2016-11-28 Huge-IT Portfolio Gallery manager v1.1.5 SQL I... 6.5 None Remote Low Single system Partial Partial Partial 10
7 CVE-2016-1000115 79 NaN Sql XSS 2016-10-21 2016-12-22 Huge-IT Portfolio Gallery manager v1.1.5 SQL I... 6.5 None Remote Low Single system Partial Partial Partial 10
8 CVE-2016-1000113 89 NaN Sql XSS 2016-10-06 2016-11-28 XSS and SQLi in huge IT gallery v1.1.5 for Joomla 7.5 None Remote Low Not required Partial Partial Partial 10
9 CVE-2016-1000000 89 NaN Sql 2016-10-06 2017-01-17 Ipswitch WhatsUp Gold 16.4.1 WrFreeFormText.as... 6.5 None Remote Low Single system Partial Partial Partial 10
10 CVE-2016-9864 89 NaN Sql 2016-12-10 2016-12-23 An issue was discovered in phpMyAdmin. With a ... 6.0 None Remote Medium Single system Partial Partial Partial 12
11 CVE-2016-9481 89 NaN Sql 2016-11-29 2016-12-02 In framework/modules/core/controllers/expComme... 7.5 None Remote Low Not required Partial Partial Partial 11
12 CVE-2016-9288 89 NaN Sql 2016-11-11 2016-11-29 In framework/modules/navigation/controllers/na... 7.5 None Remote Low Not required Partial Partial Partial 11
13 CVE-2016-9287 89 NaN Sql 2016-11-15 2016-11-29 In /framework/modules/notfound/controllers/not... 7.5 None Remote Low Not required Partial Partial Partial 11
14 CVE-2016-9283 89 NaN Sql 2016-11-11 2016-11-29 SQL Injection in framework/core/subsystems/exp... 5.0 None Remote Low Not required Partial None None 11
15 CVE-2016-9282 89 NaN Sql 2016-11-11 2016-11-29 SQL Injection in framework/modules/search/cont... 5.0 None Remote Low Not required Partial None None 11
16 CVE-2016-9272 89 NaN DoS Sql 2016-11-11 2016-11-29 A Blind SQL Injection Vulnerability in Exponen... 6.4 None Remote Low Not required Partial None Partial 11
17 CVE-2016-9242 89 NaN Exec Code Sql 2016-11-07 2016-11-29 Multiple SQL injection vulnerabilities in the ... 6.5 None Remote Low Single system Partial Partial Partial 11
18 CVE-2016-9184 89 NaN Sql 2016-11-04 2016-11-29 In /framework/modules/core/controllers/expHTML... 5.0 None Remote Low Not required Partial None None 11
19 CVE-2016-9183 200 NaN Sql Bypass +Info 2016-11-04 2016-11-29 In /framework/modules/ecommerce/controllers/or... 5.0 None Remote Low Not required Partial None None 11
20 CVE-2016-9135 89 NaN Sql 2016-11-03 2016-11-29 Exponent CMS 2.3.9 suffers from a SQL injectio... 5.0 None Remote Low Not required Partial None None 11
21 CVE-2016-9134 89 NaN Sql 2016-11-03 2016-11-29 Exponent CMS 2.3.9 suffers from a SQL injectio... 5.0 None Remote Low Not required Partial None None 11
22 CVE-2016-8908 89 NaN Exec Code Sql 2016-11-14 2016-11-29 SQL injection vulnerability in the "Site Brows... 6.5 None Remote Low Single system Partial Partial Partial 11
23 CVE-2016-8907 89 NaN Exec Code Sql 2016-11-14 2016-11-29 SQL injection vulnerability in the "Content Ty... 6.5 None Remote Low Single system Partial Partial Partial 11
24 CVE-2016-8906 89 NaN Exec Code Sql 2016-11-14 2016-11-29 SQL injection vulnerability in the "Site Brows... 6.5 None Remote Low Single system Partial Partial Partial 11
25 CVE-2016-8905 89 NaN Exec Code Sql 2016-11-14 2016-11-29 SQL injection vulnerability in the JSONTags se... 6.5 None Remote Low Single system Partial Partial Partial 11
26 CVE-2016-8904 89 NaN Exec Code Sql 2016-11-14 2016-11-29 SQL injection vulnerability in the "Site Brows... 6.5 None Remote Low Single system Partial Partial Partial 11
27 CVE-2016-8903 89 NaN Exec Code Sql 2016-11-14 2016-11-29 SQL injection vulnerability in the "Site Brows... 6.5 None Remote Low Single system Partial Partial Partial 11
28 CVE-2016-8902 89 NaN Exec Code Sql 2016-11-14 2016-11-29 SQL injection vulnerability in the categoriesS... 7.5 None Remote Low Not required Partial Partial Partial 11
29 CVE-2016-8582 89 NaN Sql 2016-10-28 2016-11-28 A vulnerability exists in gauge.php of AlienVa... 7.5 None Remote Low Not required Partial Partial Partial 10
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
256 CVE-2016-0488 NaN NaN Dir. Trav. Bypass 2016-01-20 2016-12-22 Unspecified vulnerability in the Oracle Applic... 6.4 None Remote Low Not required Partial Partial None 01
257 CVE-2016-0487 NaN NaN Dir. Trav. Bypass 2016-01-20 2016-12-22 Unspecified vulnerability in the Oracle Applic... 6.4 None Remote Low Not required Partial Partial None 01
258 CVE-2016-0486 NaN NaN Dir. Trav. 2016-01-20 2016-12-22 Unspecified vulnerability in the Oracle Applic... 5.0 None Remote Low Not required Partial None None 01
259 CVE-2016-0485 NaN NaN Dir. Trav. 2016-01-20 2016-12-22 Unspecified vulnerability in the Oracle Applic... 5.0 None Remote Low Not required Partial None None 01
260 CVE-2016-0484 NaN NaN Dir. Trav. 2016-01-20 2016-12-07 Unspecified vulnerability in the Oracle Applic... 5.0 None Remote Low Not required Partial None None 01
261 CVE-2016-0482 NaN NaN Dir. Trav. 2016-01-20 2016-12-07 Unspecified vulnerability in the Oracle Applic... 5.0 None Remote Low Not required Partial None None 01
262 CVE-2016-0481 NaN NaN Dir. Trav. 2016-01-20 2016-12-07 Unspecified vulnerability in the Oracle Applic... 5.0 None Remote Low Not required Partial None None 01
263 CVE-2016-0480 NaN NaN Dir. Trav. 2016-01-20 2016-12-07 Unspecified vulnerability in the Oracle Applic... 5.0 None Remote Low Not required Partial None None 01
264 CVE-2016-0478 NaN NaN Dir. Trav. 2016-01-20 2016-12-07 Unspecified vulnerability in the Oracle Applic... 5.0 None Remote Low Not required Partial None None 01
265 CVE-2016-0477 NaN NaN Dir. Trav. 2016-01-20 2016-12-07 Unspecified vulnerability in the Oracle Applic... 5.0 None Remote Low Not required Partial None None 01
266 CVE-2016-0476 NaN NaN Dir. Trav. 2016-01-20 2016-12-07 Unspecified vulnerability in the Oracle Applic... 5.0 None Remote Low Not required Partial None None 01
267 CVE-2016-0298 200 NaN Dir. Trav. +Info 2016-06-28 2016-06-29 Directory traversal vulnerability in IBM Secur... 4.0 None Remote Low Single system Partial None None 06
268 CVE-2015-1000006 22 NaN Dir. Trav. 2016-10-06 2016-10-26 Remote file download vulnerability in recent-b... 5.0 None Remote Low Not required Partial None None 10
269 CVE-2015-1000005 22 NaN Dir. Trav. 2016-10-06 2016-10-26 Remote file download vulnerability in candidat... 5.0 None Remote Low Not required Partial None None 10
270 CVE-2015-8799 22 NaN Dir. Trav. 2016-06-08 2016-06-10 Directory traversal vulnerability in the Manag... 7.1 None Remote High Single system Complete Complete Complete 06
271 CVE-2015-8798 22 NaN Exec Code Dir. Trav. 2016-06-08 2016-06-10 Directory traversal vulnerability in the Manag... 7.7 None Local Network Low Single system Complete Complete Complete 06
272 CVE-2015-8794 22 NaN Dir. Trav. 2016-01-29 2016-02-25 Absolute path traversal vulnerability in progr... 4.0 None Remote Low Single system Partial None None 01
273 CVE-2015-8770 22 NaN Exec Code Dir. Trav. 2016-01-29 2016-12-05 Directory traversal vulnerability in the set_s... 6.0 None Remote Medium Single system Partial Partial Partial 01
274 CVE-2015-6833 22 NaN Dir. Trav. 2016-01-19 2016-11-29 Directory traversal vulnerability in the PharD... 5.0 None Remote Low Not required None Partial None 01
275 CVE-2015-5471 22 NaN Dir. Trav. 2016-01-12 2016-11-28 Absolute path traversal vulnerability in inclu... 5.0 None Remote Low Not required Partial None None 01
276 CVE-2015-5345 22 NaN Dir. Trav. 2016-02-24 2016-12-05 The Mapper component in Apache Tomcat 6.x befo... 5.0 None Remote Low Not required Partial None None 02
277 CVE-2015-5313 22 NaN Dir. Trav. 2016-04-11 2016-11-28 Directory traversal vulnerability in the virSt... 1.9 None Local Medium Not required None Partial None 04
278 CVE-2015-5174 22 NaN Dir. Trav. Bypass 2016-02-24 2016-12-05 Directory traversal vulnerability in RequestUt... 4.0 None Remote Low Single system Partial None None 02
279 CVE-2015-4988 22 NaN Dir. Trav. 2016-01-18 2016-01-21 Directory traversal vulnerability in the repla... 7.8 None Remote Low Not required Complete None None 01
280 CVE-2015-4703 22 NaN Dir. Trav. 2016-01-12 2016-06-27 Absolute path traversal vulnerability in mysql... 5.0 None Remote Low Not required Partial None None 01
281 CVE-2015-4694 22 NaN Dir. Trav. 2016-01-08 2016-11-28 Directory traversal vulnerability in download.... 5.0 None Remote Low Not required Partial None None 01
282 CVE-2015-2007 22 NaN Dir. Trav. 2016-01-02 2016-01-07 Directory traversal vulnerability in IBM Secur... 4.0 None Remote Low Single system Partial None None 01
283 CVE-2015-1977 200 NaN Dir. Trav. +Info 2016-07-15 2016-07-18 Directory traversal vulnerability in the Web A... 5.0 None Remote Low Not required Partial None None 07
284 CVE-2014-9767 22 NaN Dir. Trav. 2016-05-21 2016-11-30 Directory traversal vulnerability in the ZipAr... 4.3 None Remote Medium Not required None Partial None 05
285 CVE-2013-7448 22 NaN Dir. Trav. 2016-02-23 2016-03-10 Directory traversal vulnerability in wiki.c in... 5.0 None Remote Low Not required Partial None None 02

286 rows × 16 columns

Notice that the column vulnerability_type not only includes sql injection. It may also cite other identified types out of the 13. For instance, row 4 value is SQL XSS, indicating the entry is both of SQL and XSS types.

Important: Entries labeled with multiple types WILL appear, accordingly, on the tables. A combination of tables must be done carefully when collecting textual descriptions, to avoid redundance!

Analysis

To begin the analysis, let's consider how cwe ids, which by definition group vulnerabilities, are distributed across the type for this year and SQL Injection. Notice a cwe_id may or not occur, but the type is always defined. cve_id's, always occur, hence defining the upper bound a vulnerability type could had.


In [5]:
vulnerability_type_histogram = cved_df.groupby(by=['vulnerability_type'])['cwe_id','cve_id'].count()
vulnerability_type_histogram


Out[5]:
cwe_id cve_id
vulnerability_type
+Priv Dir. Trav. 0 1
Dir. Trav. 69 80
Dir. Trav. +Info 4 4
Dir. Trav. Bypass 3 6
Dir. Trav. File Inclusion 3 3
DoS Sql 2 2
Exec Code Dir. Trav. 4 4
Exec Code Sql 118 118
Exec Code Sql +Info 2 2
Sql 52 52
Sql +Info 2 2
Sql Bypass +Info 2 2
Sql XSS 10 10

We can note some combinations of type occur much more frequently than others.

Let's explore further the vulnerability types proposed by CVE Details, by not only counting the number of cwe id's per vulnerability type, but also what are the cwe_id'per type. Out of curiosity, let's also include the number of exploits that were reported.


In [7]:
vulnerability_type_histogram = cved_df.groupby(by=['vulnerability_type','cwe_id'])['cve_id','n_exploits'].count()
print(vulnerability_type_histogram)
vulnerability_list = np.unique(cved_df['vulnerability_type'])
vulnerability_by_month = cved_df.groupby(by=['vulnerability_type','month'])['cve_id'].count()


                                  cve_id  n_exploits
vulnerability_type        cwe_id                    
Dir. Trav.                22          67           0
                          434          1           0
                          77           1           0
Dir. Trav. +Info          200          3           0
                          22           1           0
Dir. Trav. Bypass         22           3           0
Dir. Trav. File Inclusion 22           3           0
DoS Sql                   89           2           0
Exec Code Dir. Trav.      22           3           0
                          77           1           0
Exec Code Sql             89         118           0
Exec Code Sql +Info       200          2           0
Sql                       264          2           0
                          89          50           0
Sql +Info                 89           2           0
Sql Bypass +Info          200          2           0
Sql XSS                   79           4           0
                          89           6           0

A pattern emerges in the construction of the types: For vulnerability types with a higher number of cwe entries, this higher number is led by a single cwe id. This is the case for 3 vulnerability types in the table above: Dir.Trav. being led by cwe_id 22, Exec Code Sql being led by cwe_id 89, and vulnerability type SQL on cwe_id 89.

Additionally, we should remember that the column vulnerability type can be a combination of 1 or more types of the 13 proposed by CVE Details. Specifically, Exec Code Sql actually refers to the type Code Execution and the type SQL.

We can also observe just the distribution of cwe_id's reported for SQL table for 2016.


In [8]:
vulnerability_histogram = cved_df.groupby(by=['cwe_id'])['cve_id'].count()
vulnerability_histogram


Out[8]:
cwe_id
200      7
22      77
264      2
434      1
77       2
79       4
89     178
Name: cve_id, dtype: int64

Visualization

Vulnerability type histogram


In [9]:
#imports for histogram
import numpy as np
import pandas as pd
from bokeh.plotting import figure, show
from bokeh.models import Range1d
from bokeh.io import output_notebook
from bokeh.charts import Bar
import matplotlib.pyplot as plot
from datetime import datetime
output_notebook()


Loading BokehJS ...

In [10]:
#creating a histogram for vulnerability types by creating a dictionary
vulnerability_type_histogram = cved_df.groupby(by=['vulnerability_type'])['cve_id'].count()
data = {}
data['Entries'] = vulnerability_type_histogram
#saving in dictionary for sorting and visualising
df_data = pd.DataFrame(data).sort_values(by='Entries', ascending=True)
series = df_data.loc[:,'Entries']

p = figure(width=800, y_range=series.index.tolist(), title="Histogram of Vulnerability Types from CVE Details")

p.xaxis.axis_label = 'Frequency'
p.xaxis.axis_label_text_font_size = '10pt'
p.xaxis.major_label_text_font_size = '8pt'

p.yaxis.axis_label = 'Vulnerability Type Combinations of the 13 Types'
p.yaxis.axis_label_text_font_size = '10pt'
p.yaxis.major_label_text_font_size = '8pt'

j = 1
for k,v in series.iteritems():
  
  #Print fields, values, orders
  #print (k,v,j) 
  p.rect(x=v/2, y=j, width=abs(v), height=0.4,
    width_units="data", height_units="data")
  j += 1

The histogram below represents vulnerability types as mentioned in the Cve Details database. The vulnerability types are explained below:

  • Dir. Trav. stands for Directory Traversal
  • Dir. Trav. Bypass stands for Directory Traversal Bypass
  • Dir. Trav. File Inclusion stands for Directory Traversal File Inclusion
  • DoS Sql stands for Denial of Service attack using Structured Query Language
  • Exec Code Dir. Trav. stands for Execution of arbitrary code causing a Directory Traversal vulnerability
  • Exec code Sql stands for Execution of arbitrary code causing a Structured Query Language vulnerability
  • Exec Code Sql +Info stands for Execution of arbitrary code causing a Structured Query Language vulnerability or access to sensitive information
  • Sql stands for Structured Query language vulnerability
  • Sql +Info stands for a Structured Query Language vulnerability or access to sensitive information
  • Sql Bypass +Info stands for a Structured Query Language Bypass vulnerability or access to sensitive information
  • Sql XSS stands for Cross-site scripting(XSS) vulnerability in SQL

In [11]:
show(p)


We created this histogram to gain insights into the the number of occurances of each of the vulnerability types. On analysis we can see that Exec. code Sql is the most frequent type of attack, followed by Dir. Trav. and Sql types which qualify to the three most frequent vulnerability types.

CWE ID count histogram


In [12]:
#creating a histogram for cwe ID types by creating a dictionary
data = {}
data['Entries'] = vulnerability_histogram
#saving in dictionary for sorting and visualising
df_data = pd.DataFrame(data).sort_values(by='Entries', ascending=True)
series = df_data.loc[:,'Entries']

p = figure(width=800, y_range=series.index.tolist(), title="Histogram of CWE IDs from CVE Details")

p.xaxis.axis_label = 'Frequency'
p.xaxis.axis_label_text_font_size = '10pt'
p.xaxis.major_label_text_font_size = '8pt'

p.yaxis.axis_label = 'CWE ID'
p.yaxis.axis_label_text_font_size = '10pt'
p.yaxis.major_label_text_font_size = '8pt'

j = 1
for k,v in series.iteritems():
  
  #Print fields, values, orders
  #print (k,v,j) 
  p.rect(x=v/2, y=j, width=abs(v), height=0.4,
    width_units="data", height_units="data")
  j += 1
show(p)


The histogram above shows the frequency of CWE IDs of the CVE Details database. The most frequent CWE IDs are 89 and 22 which account for more than 90% of the entries.


In [14]:
color_map = {
'+Priv Dir. Trav.': 'red',
'Dir. Trav.': 'green',
'Dir. Trav. +Info': 'yellow',
'Dir. Trav. Bypass': 'violet',
'Dir. Trav. File Inclusion': 'indigo',
'DoS Sql': 'brown',
'Exec Code Dir. Trav.': 'black',
'Exec Code Sql': 'blue',
'Exec Code Sql +Info': 'orange',
'Sql': 'olive',
'Sql +Info' : 'navy',
'Sql Bypass +Info' : 'firebrick',
'Sql XSS' : '#F4A582'
}

def create_multi_line(vul):
    map_data = {}
    for v in vul.items():
        tuple_0 = v[0]
        if tuple_0[0] in map_data:
            map_data[tuple_0[0]].append((tuple_0[1],v[1]))
        else:
            map_data[tuple_0[0]] =[]
            map_data[tuple_0[0]].append((tuple_0[1],v[1]))
    vul_plot = plot.subplot(111)
    for k,v in map_data.items():
        vul_plot.plot([int(i[0]) for i in v],[int(j[1]) for j in v] , label=k)
        vul_plot.set_ylabel('count')
        vul_plot.set_xlabel('month')

    vul_plot.legend(bbox_to_anchor=(1.05, 1),loc=2, borderaxespad=0.)
    vul_plot.set_title("Count of Vulnerability type per month for the year 2016")
    vul_plot.set_autoscaley_on(False)
    vul_plot.set_ylim([0,25])
    vul_plot.set_autoscalex_on(False)
    vul_plot.set_xlim([1,12])
    vul_plot.xaxis.set_ticks([1,2,3,4,5,6,7,8,9,10,11,12])
    plot.show()


#We specify the cwe_id column type is str, otherwise pandas will infer it as a float adding a misleading decimal.
cved_df = pd.read_csv("cve_details_sql_injection_2016.csv")
cved_df['month'] = [d[5:7] for d in cved_df['published_date']]
vulnerability_type_histogram = cved_df.groupby(by=['vulnerability_type','cwe_id'])['cve_id','n_exploits'].count()
vulnerability_type_histogram
data = cved_df.groupby(by=['vulnerability_type','month'])['cve_id'].count()


create_multi_line(data)