Db2 already has a variety of Statistical functions built in. In Db2 11.1, a number of new functions have been added including:
N refers to the size of the population and in many cases, we only have a sample, not the entire population of values. In this case, the formula needs to be adjusted to account for the sampling.
Set up the connection to the database.
In [ ]:
    
%run db2.ipynb
    
We populate the database with the EMPLOYEE and DEPARTMENT tables so that we can run the various examples.
In [ ]:
    
%sql -sampledata
    
In [ ]:
    
%%sql
SELECT COVARIANCE_SAMP(SALARY, BONUS) 
  FROM EMPLOYEE 
WHERE WORKDEPT = 'A00'
    
In [ ]:
    
%%sql
SELECT STDDEV_SAMP(SALARY) 
  FROM EMPLOYEE 
WHERE WORKDEPT = 'A00'
    
In [ ]:
    
%%sql
SELECT VARIANCE_SAMP(SALARY) 
  FROM EMPLOYEE 
WHERE WORKDEPT = 'A00'
    
In [ ]:
    
%%sql
SELECT MEDIAN(SALARY) AS MEDIAN, AVG(SALARY) AS AVERAGE 
  FROM EMPLOYEE 
WHERE WORKDEPT = 'E21'
    
In [ ]:
    
%%sql
SELECT CUME_DIST(47000) WITHIN GROUP (ORDER BY SALARY) 
  FROM EMPLOYEE 
WHERE WORKDEPT = 'A00'
    
In [ ]:
    
%%sql
SELECT PERCENT_RANK(47000) WITHIN GROUP (ORDER BY SALARY) 
  FROM EMPLOYEE 
WHERE WORKDEPT = 'A00'
    
In [ ]:
    
%%sql
SELECT PERCENTILE_DISC(0.75) WITHIN GROUP (ORDER BY SALARY) 
  FROM EMPLOYEE 
WHERE WORKDEPT = 'E21'
    
In [ ]:
    
%%sql
SELECT PERCENTILE_CONT(0.75) WITHIN GROUP (ORDER BY SALARY) 
  FROM EMPLOYEE 
WHERE WORKDEPT = 'E21'
    
In [ ]:
    
%%sql
SELECT EMPNO, SALARY, WIDTH_BUCKET(SALARY, 35000, 100000, 13) 
  FROM EMPLOYEE 
ORDER BY EMPNO
    
We can plot this information by adding some more details to the bucket output.
In [ ]:
    
%%sql -a
WITH BUCKETS(EMPNO, SALARY, BNO) AS 
  ( 
  SELECT EMPNO, SALARY, 
    WIDTH_BUCKET(SALARY, 35000, 100000, 9) AS BUCKET 
  FROM EMPLOYEE ORDER BY EMPNO 
  ) 
SELECT BNO, COUNT(*) AS COUNT FROM BUCKETS 
GROUP BY BNO 
ORDER BY BNO ASC
    
And here is a plot of the data to make sense of the histogram.
In [ ]:
    
%%sql -pb
WITH BUCKETS(EMPNO, SALARY, BNO) AS 
  ( 
  SELECT EMPNO, SALARY, 
    WIDTH_BUCKET(SALARY, 35000, 100000, 9) AS BUCKET 
  FROM EMPLOYEE ORDER BY EMPNO 
  ) 
SELECT BNO, COUNT(*) AS COUNT FROM BUCKETS 
GROUP BY BNO 
ORDER BY BNO ASC