pandas-profiling Epilepsy QLD Donor Information


In [1]:
!pip install pandas-profiling


Requirement already satisfied: pandas-profiling in /Users/monkee/anaconda3/lib/python3.6/site-packages
Requirement already satisfied: six>=1.9 in /Users/monkee/anaconda3/lib/python3.6/site-packages (from pandas-profiling)
Requirement already satisfied: matplotlib>=1.4 in /Users/monkee/anaconda3/lib/python3.6/site-packages (from pandas-profiling)
Requirement already satisfied: pandas>=0.19 in /Users/monkee/anaconda3/lib/python3.6/site-packages (from pandas-profiling)
Requirement already satisfied: jinja2>=2.8 in /Users/monkee/anaconda3/lib/python3.6/site-packages (from pandas-profiling)
Requirement already satisfied: numpy>=1.7.1 in /Users/monkee/anaconda3/lib/python3.6/site-packages (from matplotlib>=1.4->pandas-profiling)
Requirement already satisfied: python-dateutil in /Users/monkee/anaconda3/lib/python3.6/site-packages (from matplotlib>=1.4->pandas-profiling)
Requirement already satisfied: pytz in /Users/monkee/anaconda3/lib/python3.6/site-packages (from matplotlib>=1.4->pandas-profiling)
Requirement already satisfied: cycler>=0.10 in /Users/monkee/anaconda3/lib/python3.6/site-packages (from matplotlib>=1.4->pandas-profiling)
Requirement already satisfied: pyparsing!=2.0.4,!=2.1.2,!=2.1.6,>=1.5.6 in /Users/monkee/anaconda3/lib/python3.6/site-packages (from matplotlib>=1.4->pandas-profiling)
Requirement already satisfied: MarkupSafe>=0.23 in /Users/monkee/anaconda3/lib/python3.6/site-packages (from jinja2>=2.8->pandas-profiling)

Import libraries


In [2]:
import pandas as pd
import pandas_profiling

Load and prepare dataset


In [3]:
# Column names, including the "Aquisition" spelling, must match the CSV header exactly.
df = pd.read_csv("donor_information.csv", parse_dates=['Aquisition Date', 'Dob'], encoding='UTF-8')

Inline report without saving object


In [4]:
pandas_profiling.ProfileReport(df)


Out[4]:

Overview

Dataset info

Number of variables 14
Number of observations 23389
Total Missing (%) 12.6%
Total size in memory 2.5 MiB
Average record size in memory 112.0 B

Variables types

Numeric 1
Categorical 13
Date 0
Text (Unique) 0
Rejected 0

Warnings

  • Aquisition Date has a high cardinality: 2925 distinct values Warning
  • Country has 23195 / 99.2% missing values Missing
  • Dob has a high cardinality: 1727 distinct values Warning
  • Extra Codes has 15291 / 65.4% missing values Missing
  • Extra Codes has a high cardinality: 678 distinct values Warning
  • Postcode has 331 / 1.4% missing values Missing
  • Postcode has a high cardinality: 1021 distinct values Warning
  • Sex has 2184 / 9.3% missing values Missing
  • Suburb has a high cardinality: 2069 distinct values Warning
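The warnings above rest on two per-column measures: the share of missing values and the number of distinct values. A minimal sketch of how these could be computed directly with pandas follows. It uses a small made-up frame in place of `donor_information.csv`, and the helper name `column_summary` is hypothetical, not part of pandas-profiling. The report's distinct counts appear to include the missing value as its own category (Sex lists F, M, B and missing as 4), so the sketch passes `dropna=False`.

```python
import pandas as pd


def column_summary(frame: pd.DataFrame) -> pd.DataFrame:
    """Per-column missing count, missing share and distinct count (hypothetical helper)."""
    missing = frame.isna()
    return pd.DataFrame({
        "missing_n": missing.sum(),
        "missing_pct": (missing.mean() * 100).round(1),
        # dropna=False counts the missing value as one extra category
        "distinct": frame.nunique(dropna=False),
    })


# Made-up rows standing in for the donor data
toy = pd.DataFrame({
    "Country": [None, None, None, "NEW ZEALAND"],
    "Sex": ["F", "M", None, "F"],
})
summary = column_summary(toy)
print(summary)
```

Sorting the result by `missing_pct` gives a quick list of columns to clean before a full profile is generated.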

Variables

Aquisition Date
Categorical

Distinct count 2925
Unique (%) 12.5%
Missing (%) 0.0%
Missing (n) 0
Value                Count  Frequency (%)
5/26/17                526   2.2%
2/7/13                 159   0.7%
10/3/96                141   0.6%
10/4/96                124   0.5%
9/12/13                110   0.5%
9/9/13                 106   0.5%
9/22/08                103   0.4%
8/16/11                 98   0.4%
9/25/08                 97   0.4%
6/14/11                 93   0.4%
Other values (2915)  21832  93.3%
 

Country
Categorical

Distinct count 26
Unique (%) 13.4%
Missing (%) 99.2%
Missing (n) 23195
Value                     Count  Frequency (%)
UNITED KINGDOM               75   0.3%
NEW ZEALAND                  42   0.2%
ANONYMOUS COUNTRY            12   0.1%
NETHERLANDS                  11   0.0%
UNITED STATES                10   0.0%
SINGAPORE                     7   0.0%
UNITED STATES OF AMERICA      5   0.0%
AU                            5   0.0%
CANADA                        4   0.0%
IRELAND                       3   0.0%
Other values (15)            20   0.1%
(Missing)                 23195  99.2%
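The Country table lists both UNITED STATES and UNITED STATES OF AMERICA, and an abbreviation (AU) alongside full names, so one country can appear under several labels. A minimal sketch of one way to collapse such labels before profiling is shown here. The alias mapping is hypothetical, and reading `AU` as Australia is an assumption about the data, not something the report confirms.

```python
import pandas as pd

# Hypothetical alias table; "AU" -> "AUSTRALIA" is an assumption about the data
COUNTRY_ALIASES = {
    "UNITED STATES OF AMERICA": "UNITED STATES",
    "AU": "AUSTRALIA",
}

# Made-up values standing in for the Country column
country = pd.Series(
    ["UNITED STATES", "UNITED STATES OF AMERICA", "AU", None, " new zealand "]
)

# Trim whitespace, upper-case, then map known aliases to one label
cleaned = country.str.strip().str.upper().replace(COUNTRY_ALIASES)
print(cleaned.value_counts(dropna=False))
```

Missing values pass through unchanged, so the 99.2% missing share reported above would be unaffected by this step.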
 

Dob
Categorical

Distinct count 1727
Unique (%) 7.4%
Missing (%) 0.0%
Missing (n) 0
Value                Count  Frequency (%)
00/00/0000           21479  91.8%
1/1/99                  12   0.1%
1/1/63                   8   0.0%
1/1/96                   8   0.0%
1/1/38                   7   0.0%
1/1/03                   7   0.0%
1/1/98                   7   0.0%
1/1/02                   7   0.0%
1/1/00                   7   0.0%
1/1/93                   5   0.0%
Other values (1717)   1842   7.9%
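Dob is reported as Categorical even though `parse_dates` was requested, and 91.8% of its values are the placeholder `00/00/0000`, which is not a valid date. A plausible reading (an assumption, not confirmed by the report) is that the placeholder prevented the column from being converted. The sketch below shows one way to convert such a column explicitly, turning unparseable values into missing ones. The `%m/%d/%y` format is inferred from values such as `5/26/17`, and the cutoff date is an assumed reference point, needed because two-digit years such as `63` are otherwise read as 2063.

```python
import pandas as pd

# Made-up values in the same shape as the Dob column
raw = pd.Series(["00/00/0000", "1/1/99", "1/1/63", "1/1/03"])

# Unparseable entries (the 00/00/0000 placeholder) become NaT
dob = pd.to_datetime(raw, format="%m/%d/%y", errors="coerce")

# %y maps 00-68 to 20xx; a date of birth cannot lie in the future,
# so shift anything after the assumed reference date back a century
cutoff = pd.Timestamp("2018-01-01")  # assumption: roughly when the data was exported
in_future = dob > cutoff
dob = dob.mask(in_future, dob - pd.DateOffset(years=100))
print(dob)
```

After this step the placeholder rows count as missing, so a profile of the converted column would report a much higher missing share for Dob than the 0.0% shown above.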
 

Donor Number
Numeric

Distinct count 23389
Unique (%) 100.0%
Missing (%) 0.0%
Missing (n) 0
Infinite (%) 0.0%
Infinite (n) 0
Mean 43778
Minimum 1009
Maximum 63921
Zeros (%) 0.0%

Quantile statistics

Minimum 1009
5-th percentile 9302.4
Q1 30190
Median 51581
Q3 57925
95-th percentile 62736
Maximum 63921
Range 62912
Interquartile range 27735

Descriptive statistics

Standard deviation 18070
Coef of variation 0.41277
Kurtosis -0.61181
Mean 43778
MAD 15359
Skewness -0.87179
Sum 1023916814
Variance 326530000
Memory size 182.8 KiB
Value                 Count  Frequency (%)
61715                     1    0.0%
12915                     1    0.0%
6774                      1    0.0%
56824                     1    0.0%
60024                     1    0.0%
57977                     1    0.0%
62075                     1    0.0%
51836                     1    0.0%
49789                     1    0.0%
55934                     1    0.0%
Other values (23379)  23379  100.0%

Minimum 5 values

Value  Count  Frequency (%)
1009       1  0.0%
1015       1  0.0%
1027       1  0.0%
1029       1  0.0%
1030       1  0.0%

Maximum 5 values

Value  Count  Frequency (%)
63917      1  0.0%
63918      1  0.0%
63919      1  0.0%
63920      1  0.0%
63921      1  0.0%
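Every Donor Number is distinct (23389 values in 23389 rows), which suggests the column is an identifier, not a measurement, so statistics such as the mean or skewness carry little meaning. A minimal sketch of checking this and storing the column as text follows, again on made-up rows. How a given pandas-profiling version then classifies the column is not shown here.

```python
import pandas as pd

# Made-up rows using the five smallest donor numbers from the report
toy = pd.DataFrame({"Donor Number": [1009, 1015, 1027, 1029, 1030]})

ids = toy["Donor Number"]
is_identifier = ids.is_unique and ids.notna().all()

if is_identifier:
    # Store identifiers as text so they are not summarised as numbers
    toy["Donor Number"] = ids.astype(str)

print(is_identifier, toy["Donor Number"].tolist())
```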
 

Donor Source
Categorical

Distinct count 43
Unique (%) 0.2%
Missing (%) 0.1%
Missing (n) 30
Value              Count  Frequency (%)
D2D                 5151  22.0%
ON-LINE             3565  15.2%
SERV                3512  15.0%
ACQ                 1744   7.5%
EQI                 1577   6.7%
3RDPARTY             970   4.1%
PURPLE               949   4.1%
SEMINAR              826   3.5%
UNSOL                555   2.4%
MEMPRE10             530   2.3%
Other values (32)   3980  17.0%
 

Donor Type
Categorical

Distinct count 9
Unique (%) 0.0%
Missing (%) 0.2%
Missing (n) 40
Value      Count  Frequency (%)
IND        18976  81.1%
CORP        2662  11.4%
SCHOOL       613   2.6%
SERVCLUB     482   2.1%
COMMORG      424   1.8%
HOSP         122   0.5%
GOVTDEPT      55   0.2%
TRUST         15   0.1%
(Missing)     40   0.2%
 

Extra Codes
Categorical

Distinct count 678
Unique (%) 8.4%
Missing (%) 65.4%
Missing (n) 15291
Value               Count  Frequency (%)
NO ENEWS              915   3.9%
PWE                   864   3.7%
FME                   705   3.0%
RET-PHAR              485   2.1%
PWE LPCNOMIN          321   1.4%
F-SUPP FME            296   1.3%
LAPSDMEM              293   1.3%
RET-NEWS              224   1.0%
FUNERALD              183   0.8%
FME LAPSDMEM          159   0.7%
Other values (667)   3653  15.6%
(Missing)           15291  65.4%
 

Fme
Categorical

Distinct count 3
Unique (%) 0.0%
Missing (%) 0.1%
Missing (n) 20
Value      Count  Frequency (%)
N          23263  99.5%
Y            106   0.5%
(Missing)     20   0.1%
 

Member
Categorical

Distinct count 3
Unique (%) 0.0%
Missing (%) 0.1%
Missing (n) 20
Value      Count  Frequency (%)
N          22875  97.8%
Y            494   2.1%
(Missing)     20   0.1%
 

Postcode
Categorical

Distinct count 1021
Unique (%) 4.4%
Missing (%) 1.4%
Missing (n) 331
Value                Count  Frequency (%)
4074                  1171   5.0%
4510                   616   2.6%
4350                   529   2.3%
4068                   510   2.2%
4073                   466   2.0%
4069                   419   1.8%
4000                   326   1.4%
4305                   312   1.3%
4066                   309   1.3%
4506                   286   1.2%
Other values (1010)  18114  77.4%
(Missing)              331   1.4%
 

Pwe
Categorical

Distinct count 3
Unique (%) 0.0%
Missing (%) 0.1%
Missing (n) 20
Value      Count  Frequency (%)
N          23101  98.8%
Y            268   1.1%
(Missing)     20   0.1%
 

Sex
Categorical

Distinct count 4
Unique (%) 0.0%
Missing (%) 9.3%
Missing (n) 2184
Value      Count  Frequency (%)
F          11924  51.0%
M           8436  36.1%
B            845   3.6%
(Missing)   2184   9.3%
 

Suburb
Categorical

Distinct count 2069
Unique (%) 8.9%
Missing (%) 0.7%
Missing (n) 159
Value                Count  Frequency (%)
SINNAMON PARK          345   1.5%
INDOOROOPILLY          341   1.5%
MORAYFIELD             286   1.2%
UNKNOWN                283   1.2%
CABOOLTURE             278   1.2%
BURPENGARY             260   1.1%
WESTLAKE               258   1.1%
NARANGBA               246   1.1%
MIDDLE PARK            236   1.0%
RIVERHILLS             227   1.0%
Other values (2058)  20470  87.5%
 

Vip
Categorical

Distinct count 3
Unique (%) 0.0%
Missing (%) 0.1%
Missing (n) 22
Value      Count  Frequency (%)
N          23240  99.4%
Y            127   0.5%
(Missing)     22   0.1%
 

Sample

   Donor Number  Aquisition Date  Country  Dob         Extra Codes  Fme  Member  Pwe  Sex  Suburb           Vip  Donor Source  Donor Type  Postcode
0  1009          1/4/94           NaN      00/00/0000  NaN          N    N       N    F    MURARRIE         N    ACQ           IND         4172
1  1015          1/4/94           NaN      00/00/0000  NaN          N    N       N    M    MANLY WEST       N    TELETR05      IND         4179
2  1027          1/4/94           NaN      00/00/0000  NaN          N    N       N    F    BOONDALL         N    ACQ           IND         4034
3  1029          1/4/94           NaN      00/00/0000  NaN          N    N       N    F    EASTERN HEIGHTS  N    ACQ           IND         4305
4  1030          1/4/94           NaN      00/00/0000  NaN          N    N       N    B    CLONTARF         N    ACQ           IND         4019

Save report to file


In [ ]:
pfr = pandas_profiling.ProfileReport(df)
pfr.to_file("donor_information.html")

In [ ]:
# Display the existing ProfileReport object inline
pfr
