In this notebook we will use GraphLab Create to identify a large majority of fraud cases in real-world data from an online retailer. Starting from a simple fraud classifier, we will optimize it for the best available performance.
The dataset is highly sensitive, so it has been anonymized and cannot be shared.
The notebook is organized into the following sections:
This notebook is presented in the Detecting Credit Card Fraud webinar, one of many interesting webinars given by Turi. Check out upcoming webinars here.
In [1]:
import graphlab as gl
In [2]:
data = gl.SFrame('fraud_detection.sf')
In [3]:
data.head(3)
Out[3]:
In [4]:
len(data)
Out[4]:
In [5]:
data.show()
We see that the data is highly categorical and highly unbalanced.
Let's visualize part of the data.
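To quantify that imbalance, the class distribution can be counted directly. This is a minimal sketch in plain Python with made-up labels; in the notebook itself, `data['fraud'].show()` conveys the same information visually:

```python
from collections import Counter

# Hypothetical stand-in for the 'fraud' label column (1 = fraud)
fraud_labels = [0, 0, 0, 0, 0, 0, 0, 0, 0, 1]

counts = Counter(fraud_labels)
fraud_rate = counts[1] / float(len(fraud_labels))
print('fraud cases: %d of %d (%.1f%%)' % (counts[1], len(fraud_labels), fraud_rate * 100))
```

A fraud rate this low is why plain accuracy is a misleading metric here, as the logistic regression baseline below will show.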
In [6]:
# Tell GraphLab to display canvas in the notebook itself
gl.canvas.set_target('ipynb')
In [7]:
data.show(view='BoxWhisker Plot', x='fraud', y='payment amount')
In [8]:
# Transform string date into datetime type.
# This will help us further along to compare dates.
data['transaction date'] = data['transaction date'].str_to_datetime(str_format='%d.%m.%Y')
# Split date into its components and set them as categorical features
data.add_columns(data['transaction date'].split_datetime(limit=['year','month','day'], column_name_prefix='transaction'))
data['transaction.year'] = data['transaction.year'].astype(str)
data['transaction.month'] = data['transaction.month'].astype(str)
data['transaction.day'] = data['transaction.day'].astype(str)
In [9]:
# Create day of week feature and set it as a categorical feature
data['transaction week day'] = data['transaction date'].apply(lambda x: x.weekday())
data['transaction week day'] = data['transaction week day'].astype(str)
In [10]:
data.head(3)
Out[10]:
In [11]:
# Create new features and transform them into true/false indicators
data['same country'] = (data['customer country'] == data['business country']).astype(str)
data['same person'] = (data['customer'] == data['cardholder']).astype(str)
data['expiration near'] = (data['credit card expiration year'] == data['transaction.year']).astype(str)
In [12]:
counts = data.groupby('transaction id',
                      {'unique cards per transaction': gl.aggregate.COUNT_DISTINCT('credit card number'),
                       'unique cardholders per transaction': gl.aggregate.COUNT_DISTINCT('cardholder'),
                       'tries per transaction': gl.aggregate.COUNT()})
counts.head(3)
Out[12]:
In [13]:
counts.show()
We see that although most transactions have been paid for by a single credit card, some transactions have as many as 29 unique credit cards!
Let's join the counts back into our dataset so we can visualize the number of unique cards per transaction vs fraud.
In [14]:
data = data.join(counts)
In [15]:
data.show(view='BoxWhisker Plot', x='fraud', y='unique cards per transaction')
In [16]:
print 'Number of columns', len(data.column_names())
In total we created 9 new features. One could engineer any number of additional features to build a better fraud detector, for example historical user features such as the number of transactions in a given timeframe.
For the purposes of the webinar these features will be enough.
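As a rough illustration of such a historical feature, here is a sketch in plain Python with a hypothetical transaction log; in GraphLab this would be expressed as a `groupby` over a date-filtered SFrame, and the names below are assumptions:

```python
from datetime import datetime

# Hypothetical transaction log: (customer, transaction date)
log = [
    ('alice', datetime(2015, 1, 10)),
    ('alice', datetime(2015, 1, 20)),
    ('alice', datetime(2015, 5, 1)),
    ('bob',   datetime(2015, 1, 15)),
]

def transactions_in_window(log, customer, start, end):
    """Count a customer's transactions inside a [start, end) window."""
    return sum(1 for who, when in log
               if who == customer and start <= when < end)

print(transactions_in_window(log, 'alice',
                             datetime(2015, 1, 1), datetime(2015, 2, 1)))  # 2
```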
First we will have to split the data into a training set and a testing set so we can evaluate our models. We will split it based on the date column, where the test set will be composed of the last six months of transactions.
In [17]:
from datetime import datetime
split = data['transaction date'] > datetime(2015, 6, 1)
data.remove_column('transaction date')
train = data[split == 0]
test = data[split == 1]
In [18]:
print 'Training set fraud'
train['fraud'].show()
In [19]:
print 'Test set fraud'
test['fraud'].show()
In [20]:
logreg_model = gl.logistic_classifier.create(train,
                                             target='fraud',
                                             validation_set=None)
In [21]:
print 'Logistic Regression Accuracy', logreg_model.evaluate(test)['accuracy']
print 'Logistic Regression Confusion Matrix\n', logreg_model.evaluate(test)['confusion_matrix']
Not a single fraud case was detected by the logistic regression model!
As indicated while training the logistic regression model, some features are highly categorical, and when expanded result in many coefficients. We could address this by removing these features from the dataset, or by transforming these features into a more manageable form (e.g. Count Thresholder). For this webinar, we will leave these features as-is and will move on to a stronger classifier.
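To sketch the idea behind count thresholding (a plain-Python illustration with hypothetical data; GraphLab Create's feature engineering toolkit provides a transformer for this, which would be the idiomatic route):

```python
from collections import Counter

# Hypothetical high-cardinality categorical column
countries = ['US', 'US', 'US', 'DE', 'DE', 'FR', 'XX', 'YY']

def count_threshold(values, threshold=2, other='__rare__'):
    """Replace categories seen fewer than `threshold` times with one bucket."""
    counts = Counter(values)
    return [v if counts[v] >= threshold else other for v in values]

print(count_threshold(countries))
```

Collapsing rare categories into a single bucket keeps the expanded coefficient count manageable for linear models like logistic regression.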
In [22]:
boosted_trees_model = gl.boosted_trees_classifier.create(train,
                                                         target='fraud',
                                                         validation_set=None)
In [23]:
print 'Boosted trees Accuracy', boosted_trees_model.evaluate(test)['accuracy']
print 'Boosted trees Confusion Matrix\n', boosted_trees_model.evaluate(test)['confusion_matrix']
29 out of 33 fraud cases were detected by the boosted trees model.
Let's tune the parameters of the model so we can squeeze extra performance out of it. In this example I chose parameters that were evaluated beforehand, but GraphLab offers the functionality to do a distributed search across a grid of parameters. To learn more click here.
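The idea behind such a search can be sketched as a plain grid enumeration (illustrative only; GraphLab's actual distributed search API is not shown here, and the parameter values are assumptions):

```python
from itertools import product

# Hypothetical parameter grid for a boosted trees classifier
grid = {
    'max_iterations': [10, 20, 40],
    'max_depth': [6, 9, 12],
}

names = sorted(grid)
combos = [dict(zip(names, values))
          for values in product(*(grid[n] for n in names))]

# Each combination would be trained and scored on a validation set;
# the best-scoring one is kept.
print(len(combos))  # 9 candidate models
```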
In [24]:
boosted_trees_model = gl.boosted_trees_classifier.create(train,
                                                         target='fraud',
                                                         validation_set=None,
                                                         max_iterations=40,
                                                         max_depth=9,
                                                         class_weights='auto')
In [25]:
print 'Boosted trees Accuracy', boosted_trees_model.evaluate(test)['accuracy']
print 'Boosted trees Confusion Matrix\n', boosted_trees_model.evaluate(test)['confusion_matrix']
The tuned model found one more fraud case than the previous untuned model, at the price of a few more false positives. The desired balance between false positives and false negatives depends on the application. In fraud detection we may want to minimize false negatives, since each missed fraud case costs money directly, while a false positive merely costs a fraud detection expert some time inspecting a transaction flagged by our model.
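This trade-off can be quantified from the confusion-matrix counts as precision and recall (a sketch with hypothetical counts, not the actual numbers from this run):

```python
# Hypothetical confusion-matrix counts
tp, fp, fn, tn = 30, 12, 3, 9955

precision = tp / float(tp + fp)   # of flagged transactions, how many were fraud
recall = tp / float(tp + fn)      # of fraud cases, how many were caught

print('precision: %.2f' % precision)  # 0.71
print('recall: %.2f' % recall)        # 0.91
```

Loosening class weights or the decision threshold typically raises recall (fewer missed fraud cases) at the cost of precision (more flagged transactions to inspect).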
In [26]:
# Inspect the features most used by the boosted trees model
boosted_trees_model.get_feature_importance()
Out[26]:
To connect to AWS, you will have to set your own AWS credentials by calling:
gl.aws.set_credentials(<your public key>,
                       <your private key>)
In [27]:
state_path = 's3://gl-demo-usw2/predictive_service/demolab/ps-1.8.5'
ps = gl.deploy.predictive_service.load(state_path)
In [28]:
# Pickle and send the model over to the server.
ps.add('fraud', boosted_trees_model)
ps.apply_changes()
In [29]:
# Predictive services must be displayed in a browser
gl.canvas.set_target('browser')
ps.show()
In [30]:
ps.query('fraud', method='predict', data={'dataset' : test[0]})
Out[30]:
In [31]:
test[0]['fraud']
Out[31]: