This tutorial will show you how to identify features that help your models in a way that might just be too good to be true. This can happen if there was a problem with the way the dataset was put together, if the machine learning problem wasn't scoped properly, or even because of a bug in one of the feature generators. At times it is hard to understand what a model is really doing, behind the scenes. That's where MLDB's classifier.explain comes to the rescue. In particular, it can help discover that a model is cheating, or in other words, that it has learnt to use bits of information that won't be available when applying the model in real life.
To illustrate this, we are going to train a model on some data where we know a feature is biased. You can find the details here. Basically, the task is to predict whether a client will subscribe to a term deposit after receiving a call from the bank, given some information about the client (the employee who made the call, socioeconomic conditions at the time, etc.).
The notebook cells below use pymldb's Connection class to make REST API calls. You can check out the Using pymldb Tutorial for more details.
In [2]:
import pymldb
mldb = pymldb.Connection()
In [3]:
print mldb.put('/v1/procedures/_', {
    'type': 'import.text',
    'params': {
        'dataFileUrl':
            'archive+http://public.mldb.ai/datasets/bank-additional.zip#bank-additional/bank-additional-full.csv',
        'outputDataset': 'bank_raw',
        'delimiter': ';'
    }
})
Here is a sneak peek of the data.
In [4]:
mldb.query("""
SELECT *
FROM bank_raw
LIMIT 10
""")
Out[4]:
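Before training anything, it is also worth checking how the label is distributed. The query below is a small sketch, not part of the original walk-through; it assumes only the bank_raw dataset imported above and counts the examples for each value of the y column.

# Sketch: label distribution of the raw data (rows are grouped by the y column)
mldb.query("""
    SELECT count(*) AS examples
    FROM bank_raw
    GROUP BY y
""")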
In [6]:
print mldb.put('/v1/procedures/_', {
    'type': 'classifier.train',
    'params': {
        'trainingData': """
            SELECT {* EXCLUDING (y)} AS features,
                   y = 'yes' AS label
            FROM bank_raw
            WHERE rowHash() % 4 != 0
        """,
        'modelFileUrl': 'file://bank_model.cls',
        'algorithm': 'bbdt',
        'functionName': 'score',
        'mode': 'boolean'
    }
})
This creates a classifier function named "score". The higher the score it returns, the more likely the client is to subscribe. Let's try it on a few examples from our test set.
In [7]:
mldb.query("""
SELECT score({features: {* EXCLUDING (y)}}) AS *
FROM bank_raw
WHERE rowHash() % 4 = 0
LIMIT 10
""")
Out[7]:
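Since the score function is callable from SQL, it can also be used to rank prospects. The following query is a sketch that is not part of the original tutorial: it sorts the held-out rows by the model's score (repeating the scoring expression in the ORDER BY clause) to surface the clients the model considers most likely to subscribe, alongside the actual outcome y.

# Sketch: rank held-out clients by predicted score and compare with the actual outcome
mldb.query("""
    SELECT score({features: {* EXCLUDING (y)}})[score] AS score, y
    FROM bank_raw
    WHERE rowHash() % 4 = 0
    ORDER BY score({features: {* EXCLUDING (y)}})[score] DESC
    LIMIT 10
""")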
Now let's see how well our model does on the 25% of the data we didn't train on, to get a feel for how well it should perform in real life.
In [6]:
mldb.put('/v1/procedures/_', {
    'type': 'classifier.test',
    'params': {
        'testingData': """
            SELECT score: score({features: {* EXCLUDING (y)}})[score],
                   label: y = 'yes'
            FROM bank_raw
            WHERE rowHash() % 4 = 0
        """,
        'outputDataset': 'bank_test',
        'mode': 'boolean'
    }
})
Out[6]:
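The scored test set is also written to the bank_test output dataset, so it can be inspected directly. A quick sketch, assuming the score and label columns defined in testingData carry over to that dataset:

# Sketch: peek at the highest-scoring held-out examples stored by classifier.test
mldb.query("""
    SELECT score, label
    FROM bank_test
    ORDER BY score DESC
    LIMIT 10
""")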
As we can see by inspecting the different statistics returned by the classifier.test procedure, the model seems to be doing pretty well! The AUC is 0.95: let's ship this thing to production right now! ... Or let's be cautious!
To understand what's going on, let's use the classifier.explain function. This will give us an idea of how much each feature helps (or hurts) in making the predictions.
In [7]:
print mldb.put('/v1/functions/explain', {
    'type': 'classifier.explain',
    'params': {
        'modelFileUrl': 'file://bank_model.cls'
    }
})
You can "explain" every single example and see how much each feature influences its final score, like this:
In [8]:
mldb.query("""
SELECT explain({features: {* EXCLUDING (y)}, label: y = 'yes'}) AS *
FROM bank_raw
WHERE rowHash() % 4 = 0
LIMIT 10
""")
Out[8]:
Or you can average over all the examples. Here we also transpose the result and sort it by absolute value.
In [9]:
mldb.query("""
    SELECT *
    FROM transpose((
        SELECT avg({explain({features: {* EXCLUDING (y)}, label: y = 'yes'})[explanation] AS *}) AS *
        NAMED 'explanation'
        FROM bank_raw
        WHERE rowHash() % 4 = 0
    ))
    ORDER BY abs(explanation) DESC
""")
Out[9]:
Now what is striking here is that one feature really stands out: duration. This is the actual duration of the call. Clearly, that information would not be available in a real-life setting: you can't know the duration of a call before it's over, and once it's over you already know whether the client has subscribed or not. If you look at the detailed description of the data, you can in fact see a warning saying that using that piece of information is probably a bad idea for any realistic modeling. So let's retrain the model, this time excluding the duration column.
In [10]:
print mldb.put('/v1/procedures/_', {
    'type': 'classifier.train',
    'params': {
        'trainingData': """
            SELECT {* EXCLUDING (y, duration)} AS features,
                   y = 'yes' AS label
            FROM bank_raw
            WHERE rowHash() % 4 != 0
        """,
        'modelFileUrl': 'file://bank_model.cls',
        'algorithm': 'bbdt',
        'functionName': 'score',
        'mode': 'boolean'
    }
})
In [11]:
mldb.put('/v1/procedures/_', {
    'type': 'classifier.test',
    'params': {
        'testingData': """
            SELECT score: score({features: {* EXCLUDING (y)}})[score],
                   label: y = 'yes'
            FROM bank_raw
            WHERE rowHash() % 4 = 0
        """,
        'outputDataset': 'bank_test',
        'mode': 'boolean'
    }
})
Out[11]:
Now an AUC of 0.80 sounds more reasonable!
If we run the explanation again, the highest-ranking features seem more legitimate.
In [12]:
print mldb.put('/v1/functions/explain', {
    'type': 'classifier.explain',
    'params': {
        'modelFileUrl': 'file://bank_model.cls'
    }
})
In [13]:
mldb.query("""
    SELECT *
    FROM transpose((
        SELECT avg({explain({features: {* EXCLUDING (y)}, label: y = 'yes'})[explanation] AS *}) AS *
        NAMED 'explanation'
        FROM bank_raw
        WHERE rowHash() % 4 = 0
    ))
    ORDER BY abs(explanation) DESC
""")
Out[13]:
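To eyeball the relative importance of the remaining features, the averaged explanation values can also be charted. The cell below is a sketch that is not part of the original notebook: it assumes matplotlib is available in the environment and relies on mldb.query returning a pandas DataFrame indexed by row name.

%matplotlib inline

# Sketch: re-run the averaged-explanation query and plot the largest contributions
df = mldb.query("""
    SELECT *
    FROM transpose((
        SELECT avg({explain({features: {* EXCLUDING (y)}, label: y = 'yes'})[explanation] AS *}) AS *
        NAMED 'explanation'
        FROM bank_raw
        WHERE rowHash() % 4 = 0
    ))
    ORDER BY abs(explanation) DESC
""")
df['explanation'].head(15).plot(kind='barh', figsize=(8, 6), title='Average feature contribution')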
We have shown how to use MLDB to identify "too good to be true" features when training a model. Keep in mind that features that really help are not necessarily biased; they might just be really good features! Understanding your data is key, and the tool presented here makes that much simpler.
Check out the other Tutorials and Demos.