Guillaume Chaslot guillaume.chaslot@data.gouv.fr

Tax Legislation Refactoring

The tax/benefit legislation on individuals in France is more than 200,000 lines long.

We believe that with this tool, you can make it less than 100 lines, transparent, and 95% similar to the existing legislation

How it works:

  1. Define your concepts (e.g., 'nb_enfants', 'age', 'parent isolé') and budget (e.g.: cost less than 2 millions euros)
  2. The machine learning algorithm helps you adjust the parameters of your reform to approximate the current legislation and fit your budget
  3. From the biggest discrepencies with the current legislation, you can improve your concepts (or the legislation)
  4. Repeat until you reach a legislation that matches your own goals. The algorithm takes care of minimizing your budget and maximizing the similarity to current legislation.

Beta version limitations:

For this test, we only take a population of people from all ages, who have 0 to 5 children, no salary, married, veuf, paces, divorces or celibataires, and simulate the "aides sociales"

Current Results:

Within a few minutes, we got a tax reform for "aides sociales" for people with no salary that is:

  • 6 lines long
  • similar to the existing in average at more than 95%

In [1]:
from utils import show_histogram, percent_diff, scatter_plot, multi_scatter
from econometrics import gini, draw_ginis

import matplotlib.pyplot as plt
# import bqplot.pyplot as plt

import pandas as pd
import numpy as np
import random
import qgrid
import copy

from reformators import Excalibur

qgrid.nbinstall(overwrite=True)

%matplotlib inline

def show_last_reform_results(revdisp, simulated_reform):
    if cost > 0:
        display_cost_name = 'Total Cost in Euros'
        display_cost_value = int(cost)
    else:
        display_cost_name = 'Total Savings in Euros'
        display_cost_value = -int(cost)

    old_gini = gini(revdisp)
    new_gini = gini(simulated_reform)

    result_frame = pd.DataFrame({
        display_cost_name: [display_cost_value],
        'Average change / family / month in Euros' : [int(error) / 12],
        'People losing money' : str(100 * pissed) + '%',
        'Old Gini' : old_gini,
        'New Gini' : new_gini,
        })

    result_frame.set_index(display_cost_name, inplace=True)
    qgrid.show_grid(result_frame)

Which variable do you want to optimize?

Here, we want to optimize the variable 'revdisp' which by putting a tax on 'taxable_income'


In [2]:
reformator = Excalibur(target_variable='revdisp',
                       taxable_variable='taxable_variable')

What is the population you want to use?

The population data is simulated for now, see population_simulator.py for details.


In [3]:
reformator.load_openfisca_results('1aj-1bj-f-2000')

# Removing unlikely cases of super old parents due to our approximative population
# reformator.filter_only_likely_population()

# # Keeping only people with no revenu for this test
# reformator.filter_only_no_revenu()

What are the concept you want to use for your reform?

You can add concepts like the number of children, age, family situation, etc...

The input population is in the CERFA "declaration des revenus" format.


In [4]:
def age_dec_1(family):
    return age_from_box(family, '0DA')

def age_dec_2(family):
    if '0DB' in family:
        return age_from_box(family, '0DB')
    return None

def if_married(family):
    return 'M' in family

def if_pacse(family):
    return 'O' in family

def if_veuf(family):
    return 'V' in family

def if_divorce(family):
    return 'D' in family

def if_two_people(family):
    return '0DB' in family

def revenu_1(family):
    return family['1AJ']

def taxable_variable(family):
    if '1BJ' in family:
        return family['1AJ'] + family['1BJ']
    else:
        return family['1AJ']

def revenu_2(family):
    if '1BJ' in family:
        return family['1BJ']
    return False
    
def age_from_box(family, birthdate_box):
    return 2014 - family.get(birthdate_box, 2014)

def if_both_declarant_parent_below_24(family):
    if age_dec_1(family) >= 24:
        return False
    if '0DB' in family and age_dec_2(family) >= 24:
        return False
    if 'F' not in family or family['F'] == 0:
        return False
    return True

def per_child(family):
    if 'F' in family:
        return family['F']
    else:
        return None

def per_child_parent_isole(family):
    if '0DB' in family:
        return None
    if 'F' in family:
        return family['F']
    else:
        return None

def if_parent_isole_moins_20k(family):
    return 'F' in family and family['F'] >= 1 and ('0DB' not in family) and family['1AJ'] < 20000

def if_enfant_unique(family):
    return 'F' in family and family['F'] == 1

def if_deux_enfants_ou_plus(family):
    return 'F' in family and family['F'] >= 2

def per_child_after_2(family):
    if 'F' in family and family['F'] >= 2:
        return family['F'] - 2
    return None

def if_declarant_above_65(family):
    return age_from_box(family, '0DA') >= 65
    
def if_codeclarant_above_65(family):
    return '0DB' in family and age_from_box(family, '0DB') > 65

def per_declarant_above_65(family):
    return int(age_from_box(family, '0DB') >= 65) + int(age_from_box(family, '0DA') >= 65)

def per_declarant_above_24(family):
    return int(age_from_box(family, '0DB') >= 24) + int(age_from_box(family, '0DA') >= 24)

def if_one_declarant_above_24(family):
    return (age_from_box(family, '0DA') >= 24 or ('0DB' in family and age_from_box(family, '0DB') >= 24))

def if_one_declarant_above_24_or_has_children(family):
    return (age_from_box(family, '0DA') >= 24 or ('0DB' in family and age_from_box(family, '0DB') >= 24) or 
           ('F' in family and family['F'] >= 1))

def if_earns_10k(family):
    return taxable_variable(family) > 10000

def if_earns_30k(family):
    return taxable_variable(family) > 30000

def if_earns_40k(family):
    return taxable_variable(family) > 40000

def base(family):
    return True

reformator.add_concept('base', base)
reformator.add_concept('age_dec_1', age_dec_1)
reformator.add_concept('age_dec_2', age_dec_2)
reformator.add_concept('taxable_variable', taxable_variable)
reformator.add_concept('revenu_1', revenu_1)
reformator.add_concept('revenu_2', revenu_2)
reformator.add_concept('per_child', per_child)
reformator.add_concept('per_child_parent_isole', per_child_parent_isole)
reformator.add_concept('per_declarant_above_24', per_declarant_above_24)
reformator.add_concept('per_declarant_above_65', per_declarant_above_65)
reformator.add_concept('per_child_after_2', per_child_after_2)
reformator.add_concept('if_earns_10k', if_earns_10k)
reformator.add_concept('if_earns_30k', if_earns_30k)
reformator.add_concept('if_earns_40k', if_earns_40k)
reformator.add_concept('if_one_declarant_above_24', if_one_declarant_above_24)
reformator.add_concept('if_one_declarant_above_24_or_has_children', if_one_declarant_above_24_or_has_children)
reformator.add_concept('if_both_declarant_parent_below_24', if_both_declarant_parent_below_24)
reformator.add_concept('if_declarant_above_65', if_declarant_above_65)
reformator.add_concept('if_codeclarant_above_65', if_codeclarant_above_65)
reformator.add_concept('if_enfant_unique', if_enfant_unique)
reformator.add_concept('if_deux_enfants_ou_plus', if_deux_enfants_ou_plus)
reformator.add_concept('if_two_people', if_two_people)
reformator.add_concept('if_parent_isole_moins_20k', if_parent_isole_moins_20k)

reformator.summarize_population()


Echantillon of 2863 people, in percent of french population for similar revenu: 0.009543333333333334%

Plots Revenu disponible before reform


In [5]:
revdisp = list(family['revdisp'] for family in reformator._population)

show_histogram(revdisp, 'Distribution of revenu disponible')


Define your reform here!


In [6]:
simulated_reform, error, cost, final_parameters, pissed  = reformator.suggest_reform(
            parameters=[
                            'per_child',
                            'if_declarant_above_65',
                            'if_codeclarant_above_65',
                            'if_one_declarant_above_24_or_has_children',
#                             'if_parent_isole_moins_20k',
                        ],
            tax_rate_parameters=[
                'base',
                'if_deux_enfants_ou_plus',
                'if_earns_30k',
                'if_earns_40k',
            ],
#             tax_threshold_parameters=[
#                 'base',
# #                 'if_enfant_unique',
# #                 'if_deux_enfants_ou_plus',
#             ],
            max_cost=0,
            min_saving=0)


['per_child', 'if_declarant_above_65', 'if_codeclarant_above_65', 'if_one_declarant_above_24_or_has_children']
(5_w,10)-aCMA-ES (mu_w=3.2,w_1=45%) in dimension 8 (seed=922122, Tue Oct 18 16:12:33 2016)
Best: avg change per month: 78070 cost: 19469436 M/year and 38% people with lower salary
Iterat #Fevals   function value  axis ratio  sigma  min&max std  t[m:s]
    1     10 1.857106544982153e+03 1.0e+00 9.57e+03  9e+03  1e+04 0:0.1
    2     20 2.353382130009756e+03 1.1e+00 9.50e+03  9e+03  1e+04 0:0.2
    3     30 1.598434244016520e+03 1.2e+00 9.20e+03  9e+03  1e+04 0:0.4
Best: avg change per month: 130910 cost: -30020326 M/year and 54% people with lower salary
Best: avg change per month: 95725 cost: 17901093 M/year and 8% people with lower salary
Best: avg change per month: 54299 cost: -10839865 M/year and 63% people with lower salary
Best: avg change per month: 19178 cost: 2503560 M/year and 20% people with lower salary
Best: avg change per month: 20207 cost: 931404 M/year and 58% people with lower salary
Best: avg change per month: 25186 cost: 977783 M/year and 24% people with lower salary
Best: avg change per month: 8099 cost: -700475 M/year and 29% people with lower salary
Best: avg change per month: 7752 cost: -1255217 M/year and 60% people with lower salary
Best: avg change per month: 3788 cost: -344765 M/year and 60% people with lower salary
Best: avg change per month: 1529 cost: -152799 M/year and 59% people with lower salary
Best: avg change per month: 1501 cost: -191495 M/year and 51% people with lower salary
Best: avg change per month: 1532 cost: -160790 M/year and 60% people with lower salary
Best: avg change per month: 1420 cost: 147010 M/year and 31% people with lower salary
Best: avg change per month: 1345 cost: -103328 M/year and 47% people with lower salary
Best: avg change per month: 1234 cost: -141227 M/year and 78% people with lower salary
Best: avg change per month: 1516 cost: -97319 M/year and 64% people with lower salary
Best: avg change per month: 1996 cost: -54651 M/year and 55% people with lower salary
Best: avg change per month: 1225 cost: -3427 M/year and 68% people with lower salary
Best: avg change per month: 1233 cost: -120830 M/year and 75% people with lower salary
  100   1000 4.708216281516900e+01 5.8e+01 3.31e+02  3e+01  8e+02 0:11.9
Best: avg change per month: 1388 cost: -105797 M/year and 55% people with lower salary
Best: avg change per month: 909 cost: -95797 M/year and 62% people with lower salary
Best: avg change per month: 1283 cost: 1790 M/year and 57% people with lower salary
Best: avg change per month: 1099 cost: -112897 M/year and 62% people with lower salary
Best: avg change per month: 838 cost: -128142 M/year and 68% people with lower salary
Best: avg change per month: 1297 cost: -222638 M/year and 82% people with lower salary
Best: avg change per month: 1094 cost: -26160 M/year and 58% people with lower salary
Best: avg change per month: 858 cost: -18033 M/year and 46% people with lower salary
Best: avg change per month: 777 cost: -126299 M/year and 52% people with lower salary
Best: avg change per month: 886 cost: -201537 M/year and 71% people with lower salary
Best: avg change per month: 763 cost: -22565 M/year and 41% people with lower salary
Best: avg change per month: 762 cost: -48808 M/year and 46% people with lower salary
Best: avg change per month: 573 cost: -42601 M/year and 44% people with lower salary
Best: avg change per month: 590 cost: -30664 M/year and 43% people with lower salary
Best: avg change per month: 629 cost: -39701 M/year and 43% people with lower salary
Best: avg change per month: 563 cost: -14026 M/year and 42% people with lower salary
Best: avg change per month: 496 cost: -32366 M/year and 48% people with lower salary
Best: avg change per month: 486 cost: -54788 M/year and 51% people with lower salary
Best: avg change per month: 515 cost: -44702 M/year and 54% people with lower salary
Best: avg change per month: 455 cost: -71783 M/year and 56% people with lower salary
Best: avg change per month: 546 cost: -36110 M/year and 62% people with lower salary
Best: avg change per month: 617 cost: -42530 M/year and 48% people with lower salary
Best: avg change per month: 423 cost: -20000 M/year and 47% people with lower salary
Best: avg change per month: 382 cost: -72005 M/year and 49% people with lower salary
Best: avg change per month: 241 cost: -29300 M/year and 53% people with lower salary
  200   2000 6.932528430477206e+00 2.8e+02 5.69e+02  9e+00  9e+02 0:23.7
Best: avg change per month: 222 cost: -28376 M/year and 60% people with lower salary
Best: avg change per month: 246 cost: -47808 M/year and 67% people with lower salary
Best: avg change per month: 221 cost: -3795 M/year and 54% people with lower salary
Best: avg change per month: 149 cost: -9786 M/year and 39% people with lower salary
Best: avg change per month: 158 cost: -1943 M/year and 54% people with lower salary
Best: avg change per month: 154 cost: -5775 M/year and 55% people with lower salary
Best: avg change per month: 141 cost: -14584 M/year and 57% people with lower salary
Best: avg change per month: 152 cost: -12620 M/year and 56% people with lower salary
Best: avg change per month: 126 cost: -5460 M/year and 38% people with lower salary
Best: avg change per month: 135 cost: -10481 M/year and 41% people with lower salary
Best: avg change per month: 132 cost: -15802 M/year and 55% people with lower salary
Best: avg change per month: 111 cost: -5512 M/year and 52% people with lower salary
Best: avg change per month: 113 cost: -7337 M/year and 39% people with lower salary
Best: avg change per month: 106 cost: 238 M/year and 36% people with lower salary
Best: avg change per month: 108 cost: -6796 M/year and 37% people with lower salary
Best: avg change per month: 109 cost: -6124 M/year and 38% people with lower salary
Best: avg change per month: 108 cost: -6193 M/year and 38% people with lower salary
Best: avg change per month: 108 cost: -6195 M/year and 38% people with lower salary
Best: avg change per month: 108 cost: -6304 M/year and 38% people with lower salary
Best: avg change per month: 108 cost: -6119 M/year and 38% people with lower salary
Best: avg change per month: 108 cost: -6179 M/year and 38% people with lower salary
  300   3000 3.727955640173365e+00 4.6e+02 9.10e+00  4e-02  9e+00 0:35.6
Best: avg change per month: 108 cost: -6297 M/year and 38% people with lower salary
Best: avg change per month: 108 cost: -6324 M/year and 38% people with lower salary
Best: avg change per month: 108 cost: -6319 M/year and 38% people with lower salary
Best: avg change per month: 108 cost: -6326 M/year and 38% people with lower salary
Best: avg change per month: 108 cost: -6325 M/year and 38% people with lower salary
Best: avg change per month: 108 cost: -6316 M/year and 38% people with lower salary
Best: avg change per month: 108 cost: -6316 M/year and 38% people with lower salary
Best: avg change per month: 108 cost: -6316 M/year and 38% people with lower salary
Best: avg change per month: 108 cost: -6316 M/year and 38% people with lower salary
Best: avg change per month: 108 cost: -6316 M/year and 38% people with lower salary
Best: avg change per month: 108 cost: -6316 M/year and 38% people with lower salary
Best: avg change per month: 108 cost: -6316 M/year and 38% people with lower salary
Best: avg change per month: 108 cost: -6316 M/year and 38% people with lower salary
Best: avg change per month: 108 cost: -6316 M/year and 38% people with lower salary
Best: avg change per month: 108 cost: -6316 M/year and 38% people with lower salary
Best: avg change per month: 108 cost: -6316 M/year and 38% people with lower salary
Best: avg change per month: 108 cost: -6316 M/year and 38% people with lower salary
Best: avg change per month: 108 cost: -6316 M/year and 38% people with lower salary
Best: avg change per month: 108 cost: -6316 M/year and 38% people with lower salary
Best: avg change per month: 108 cost: -6316 M/year and 38% people with lower salary
Best: avg change per month: 108 cost: -6316 M/year and 38% people with lower salary
Best: avg change per month: 108 cost: -6316 M/year and 38% people with lower salary
  400   4000 3.727897352043533e+00 3.7e+02 2.56e-03  2e-06  6e-04 0:47.4
Best: avg change per month: 108 cost: -6316 M/year and 38% people with lower salary
Best: avg change per month: 108 cost: -6316 M/year and 38% people with lower salary
Best: avg change per month: 108 cost: -6316 M/year and 38% people with lower salary
Best: avg change per month: 108 cost: -6316 M/year and 38% people with lower salary
  412   4120 3.727897352043228e+00 4.2e+02 1.25e-03  9e-07  3e-04 0:48.9
termination on tolfun=1e-11 (Tue Oct 18 16:13:22 2016)
final/bestever f-value = 3.727897e+00 3.727897e+00
incumbent solution: [5238.6061785613601, 2464.4624610194523, 4460.3452877393102, 4048.315165681312, 18.946767594789065, -1.2645420971023418, -1.8183780419599307, -0.14777660358605349]
std deviation: [5.9789677345314128e-05, 0.00010324217040517592, 0.00014286044691466144, 0.000290281765086484, 9.4858185389725029e-07, 2.8866470181244149e-06, 1.9001443054398078e-06, 3.057964924948336e-06]

In [7]:
show_last_reform_results(revdisp, simulated_reform)


Widget Javascript not detected.  It may not be installed properly. Did you enable the widgetsnbextension? If not, then run "jupyter nbextension enable --py --sys-prefix widgetsnbextension"

In [8]:
draw_ginis(revdisp, simulated_reform)



In [9]:
print repr(final_parameters)
def show_coefficients(final_parameters, current_type):
    coefficients = []
    variables = []
    for parameter in final_parameters:
        if parameter['type'] == current_type:
            coefficients.append(parameter['value'])
            variables.append(parameter['variable'])

    result_frame = pd.DataFrame({'Variables': variables, current_type + ' coef': coefficients})
    result_frame.set_index('Variables', inplace=True)
    qgrid.show_grid(result_frame)

show_coefficients(final_parameters, 'base_revenu')
show_coefficients(final_parameters, 'tax_rate')
# show_coefficients(final_parameters, 'tax_threshold')


[{'variable': 'if_one_declarant_above_24_or_has_children', 'type': 'base_revenu', 'value': 5238.6061785613601}, {'variable': 'per_child', 'type': 'base_revenu', 'value': 2464.4624610194523}, {'variable': 'if_declarant_above_65', 'type': 'base_revenu', 'value': 4460.3452877393102}, {'variable': 'if_codeclarant_above_65', 'type': 'base_revenu', 'value': 4048.315165681312}, {'variable': 'base', 'type': 'tax_rate', 'value': 18.946767594789065}, {'variable': 'if_deux_enfants_ou_plus', 'type': 'tax_rate', 'value': -1.2645420971023418}, {'variable': 'if_earns_30k', 'type': 'tax_rate', 'value': -1.8183780419599307}, {'variable': 'if_earns_40k', 'type': 'tax_rate', 'value': -0.14777660358605349}]
Widget Javascript not detected.  It may not be installed properly. Did you enable the widgetsnbextension? If not, then run "jupyter nbextension enable --py --sys-prefix widgetsnbextension"
Widget Javascript not detected.  It may not be installed properly. Did you enable the widgetsnbextension? If not, then run "jupyter nbextension enable --py --sys-prefix widgetsnbextension"

In [10]:
new_pop = copy.deepcopy(reformator._population)

for i in range(len(new_pop)):
    new_pop[i]['reform'] = simulated_reform[i] 

def plot_for_population(pop):
    revenu_imposable = list(family['taxable_variable'] for family in pop)
    revdisp = list(family['revdisp'] for family in pop)
    reform = list(family['reform'] for family in pop)

    multi_scatter('Revenu disponible for different people before / after reforme', 'Revenu initial', 'Revenu disponible', [
                  {'x':revenu_imposable, 'y':reform, 'label':'Reform', 'color':'blue'},
                  {'x':revenu_imposable, 'y':revdisp, 'label':'Original', 'color':'red'},
              ])

In [11]:
un_enfant_pop = list(filter(lambda x: x.get('per_child', 0) == 3
                                  and x.get('age_dec_1', 0) < 65
                                  and x.get('age_dec_2', 0) < 65,
                            new_pop))
plot_for_population(un_enfant_pop)



In [14]:
show_last_reform_results(revdisp, simulated_reform)


Widget Javascript not detected.  It may not be installed properly. Did you enable the widgetsnbextension? If not, then run "jupyter nbextension enable --py --sys-prefix widgetsnbextension"

In [15]:
# simulated_reform_tree, error_tree, cost_tree, final_parameters_tree  = reformator.suggest_reform_tree(
#             parameters=[
#                             'per_child',
#                             'if_one_declarant_above_24_or_has_children',
#                             'age_dec_1',
#                             'age_dec_2'
#                         ],
#                             max_cost=0,
#                             min_saving=0,
#                             image_file='./enfants_age',
#                             max_depth=5,
#                             min_samples_leaf=20
#                         )

Reform results


In [16]:
# show_last_reform_results()
# from IPython.display import Image
# Image(filename='./enfants_age.png')

Plots revenu disponible after reform


In [17]:
xmin = 0
xmax = 60000
nb_buckets = 35

bins = np.linspace(xmin, xmax, nb_buckets)

plt.hist(revdisp, bins, alpha=0.5, label='current')
plt.hist(simulated_reform, bins, alpha=0.5, label='reform')
plt.legend(loc='upper right')
plt.show()


Distribution of the changes in revenu in euros


In [18]:
difference = list(simulated_reform[i] - revdisp[i] for i in range(len(simulated_reform)))

show_histogram(difference, 'Changes in revenu')


Distribution of the change in revenu in percentage


In [19]:
percentage_difference = list(100 * percent_diff(simulated_reform[i], revdisp[i]) for i in range(len(simulated_reform)))

show_histogram(percentage_difference, 'Changes in revenu')


Change as a function of the number of children


In [20]:
nb_children = list((reformator._population[i].get('per_child', 0)  for i in range(len(reformator._population)))) 

scatter_plot(nb_children, difference, 'Children', 'Difference reform - current', alpha=0.05)


Change as a function of the age of declarant 1

A scatter plot is better than a thousand points #ChineseProverb


In [21]:
age_dec1 = list((reformator._population[i].get('age_dec_1', 0)  for i in range(len(reformator._population)))) 

scatter_plot(age_dec1, difference, 'Age declarant 1', 'Difference reform - current', alpha=0.1)


Most important: Edge Cases

This is the heart of this tool: by seeing the worse cases, you can discover when the current legislation is smarter than yours (or the other way), and improve it further


In [22]:
order = sorted(range(len(simulated_reform)), key=lambda k: -(simulated_reform[k] - revdisp[k]))

data = {}

possible_keys = set()
for i in order:
    for key in reformator._raw_population[i]:
        if key != 'year':
            possible_keys.add(key)

for i in order:
    # Adding the diff with the reform.
    differences = data.get('difference', [])
    differences.append(int(simulated_reform[i] - revdisp[i]))
    data['difference'] = differences

    for key in possible_keys:
        new_vals = data.get(key, [])
        value = reformator._raw_population[i].get(key, '')
        if type(value) == float:
            value = int(value)
        new_vals.append(value)
        data[key] = new_vals

    # Adding reformed line.
    reforms = data.get('reform', [])
    reforms.append(int(simulated_reform[i]))
    data['reform'] = reforms

df = pd.DataFrame(data=data)
df.set_index('difference', inplace=True)
qgrid.show_grid(df)


Widget Javascript not detected.  It may not be installed properly. Did you enable the widgetsnbextension? If not, then run "jupyter nbextension enable --py --sys-prefix widgetsnbextension"

Best compromise simplicity / matching current legislation:


In [23]:
# res, error, cost, final_parameters = reformator.suggest_reform(parameters=[
#                             'if_one_declarant_above_24',
#                             'if_declarant_above_64',
#                             'if_codeclarant_above_64',
#                             'if_both_declarant_parent_below_24',
#                             'if_two_people',
#                             'if_enfant_unique',
#                             'if_deux_enfants_ou_plus',
#                             'nb_enfants_after_2',
#                            ])

In [24]:
# coefficients = list(map(lambda x: x['value'], final_parameters)); variables = list(map(lambda x: x['variable'], final_parameters))
# result_frame = pd.DataFrame({'Variables': variables, 'Coefficient': coefficients})
# result_frame.set_index('Variables', inplace=True)
# qgrid.show_grid(result_frame)

In [ ]:


In [ ]:


In [ ]:


In [ ]:


In [ ]:


In [ ]:


In [ ]: