Analysis of the EPCI data scraped in 2012

Data source: http://www.collectivites-locales.gouv.fr/

Checks:

Number of EPCI with their own tax powers (EPCI à fiscalité propre) as of 2012/01/01: 2583 (wikipedia)
According to INSEE, there are 2456 EPCI à fiscalité propre as of 2013/01/01 (insee)



In [3]:

    
import pandas as pd
import os
curdir = os.path.abspath('./..')



In [44]:

    
df = pd.read_csv(os.path.join(curdir, 'scraped_data', 'epci_all.csv'))
df[['year', 'net_profit', 'staff_costs', 'financial_costs', 'debt_repayments', 'allocation']].head(n=20)









    Out[44]:






  
    
      
    year  net_profit  staff_costs  financial_costs  debt_repayments  allocation
0   2007     1451000      4684000           367000           211000     2540000
1   2007     2460000      2497000                0                0     4334000
2   2007      364000         2000                0                0           0
3   2007      411000       239000            26000           663000      248000
4   2007       95000        23000                0                0      101000
5   2007       87000        36000            24000            11000       13000
6   2007     1444000      2966000           166000           471000     2172000
7   2007      148000       134000             4000            12000      304000
8   2007      580000      1571000            72000           177000     1171000
9   2007      733000       983000            12000            90000      282000
10  2007       96000       102000            12000            14000      106000
11  2007      510000       303000            21000            52000      157000
12  2007      317000       165000             2000                0     2658000
13  2007      367000        66000            66000           125000       35000
14  2007       86000        27000                0                0       50000
15  2007      298000        47000            24000            16000      161000
16  2007      104000       250000                0                0       90000
17  2007     3933000     10301000                0                0    23165000
18  2007      219000        97000                0                0      233000
19  2007      989000       451000            93000            98000     1730000



In [7]:

    
df.columns









    Out[7]:





Index([u'surplus', u'home_tax_rate', u'additionnal_land_property_tax_value', u'property_tax_rate', u'business_property_contribution_basis', u'financing_capacity', u'facilities_expenses', u'business_property_contribution_value', u'compensation_2010_rate', u'operating_revenues', u'business_tax_value', u'property_tax_cuts_on_deliberation', u'property_tax_value', u'land_property_tax_basis', u'received_subsidies', u'business_network_tax_value', u'net_profit', u'business_profit_contribution_basis', u'land_property_tax_cuts_on_deliberation', u'retail_land_tax_cuts_on_deliberation', u'business_property_contribution_rate', u'home_tax_cuts_on_deliberation', u'retail_land_tax_basis', u'thirdparty_balance', u'business_tax_cuts_on_deliberation', u'paid_subsidies', u'business_tax_rate', u'additionnal_land_property_tax_cuts_on_deliberation', u'population', u'name', u'business_profit_contribution_cuts_on_deliberation', u'business_profit_contribution_value', u'business_profit_contribution_rate', u'compensation_2010_basis', u'zone_type', u'land_property_tax_value', u'staff_costs', u'investment_ressources', u'localtax', u'financial_costs', u'purchases_and_external_costs', u'fctva', u'operating_costs', u'debt_repayments', u'tax_refund', u'year', u'residual_financing_capacity', u'siren', u'debt_at_end_year', u'business_network_tax_cuts_on_deliberation', u'additionnal_land_property_tax_rate', u'global_profit', u'business_tax_basis', u'compensation_2010_cuts_on_deliberation', u'property_tax_basis', u'retail_land_tax_rate', u'other_tax', u'home_tax_basis', u'business_network_tax_rate', u'allocation', u'home_tax_value', u'loans', u'compensation_2010_value', u'investments_usage', u'self_financing_capacity', u'land_property_tax_rate', u'url', u'debt_repayment_capacity', u'debt_annual_costs', u'business_network_tax_basis', u'additionnal_land_property_tax_basis', u'retail_land_tax_value', u'business_property_contribution_cuts_on_deliberation'], dtype=object)



In [8]:

    
df['debt_ratio'] = df['debt_annual_costs']/df['operating_revenues']
df['staff_costs_ratio'] = df['staff_costs']/df['operating_revenues']



In [16]:

    
print("Number of EPCI crawled per year")
df.groupby('year').year.count()



Number of EPCI crawled per year






    Out[16]:





year
2007    2154
2008    2171
2009    2202
2010    2247
2011    2272
2012    2279
dtype: int64

Of the 2456 EPCI listed as of 2013/01/01 in the INSEE file (insee), 177 are missing from the 2012 crawl (2456 - 2279 = 177).
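
The gap can be checked directly from the counts quoted in this notebook. A minimal sketch: the figures are copied by hand rather than recomputed from the data files, and the variable names are illustrative only.

```python
# Counts quoted in this notebook (copied by hand, not recomputed).
crawled_2012 = 2279   # EPCI crawled for 2012 (Out[16])
insee_2013 = 2456     # EPCI listed by INSEE as of 2013/01/01

missing = insee_2013 - crawled_2012
coverage = crawled_2012 / float(insee_2013)

print("missing: %d" % missing)                 # missing: 177
print("coverage: %.1f%%" % (coverage * 100))   # coverage: 92.8%
```

Note that the two figures are not for the same date (2012 crawl versus the 2013/01/01 INSEE list), so the gap is only indicative.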

Apparently, some EPCI codes defined in the INSEE file differ from those used in the URLs on the collectivités locales site.

Example:

INSEE identifies the Bar-le-Duc EPCI as follows: 55029 Bar-le-Duc 200033025 CA Bar-le-Duc - Sud Meuse CA
The URL identifying this EPCI is http://alize2.finances.gouv.fr/communes/eneuro/tableau_gfp.php?siren=245500061&dep=055&nomdep=MEUSE&icom=029&type=BPS&param=0

The SIREN codes therefore differ: 200033025 in the INSEE file versus 245500061 in the URL.
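
The comparison in this example can be reproduced by reading the `siren` query parameter out of the URL. This is a sketch in Python 3 (it uses `urllib.parse`, unlike the Python 2 cells of this notebook); the helper name `siren_from_url` is hypothetical, and only the URL and the INSEE code come from the text.

```python
from urllib.parse import urlparse, parse_qs

# URL and INSEE code quoted in the Bar-le-Duc example.
url = ("http://alize2.finances.gouv.fr/communes/eneuro/tableau_gfp.php"
       "?siren=245500061&dep=055&nomdep=MEUSE&icom=029&type=BPS&param=0")
insee_siren = "200033025"


def siren_from_url(url):
    """Return the value of the `siren` query parameter, or None if absent."""
    values = parse_qs(urlparse(url).query).get("siren")
    return values[0] if values else None


url_siren = siren_from_url(url)
print(url_siren)                  # 245500061
print(url_siren == insee_siren)   # False
```

Applied to the `url` column of the scraped data, the same helper could flag every EPCI whose scraped code does not match the INSEE one.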



In [19]:

    
# Load the INSEE list of EPCI as of 2013/01/01
xls = pd.ExcelFile(os.path.join(curdir, 'data', 'epci-au-01-01-2013.xls'))
data = xls.parse('Composition communale des EPCI')



In [27]:

    
data['siren'] = data[u'Établissement public à fiscalité propre'][1:]
# The count includes a strange placeholder EPCI, 'ZZZZZZZZZZZZZZ',
# which presumably explains 2457 rather than the expected 2456
data['siren'].dropna().unique().size









    Out[27]:





2457



In [37]:

    
len(set(df['siren'].apply(unicode).unique()).symmetric_difference(data['siren'].unique()))









    Out[37]:





168
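
The symmetric difference counts the codes present on one side only, without saying which side. A sketch of how it could be split into its two halves: the small sets below are stand-ins for `df['siren']` and `data['siren']`, and apart from the two Bar-le-Duc codes quoted earlier, their values are invented for illustration.

```python
# Toy stand-ins: only the two Bar-le-Duc codes come from this notebook,
# the other values are invented for illustration.
crawled = {"245500061", "111111111", "222222222"}
insee = {"200033025", "111111111", "222222222", "333333333"}

only_crawled = crawled - insee   # scraped but absent from the INSEE file
only_insee = insee - crawled     # listed by INSEE but never scraped

# The symmetric difference is exactly the union of the two one-sided sets.
assert crawled ^ insee == only_crawled | only_insee

print(sorted(only_crawled))   # ['245500061']
print(sorted(only_insee))     # ['200033025', '333333333']
```

Splitting the result this way separates codes that changed (present on both sides under different values) from EPCI that were simply never crawled.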

Debt ratio and staff costs ratio



In [40]:

    
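import matplotlib.pyplot as plt  # was missing: plt is used below

# Box plot of both ratios, then the first 20 rows for reference
plt.figure(figsize=(12, 12))
df[['debt_ratio', 'staff_costs_ratio']].boxplot()
df[['debt_ratio', 'staff_costs_ratio', 'name']].head(20)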









    Out[40]:






  
    
      
    debt_ratio  staff_costs_ratio  name
0     0.043450           0.383275  GFP : CC FAUCIGNY-GLIERES
1     0.000000           0.151673  GFP : CC DES DEUX RIVES DE LA SEINE
2     0.000000           0.000574  GFP : CC LES COTEAUX DE SEINE
3     0.672515           0.232943  GFP : CC LA LOUGE ET TOUCH
4     0.000000           0.039792  GFP : CC SUD MORVAN
5     0.058431           0.060100  GFP : CC DES 2 RIVES DE LA MOSELLE
6     0.066271           0.308573  GFP : CC PAYS PONTCHATEAU SAINT-GILDAS
7     0.014147           0.118479  GFP : CC DU PLATEAU DE LOMMOYE
8     0.047931           0.302406  GFP : CC PRESQU'ILE RHUYS
9     0.020540           0.197946  GFP : CC PORTES ROMILLY
10    0.043253           0.176471  GFP : CC ENTRE LOIRE ET ALLIER
11    0.032531           0.135027  GFP : CC LE MINERVOIS
12    0.001750           0.144357  GFP : CC RHONE-LEZ-PROVENCE
13    0.267606           0.092958  GFP : CC CEVENNE ET MONTAGNE ARDECHOISE
14    0.000000           0.102662  GFP : CC LES GRANDS SITES GORGES ARDECHE
15    0.050761           0.059645  GFP : CC DE LA VIADENE
16    0.000000           0.486381  GFP : CC DU NEBBIU
17    0.000000           0.436280  GFP : CA AGENTEUIL-BEZONS
18    0.000000           0.235437  GFP : CC COEUR DE SOLOGNE
19    0.021709           0.051262  GFP : CC PAYS DE MONTMELIAN CCPM



In [39]:

    
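# Highest debt ratios (the original comment wrongly mentioned the property tax rate)
# DataFrame.sort was removed from pandas; sort_values is its replacement
_df = df.sort_values(by='debt_ratio', ascending=False)
_df[['year', 'debt_ratio', 'name']].head(n=20)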









    Out[39]:






  
    
      
       year  debt_ratio  name
4826   2009    2.108333  GFP : CC AGHJA NOVA
3373   2008    1.856364  GFP : CC DE LA VALLEE DE LA COOLE
5115   2009    1.634201  GFP : CC ARRATS GIMONE
3258   2008    1.517297  GFP : CC DE CHEMILLE
10119  2011    1.505529  GFP : CC BASSIN DE LANDRES
4184   2008    1.483489  GFP : CC VAL VERT DU CLAIN
822    2007    1.462203  GFP : CC DU PAYS DE SAINT-AUBIN-DU-CORMIER
739    2007    1.412834  GFP : CC VAL DE GERS
7822   2010    1.346433  GFP : CC PAYS DU DER
10017  2011    1.338164  GFP : CC ENTRE PLAGE ET BOCAGE
10342  2011    1.308844  GFP : CC NOEUX ET ENVIRONS
5030   2009    1.292761  GFP : CC DU BONNEVALAIS
11736  2012    1.292339  GFP : CC DE CAUSSES ET VEZERE
11410  2012    1.158621  GFP : CC REGION ARCIS-SUR-AUBE
4276   2008    1.157179  GFP : CA EVRY CENTRE ESSONNE
4372   2009    1.060797  GFP : CC ISLE MANOIRE
2104   2007    1.042221  GFP : CA EVRY CENTRE ESSONNE
3925   2008    1.036251  GFP : CC BOCAGE CENOMANS
3069   2008    1.029687  GFP : CC PAYS ST MARCELLIN
4627   2009    1.025945  GFP : CC DES RIVIERES


