Teknisk Tirsdag: Data Cleaning

Congratulations! You have just been hired as a data scientist at a Norwegian company that advises international football clubs on which players to buy.

Today you begin investigating the Danish player market for potential candidates for the very biggest clubs in Europe, and as a newly appointed data scientist it is your job to find the hidden talents in Denmark. You have been handed a dataset of football players from 2018, and you are to run some analyses on it...


In [1]:
#PURE PYTHON!!!!
from IPython.display import display, Markdown
import numpy as np
import pandas as pd
import os
import re

# path = %pwd
# path += '/fifa-18-demo-player-dataset/CompleteDataset.csv'

# For Windows
path = '.\\Downloads\\Fifa2018-master\\Fifa2018-master'
path += '\\fifa-18-demo-player-dataset\\CompleteDataset.csv'

What does our original dataset look like?

Note: running this cell produces a warning, and that is perhaps not a bad thing.
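
The warning in question is a pandas DtypeWarning about columns with mixed types. The following is a minimal sketch, using a made-up inline CSV rather than the real file, of how a column that mixes plain numbers with strings such as '73-1' ends up as text, and of how the dtype can be set deliberately on import. The column name and values are illustrative assumptions; in a large file pandas parses in chunks, which is when the warning itself tends to appear.

```python
import io

import pandas as pd

# Hypothetical miniature CSV: 'Acceleration' mixes plain numbers with a string like '73-1'.
csv_text = "Name,Acceleration\nA,89\nB,73-1\nC,92\n"

# Without hints pandas infers one dtype per column; one unparseable value
# is enough to make the whole column non-numeric.
inferred = pd.read_csv(io.StringIO(csv_text))
print(inferred.dtypes)

# Declaring the column as text makes that choice explicit instead of implicit.
as_text = pd.read_csv(io.StringIO(csv_text), dtype={'Acceleration': str})
print(as_text['Acceleration'].tolist())
```

Seen this way, the warning is a useful early hint that some columns will need cleaning before they can be treated as numbers.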


In [2]:
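# DataFrame.from_csv was removed in pandas 1.0; read_csv with index_col=0 uses the first column as the index, as from_csv did by default
input_data_frame = pd.read_csv(path, index_col=0, encoding='utf-8')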
input_data_frame


/usr/local/lib/python3.5/dist-packages/IPython/core/interactiveshell.py:2850: DtypeWarning: Columns (23,35) have mixed types. Specify dtype option on import or set low_memory=False.
  if self.run_code(code, result):
Out[2]:
Name Age Photo Nationality Flag Overall Potential Club Club Logo Value ... RB RCB RCM RDM RF RM RS RW RWB ST
0 Cristiano Ronaldo 32 https://cdn.sofifa.org/48/18/players/20801.png Portugal https://cdn.sofifa.org/flags/38.png 94 94 Real Madrid CF https://cdn.sofifa.org/24/18/teams/243.png €95.5M ... 61.0 53.0 82.0 62.0 91.0 89.0 92.0 91.0 66.0 92.0
1 L. Messi 30 https://cdn.sofifa.org/48/18/players/158023.png Argentina https://cdn.sofifa.org/flags/52.png 93 93 FC Barcelona https://cdn.sofifa.org/24/18/teams/241.png €105M ... 57.0 45.0 84.0 59.0 92.0 90.0 88.0 91.0 62.0 88.0
2 Neymar 25 https://cdn.sofifa.org/48/18/players/190871.png Brazil https://cdn.sofifa.org/flags/54.png 92 94 Paris Saint-Germain https://cdn.sofifa.org/24/18/teams/73.png €123M ... 59.0 46.0 79.0 59.0 88.0 87.0 84.0 89.0 64.0 84.0
3 L. Suárez 30 https://cdn.sofifa.org/48/18/players/176580.png Uruguay https://cdn.sofifa.org/flags/60.png 92 92 FC Barcelona https://cdn.sofifa.org/24/18/teams/241.png €97M ... 64.0 58.0 80.0 65.0 88.0 85.0 88.0 87.0 68.0 88.0
4 M. Neuer 31 https://cdn.sofifa.org/48/18/players/167495.png Germany https://cdn.sofifa.org/flags/21.png 92 92 FC Bayern Munich https://cdn.sofifa.org/24/18/teams/21.png €61M ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
5 R. Lewandowski 28 https://cdn.sofifa.org/48/18/players/188545.png Poland https://cdn.sofifa.org/flags/37.png 91 91 FC Bayern Munich https://cdn.sofifa.org/24/18/teams/21.png €92M ... 58.0 57.0 78.0 62.0 87.0 82.0 88.0 84.0 61.0 88.0
6 De Gea 26 https://cdn.sofifa.org/48/18/players/193080.png Spain https://cdn.sofifa.org/flags/45.png 90 92 Manchester United https://cdn.sofifa.org/24/18/teams/11.png €64.5M ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
7 E. Hazard 26 https://cdn.sofifa.org/48/18/players/183277.png Belgium https://cdn.sofifa.org/flags/7.png 90 91 Chelsea https://cdn.sofifa.org/24/18/teams/5.png €90.5M ... 59.0 47.0 81.0 61.0 87.0 87.0 82.0 88.0 64.0 82.0
8 T. Kroos 27 https://cdn.sofifa.org/48/18/players/182521.png Germany https://cdn.sofifa.org/flags/21.png 90 90 Real Madrid CF https://cdn.sofifa.org/24/18/teams/243.png €79M ... 76.0 72.0 87.0 82.0 81.0 81.0 77.0 80.0 78.0 77.0
9 G. Higuaín 29 https://cdn.sofifa.org/48/18/players/167664.png Argentina https://cdn.sofifa.org/flags/52.png 90 90 Juventus https://cdn.sofifa.org/24/18/teams/45.png €77M ... 51.0 46.0 71.0 52.0 84.0 79.0 87.0 82.0 55.0 87.0
10 Sergio Ramos 31 https://cdn.sofifa.org/48/18/players/155862.png Spain https://cdn.sofifa.org/flags/45.png 90 90 Real Madrid CF https://cdn.sofifa.org/24/18/teams/243.png €52M ... 84.0 87.0 74.0 83.0 70.0 71.0 72.0 69.0 81.0 72.0
11 K. De Bruyne 26 https://cdn.sofifa.org/48/18/players/192985.png Belgium https://cdn.sofifa.org/flags/7.png 89 92 Manchester City https://cdn.sofifa.org/24/18/teams/10.png €83M ... 66.0 57.0 84.0 70.0 85.0 85.0 81.0 85.0 71.0 81.0
12 T. Courtois 25 https://cdn.sofifa.org/48/18/players/192119.png Belgium https://cdn.sofifa.org/flags/7.png 89 92 Chelsea https://cdn.sofifa.org/24/18/teams/5.png €59M ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
13 A. Sánchez 28 https://cdn.sofifa.org/48/18/players/184941.png Chile https://cdn.sofifa.org/flags/55.png 89 89 Arsenal https://cdn.sofifa.org/24/18/teams/1.png €67.5M ... 62.0 56.0 79.0 64.0 85.0 85.0 83.0 86.0 66.0 83.0
14 L. Modrić 31 https://cdn.sofifa.org/48/18/players/177003.png Croatia https://cdn.sofifa.org/flags/10.png 89 89 Real Madrid CF https://cdn.sofifa.org/24/18/teams/243.png €57M ... 78.0 72.0 86.0 80.0 83.0 84.0 76.0 83.0 80.0 76.0
15 G. Bale 27 https://cdn.sofifa.org/48/18/players/173731.png Wales https://cdn.sofifa.org/flags/50.png 89 89 Real Madrid CF https://cdn.sofifa.org/24/18/teams/243.png €69.5M ... 72.0 67.0 81.0 71.0 87.0 87.0 87.0 87.0 74.0 87.0
16 S. Agüero 29 https://cdn.sofifa.org/48/18/players/153079.png Argentina https://cdn.sofifa.org/flags/52.png 89 89 Manchester City https://cdn.sofifa.org/24/18/teams/10.png €66.5M ... 52.0 44.0 75.0 54.0 87.0 84.0 86.0 86.0 57.0 86.0
17 G. Chiellini 32 https://cdn.sofifa.org/48/18/players/138956.png Italy https://cdn.sofifa.org/flags/27.png 89 89 Juventus https://cdn.sofifa.org/24/18/teams/45.png €38M ... 78.0 86.0 60.0 76.0 55.0 58.0 59.0 56.0 75.0 59.0
18 G. Buffon 39 https://cdn.sofifa.org/48/18/players/1179.png Italy https://cdn.sofifa.org/flags/27.png 89 89 Juventus https://cdn.sofifa.org/24/18/teams/45.png €4.5M ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
19 P. Dybala 23 https://cdn.sofifa.org/48/18/players/211110.png Argentina https://cdn.sofifa.org/flags/52.png 88 93 Juventus https://cdn.sofifa.org/24/18/teams/45.png €79M ... 55.0 43.0 78.0 55.0 86.0 86.0 83.0 87.0 60.0 83.0
20 J. Oblak 24 https://cdn.sofifa.org/48/18/players/200389.png Slovenia https://cdn.sofifa.org/flags/44.png 88 93 Atlético Madrid https://cdn.sofifa.org/24/18/teams/240.png €57M ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
21 A. Griezmann 26 https://cdn.sofifa.org/48/18/players/194765.png France https://cdn.sofifa.org/flags/18.png 88 91 Atlético Madrid https://cdn.sofifa.org/24/18/teams/240.png €75M ... 56.0 48.0 76.0 57.0 85.0 84.0 85.0 86.0 60.0 85.0
22 Thiago 26 https://cdn.sofifa.org/48/18/players/189509.png Spain https://cdn.sofifa.org/flags/45.png 88 90 FC Bayern Munich https://cdn.sofifa.org/24/18/teams/21.png €70.5M ... 72.0 66.0 85.0 76.0 83.0 83.0 77.0 83.0 75.0 77.0
23 P. Aubameyang 28 https://cdn.sofifa.org/48/18/players/188567.png Gabon https://cdn.sofifa.org/flags/115.png 88 88 Borussia Dortmund https://cdn.sofifa.org/24/18/teams/22.png €61M ... 62.0 52.0 74.0 59.0 84.0 83.0 85.0 84.0 65.0 85.0
24 L. Bonucci 30 https://cdn.sofifa.org/48/18/players/184344.png Italy https://cdn.sofifa.org/flags/27.png 88 88 Milan https://cdn.sofifa.org/24/18/teams/47.png €44M ... 79.0 86.0 75.0 83.0 66.0 66.0 65.0 63.0 76.0 65.0
25 J. Boateng 28 https://cdn.sofifa.org/48/18/players/183907.png Germany https://cdn.sofifa.org/flags/21.png 88 88 FC Bayern Munich https://cdn.sofifa.org/24/18/teams/21.png €48M ... 81.0 85.0 73.0 82.0 66.0 69.0 65.0 65.0 79.0 65.0
26 D. Godín 31 https://cdn.sofifa.org/48/18/players/182493.png Uruguay https://cdn.sofifa.org/flags/60.png 88 88 Atlético Madrid https://cdn.sofifa.org/24/18/teams/240.png €40M ... 79.0 86.0 69.0 80.0 62.0 63.0 64.0 61.0 76.0 64.0
27 M. Hummels 28 https://cdn.sofifa.org/48/18/players/178603.png Germany https://cdn.sofifa.org/flags/21.png 88 88 FC Bayern Munich https://cdn.sofifa.org/24/18/teams/21.png €48M ... 80.0 85.0 77.0 83.0 69.0 70.0 69.0 68.0 78.0 69.0
28 M. Özil 28 https://cdn.sofifa.org/48/18/players/176635.png Germany https://cdn.sofifa.org/flags/21.png 88 88 Arsenal https://cdn.sofifa.org/24/18/teams/1.png €60M ... 52.0 41.0 79.0 57.0 82.0 83.0 76.0 83.0 58.0 76.0
29 H. Lloris 30 https://cdn.sofifa.org/48/18/players/167948.png France https://cdn.sofifa.org/flags/18.png 88 88 Tottenham Hotspur https://cdn.sofifa.org/24/18/teams/18.png €38M ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
17951 M. Hurst 21 https://cdn.sofifa.org/48/18/players/221669.png Scotland https://cdn.sofifa.org/flags/42.png 48 58 St. Johnstone FC https://cdn.sofifa.org/24/18/teams/100804.png €40K ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
17952 A. Osmanoski 17 https://cdn.sofifa.org/48/18/players/239079.png FYR Macedonia https://cdn.sofifa.org/flags/19.png 48 68 SpVgg Unterhaching https://cdn.sofifa.org/24/18/teams/172.png €60K ... 47.0 46.0 47.0 47.0 47.0 48.0 46.0 47.0 47.0 46.0
17953 K. Cotter 18 https://cdn.sofifa.org/48/18/players/238833.png England https://cdn.sofifa.org/flags/14.png 48 67 Luton Town https://cdn.sofifa.org/24/18/teams/1923.png €60K ... 47.0 44.0 47.0 46.0 45.0 48.0 44.0 46.0 48.0 44.0
17954 T. Robinson 19 https://cdn.sofifa.org/48/18/players/225778.png England https://cdn.sofifa.org/flags/14.png 48 64 Bradford City https://cdn.sofifa.org/24/18/teams/1804.png €60K ... 37.0 32.0 41.0 35.0 45.0 47.0 45.0 47.0 39.0 45.0
17955 R. Hughes 18 https://cdn.sofifa.org/48/18/players/231928.png Scotland https://cdn.sofifa.org/flags/42.png 48 63 Hamilton Academical FC https://cdn.sofifa.org/24/18/teams/184.png €60K ... 47.0 48.0 47.0 48.0 46.0 47.0 44.0 46.0 47.0 44.0
17956 Z. Mohammed 17 https://cdn.sofifa.org/48/18/players/240381.png England https://cdn.sofifa.org/flags/14.png 48 64 Accrington Stanley https://cdn.sofifa.org/24/18/teams/110313.png €50K ... 44.0 47.0 33.0 41.0 32.0 34.0 34.0 33.0 42.0 34.0
17957 D. Peppard 17 https://cdn.sofifa.org/48/18/players/236807.png Republic of Ireland https://cdn.sofifa.org/flags/25.png 47 61 Bohemian FC https://cdn.sofifa.org/24/18/teams/305.png €50K ... 46.0 45.0 41.0 45.0 39.0 42.0 37.0 40.0 46.0 37.0
17958 C. Rogers 17 https://cdn.sofifa.org/48/18/players/237859.png Republic of Ireland https://cdn.sofifa.org/flags/25.png 47 59 Bray Wanderers https://cdn.sofifa.org/24/18/teams/838.png €40K ... 46.0 47.0 36.0 43.0 35.0 37.0 37.0 35.0 44.0 37.0
17959 N. McLaughlin 18 https://cdn.sofifa.org/48/18/players/225319.png Scotland https://cdn.sofifa.org/flags/42.png 47 64 Partick Thistle F.C. https://cdn.sofifa.org/24/18/teams/1754.png €60K ... 43.0 40.0 46.0 44.0 47.0 48.0 46.0 47.0 44.0 46.0
17960 L. Kiely 18 https://cdn.sofifa.org/48/18/players/236597.png Republic of Ireland https://cdn.sofifa.org/flags/25.png 47 69 Shamrock Rovers https://cdn.sofifa.org/24/18/teams/306.png €70K ... 41.0 41.0 45.0 43.0 44.0 45.0 42.0 44.0 41.0 42.0
17961 J. Latibeaudiere 17 https://cdn.sofifa.org/48/18/players/233047.png England https://cdn.sofifa.org/flags/14.png 47 73 Manchester City https://cdn.sofifa.org/24/18/teams/10.png €60K ... 45.0 46.0 35.0 42.0 32.0 35.0 32.0 32.0 43.0 32.0
17962 J. Payne 18 https://cdn.sofifa.org/48/18/players/236425.png England https://cdn.sofifa.org/flags/14.png 47 63 Barnet https://cdn.sofifa.org/24/18/teams/135.png €50K ... 46.0 46.0 35.0 40.0 36.0 39.0 36.0 38.0 45.0 36.0
17963 G. Manley 19 https://cdn.sofifa.org/48/18/players/238217.png Republic of Ireland https://cdn.sofifa.org/flags/25.png 47 58 Cork City https://cdn.sofifa.org/24/18/teams/422.png €50K ... 46.0 48.0 46.0 48.0 42.0 44.0 41.0 42.0 46.0 41.0
17964 P. Phillips 18 https://cdn.sofifa.org/48/18/players/238985.png Republic of Ireland https://cdn.sofifa.org/flags/25.png 47 67 Cork City https://cdn.sofifa.org/24/18/teams/422.png €60K ... 47.0 48.0 46.0 48.0 43.0 44.0 41.0 42.0 46.0 41.0
17965 A. Byrne 18 https://cdn.sofifa.org/48/18/players/238219.png Republic of Ireland https://cdn.sofifa.org/flags/25.png 47 61 Cork City https://cdn.sofifa.org/24/18/teams/422.png €60K ... 46.0 43.0 46.0 46.0 46.0 47.0 44.0 45.0 46.0 44.0
17966 K. Fujikawa 18 https://cdn.sofifa.org/48/18/players/238477.png Japan https://cdn.sofifa.org/flags/163.png 47 67 Júbilo Iwata https://cdn.sofifa.org/24/18/teams/101144.png €60K ... 46.0 47.0 46.0 47.0 43.0 45.0 40.0 44.0 46.0 40.0
17967 K. Egan 19 https://cdn.sofifa.org/48/18/players/231824.png England https://cdn.sofifa.org/flags/14.png 47 67 Exeter City https://cdn.sofifa.org/24/18/teams/143.png €60K ... 46.0 46.0 37.0 41.0 38.0 42.0 37.0 42.0 46.0 37.0
17968 T. Brownsword 17 https://cdn.sofifa.org/48/18/players/237974.png England https://cdn.sofifa.org/flags/14.png 47 68 Morecambe https://cdn.sofifa.org/24/18/teams/357.png €60K ... 45.0 46.0 32.0 40.0 32.0 34.0 33.0 32.0 43.0 33.0
17969 F. Prohart 18 https://cdn.sofifa.org/48/18/players/236954.png Austria https://cdn.sofifa.org/flags/4.png 47 67 Wolfsberger AC https://cdn.sofifa.org/24/18/teams/111822.png €60K ... 33.0 27.0 41.0 31.0 48.0 47.0 46.0 49.0 35.0 46.0
17970 A. Kilgour 19 https://cdn.sofifa.org/48/18/players/231107.png England https://cdn.sofifa.org/flags/14.png 47 56 Bristol Rovers https://cdn.sofifa.org/24/18/teams/1962.png €40K ... 43.0 46.0 38.0 43.0 37.0 38.0 38.0 37.0 42.0 38.0
17971 R. White 18 https://cdn.sofifa.org/48/18/players/240325.png England https://cdn.sofifa.org/flags/14.png 47 65 Bolton Wanderers https://cdn.sofifa.org/24/18/teams/4.png €60K ... 33.0 32.0 42.0 33.0 49.0 46.0 52.0 47.0 34.0 52.0
17972 A. Conway 19 https://cdn.sofifa.org/48/18/players/238306.png Republic of Ireland https://cdn.sofifa.org/flags/25.png 47 63 Galway United https://cdn.sofifa.org/24/18/teams/1571.png €60K ... 46.0 45.0 41.0 43.0 42.0 44.0 42.0 44.0 46.0 42.0
17973 T. Sawyer 18 https://cdn.sofifa.org/48/18/players/240403.png England https://cdn.sofifa.org/flags/14.png 46 58 Grimsby Town https://cdn.sofifa.org/24/18/teams/92.png €50K ... 45.0 42.0 45.0 43.0 46.0 47.0 45.0 47.0 46.0 45.0
17974 J. Keeble 18 https://cdn.sofifa.org/48/18/players/240404.png England https://cdn.sofifa.org/flags/14.png 46 56 Grimsby Town https://cdn.sofifa.org/24/18/teams/92.png €40K ... 46.0 45.0 34.0 41.0 33.0 35.0 33.0 34.0 44.0 33.0
17975 T. Käßemodel 28 https://cdn.sofifa.org/48/18/players/235352.png Germany https://cdn.sofifa.org/flags/21.png 46 46 FC Erzgebirge Aue https://cdn.sofifa.org/24/18/teams/506.png €30K ... 37.0 38.0 45.0 42.0 42.0 42.0 41.0 41.0 38.0 41.0
17976 A. Kelsey 17 https://cdn.sofifa.org/48/18/players/237463.png England https://cdn.sofifa.org/flags/14.png 46 63 Scunthorpe United https://cdn.sofifa.org/24/18/teams/1949.png €50K ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
17977 B. Richardson 47 https://cdn.sofifa.org/48/18/players/11728.png England https://cdn.sofifa.org/flags/14.png 46 46 Wycombe Wanderers https://cdn.sofifa.org/24/18/teams/1933.png €0 ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
17978 J. Young 17 https://cdn.sofifa.org/48/18/players/231381.png Scotland https://cdn.sofifa.org/flags/42.png 46 61 Swindon Town https://cdn.sofifa.org/24/18/teams/1934.png €60K ... 31.0 28.0 38.0 29.0 45.0 42.0 45.0 44.0 32.0 45.0
17979 J. Lundstram 18 https://cdn.sofifa.org/48/18/players/238813.png England https://cdn.sofifa.org/flags/14.png 46 64 Crewe Alexandra https://cdn.sofifa.org/24/18/teams/121.png €60K ... 47.0 46.0 45.0 47.0 43.0 45.0 41.0 44.0 46.0 41.0
17980 L. Sackey 18 https://cdn.sofifa.org/48/18/players/238308.png Ghana https://cdn.sofifa.org/flags/117.png 46 64 Scunthorpe United https://cdn.sofifa.org/24/18/teams/1949.png €50K ... 40.0 45.0 30.0 38.0 29.0 30.0 31.0 29.0 38.0 31.0

17981 rows × 74 columns

Cleaning the data

You quickly conclude that your dataset is pretty much junk! So the first thing you need to do is clean the data, so that your analyses have a chance of making sense.

To help with this, we have written the following function, which works on the player data loaded from the CSV file.


In [3]:
def clean_raw_data(data_frame, *args):
    """
    Remove unwanted columns from a DataFrame.
    @input: data_frame: The dataset from which unwanted columns should be removed.
    @input: *args: The unwanted columns, given as string arguments, e.g. 'col_x', 'col_y', '...', etc.
    @output: A DataFrame in which only the wanted columns remain.
    """
    false_cols = [i for i in args if i not in data_frame.columns]
    if len(false_cols) != 0:
        print('The following column(s) are not in the DataFrame: '+', '.join(false_cols))
    return data_frame[[i for i in data_frame.columns if i not in args]]

Task 0: Clean your data!

As your very first task, your manager would like you to remove the unwanted columns from your dataset, since they are irrelevant. Fortunately, some of your colleagues have written a function for removing unwanted columns, so all you need to do is identify the columns that are irrelevant to this analysis.

HINT: Read the function's documentation to find out how to pass column names to it.
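
As a small illustration of the calling convention, here is a sketch on a made-up DataFrame. The frame `toy` and the column names 'col_x' and 'col_y' are hypothetical, and the helper is repeated in shortened form only so that the snippet runs on its own.

```python
import pandas as pd


def clean_raw_data(data_frame, *args):
    # Shortened copy of the helper: keep every column that is not named in args.
    return data_frame[[c for c in data_frame.columns if c not in args]]


# Hypothetical toy data, not the real player dataset.
toy = pd.DataFrame({'Name': ['A', 'B'], 'Age': [20, 25],
                    'col_x': [1, 2], 'col_y': [3, 4]})

# Each unwanted column is passed as its own string argument.
cleaned = clean_raw_data(toy, 'col_x', 'col_y')
print(list(cleaned.columns))  # ['Name', 'Age']

# pandas' built-in drop gives the same result on this toy frame.
dropped = toy.drop(columns=['col_x', 'col_y'])
```

The difference from `drop` is that the helper prints a message for unknown column names instead of raising an error.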


In [4]:
df = clean_raw_data(input_data_frame,'**INSERT COLUMN NAMES HERE AS STRING ARGUMENTS!**')
df


Out[4]:
Name Age Nationality Overall Potential Club Value Wage Special Acceleration ... RB RCB RCM RDM RF RM RS RW RWB ST
0 Cristiano Ronaldo 32 Portugal 94 94 Real Madrid CF €95.5M €565K 2228 89 ... 61.0 53.0 82.0 62.0 91.0 89.0 92.0 91.0 66.0 92.0
1 L. Messi 30 Argentina 93 93 FC Barcelona €105M €565K 2154 92 ... 57.0 45.0 84.0 59.0 92.0 90.0 88.0 91.0 62.0 88.0
2 Neymar 25 Brazil 92 94 Paris Saint-Germain €123M €280K 2100 94 ... 59.0 46.0 79.0 59.0 88.0 87.0 84.0 89.0 64.0 84.0
3 L. Suárez 30 Uruguay 92 92 FC Barcelona €97M €510K 2291 88 ... 64.0 58.0 80.0 65.0 88.0 85.0 88.0 87.0 68.0 88.0
4 M. Neuer 31 Germany 92 92 FC Bayern Munich €61M €230K 1493 58 ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
5 R. Lewandowski 28 Poland 91 91 FC Bayern Munich €92M €355K 2143 79 ... 58.0 57.0 78.0 62.0 87.0 82.0 88.0 84.0 61.0 88.0
6 De Gea 26 Spain 90 92 Manchester United €64.5M €215K 1458 57 ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
7 E. Hazard 26 Belgium 90 91 Chelsea €90.5M €295K 2096 93 ... 59.0 47.0 81.0 61.0 87.0 87.0 82.0 88.0 64.0 82.0
8 T. Kroos 27 Germany 90 90 Real Madrid CF €79M €340K 2165 60 ... 76.0 72.0 87.0 82.0 81.0 81.0 77.0 80.0 78.0 77.0
9 G. Higuaín 29 Argentina 90 90 Juventus €77M €275K 1961 78 ... 51.0 46.0 71.0 52.0 84.0 79.0 87.0 82.0 55.0 87.0
10 Sergio Ramos 31 Spain 90 90 Real Madrid CF €52M €310K 2153 75 ... 84.0 87.0 74.0 83.0 70.0 71.0 72.0 69.0 81.0 72.0
11 K. De Bruyne 26 Belgium 89 92 Manchester City €83M €285K 2162 76 ... 66.0 57.0 84.0 70.0 85.0 85.0 81.0 85.0 71.0 81.0
12 T. Courtois 25 Belgium 89 92 Chelsea €59M €190K 1282 46 ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
13 A. Sánchez 28 Chile 89 89 Arsenal €67.5M €265K 2181 88 ... 62.0 56.0 79.0 64.0 85.0 85.0 83.0 86.0 66.0 83.0
14 L. Modrić 31 Croatia 89 89 Real Madrid CF €57M €340K 2228 75 ... 78.0 72.0 86.0 80.0 83.0 84.0 76.0 83.0 80.0 76.0
15 G. Bale 27 Wales 89 89 Real Madrid CF €69.5M €370K 2263 93 ... 72.0 67.0 81.0 71.0 87.0 87.0 87.0 87.0 74.0 87.0
16 S. Agüero 29 Argentina 89 89 Manchester City €66.5M €325K 2074 90 ... 52.0 44.0 75.0 54.0 87.0 84.0 86.0 86.0 57.0 86.0
17 G. Chiellini 32 Italy 89 89 Juventus €38M €225K 1867 68 ... 78.0 86.0 60.0 76.0 55.0 58.0 59.0 56.0 75.0 59.0
18 G. Buffon 39 Italy 89 89 Juventus €4.5M €110K 1335 49 ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
19 P. Dybala 23 Argentina 88 93 Juventus €79M €215K 2063 88 ... 55.0 43.0 78.0 55.0 86.0 86.0 83.0 87.0 60.0 83.0
20 J. Oblak 24 Slovenia 88 93 Atlético Madrid €57M €82K 1290 43 ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
21 A. Griezmann 26 France 88 91 Atlético Madrid €75M €150K 2104 87 ... 56.0 48.0 76.0 57.0 85.0 84.0 85.0 86.0 60.0 85.0
22 Thiago 26 Spain 88 90 FC Bayern Munich €70.5M €225K 2185 77 ... 72.0 66.0 85.0 76.0 83.0 83.0 77.0 83.0 75.0 77.0
23 P. Aubameyang 28 Gabon 88 88 Borussia Dortmund €61M €165K 2078 95 ... 62.0 52.0 74.0 59.0 84.0 83.0 85.0 84.0 65.0 85.0
24 L. Bonucci 30 Italy 88 88 Milan €44M €210K 1995 62 ... 79.0 86.0 75.0 83.0 66.0 66.0 65.0 63.0 76.0 65.0
25 J. Boateng 28 Germany 88 88 FC Bayern Munich €48M €215K 1989 72 ... 81.0 85.0 73.0 82.0 66.0 69.0 65.0 65.0 79.0 65.0
26 D. Godín 31 Uruguay 88 88 Atlético Madrid €40M €125K 1930 62 ... 79.0 86.0 69.0 80.0 62.0 63.0 64.0 61.0 76.0 64.0
27 M. Hummels 28 Germany 88 88 FC Bayern Munich €48M €215K 2038 62 ... 80.0 85.0 77.0 83.0 69.0 70.0 69.0 68.0 78.0 69.0
28 M. Özil 28 Germany 88 88 Arsenal €60M €265K 1927 75 ... 52.0 41.0 79.0 57.0 82.0 83.0 76.0 83.0 58.0 76.0
29 H. Lloris 30 France 88 88 Tottenham Hotspur €38M €165K 1318 65 ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
17951 M. Hurst 21 Scotland 48 58 St. Johnstone FC €40K €1K 991 40 ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
17952 A. Osmanoski 17 FYR Macedonia 48 68 SpVgg Unterhaching €60K €1K 1377 68 ... 47.0 46.0 47.0 47.0 47.0 48.0 46.0 47.0 47.0 46.0
17953 K. Cotter 18 England 48 67 Luton Town €60K €1K 1329 67 ... 47.0 44.0 47.0 46.0 45.0 48.0 44.0 46.0 48.0 44.0
17954 T. Robinson 19 England 48 64 Bradford City €60K €1K 1257 69 ... 37.0 32.0 41.0 35.0 45.0 47.0 45.0 47.0 39.0 45.0
17955 R. Hughes 18 Scotland 48 63 Hamilton Academical FC €60K €1K 1330 59 ... 47.0 48.0 47.0 48.0 46.0 47.0 44.0 46.0 47.0 44.0
17956 Z. Mohammed 17 England 48 64 Accrington Stanley €50K €1K 1167 62 ... 44.0 47.0 33.0 41.0 32.0 34.0 34.0 33.0 42.0 34.0
17957 D. Peppard 17 Republic of Ireland 47 61 Bohemian FC €50K €1K 1199 56 ... 46.0 45.0 41.0 45.0 39.0 42.0 37.0 40.0 46.0 37.0
17958 C. Rogers 17 Republic of Ireland 47 59 Bray Wanderers €40K €1K 1189 60 ... 46.0 47.0 36.0 43.0 35.0 37.0 37.0 35.0 44.0 37.0
17959 N. McLaughlin 18 Scotland 47 64 Partick Thistle F.C. €60K €1K 1330 69 ... 43.0 40.0 46.0 44.0 47.0 48.0 46.0 47.0 44.0 46.0
17960 L. Kiely 18 Republic of Ireland 47 69 Shamrock Rovers €70K €1K 1290 43 ... 41.0 41.0 45.0 43.0 44.0 45.0 42.0 44.0 41.0 42.0
17961 J. Latibeaudiere 17 England 47 73 Manchester City €60K €5K 1132 57 ... 45.0 46.0 35.0 42.0 32.0 35.0 32.0 32.0 43.0 32.0
17962 J. Payne 18 England 47 63 Barnet €50K €1K 1219 65 ... 46.0 46.0 35.0 40.0 36.0 39.0 36.0 38.0 45.0 36.0
17963 G. Manley 19 Republic of Ireland 47 58 Cork City €50K €1K 1298 57 ... 46.0 48.0 46.0 48.0 42.0 44.0 41.0 42.0 46.0 41.0
17964 P. Phillips 18 Republic of Ireland 47 67 Cork City €60K €1K 1297 56 ... 47.0 48.0 46.0 48.0 43.0 44.0 41.0 42.0 46.0 41.0
17965 A. Byrne 18 Republic of Ireland 47 61 Cork City €60K €1K 1346 58 ... 46.0 43.0 46.0 46.0 46.0 47.0 44.0 45.0 46.0 44.0
17966 K. Fujikawa 18 Japan 47 67 Júbilo Iwata €60K €1K 1317 59 ... 46.0 47.0 46.0 47.0 43.0 45.0 40.0 44.0 46.0 40.0
17967 K. Egan 19 England 47 67 Exeter City €60K €1K 1225 63 ... 46.0 46.0 37.0 41.0 38.0 42.0 37.0 42.0 46.0 37.0
17968 T. Brownsword 17 England 47 68 Morecambe €60K €1K 1143 55 ... 45.0 46.0 32.0 40.0 32.0 34.0 33.0 32.0 43.0 33.0
17969 F. Prohart 18 Austria 47 67 Wolfsberger AC €60K €1K 1238 62 ... 33.0 27.0 41.0 31.0 48.0 47.0 46.0 49.0 35.0 46.0
17970 A. Kilgour 19 England 47 56 Bristol Rovers €40K €1K 1208 52 ... 43.0 46.0 38.0 43.0 37.0 38.0 38.0 37.0 42.0 38.0
17971 R. White 18 England 47 65 Bolton Wanderers €60K €2K 1265 51 ... 33.0 32.0 42.0 33.0 49.0 46.0 52.0 47.0 34.0 52.0
17972 A. Conway 19 Republic of Ireland 47 63 Galway United €60K €1K 1314 60 ... 46.0 45.0 41.0 43.0 42.0 44.0 42.0 44.0 46.0 42.0
17973 T. Sawyer 18 England 46 58 Grimsby Town €50K €1K 1267 65 ... 45.0 42.0 45.0 43.0 46.0 47.0 45.0 47.0 46.0 45.0
17974 J. Keeble 18 England 46 56 Grimsby Town €40K €1K 1105 66 ... 46.0 45.0 34.0 41.0 33.0 35.0 33.0 34.0 44.0 33.0
17975 T. Käßemodel 28 Germany 46 46 FC Erzgebirge Aue €30K €1K 1174 25 ... 37.0 38.0 45.0 42.0 42.0 42.0 41.0 41.0 38.0 41.0
17976 A. Kelsey 17 England 46 63 Scunthorpe United €50K €1K 755 24 ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
17977 B. Richardson 47 England 46 46 Wycombe Wanderers €0 €1K 832 25 ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
17978 J. Young 17 Scotland 46 61 Swindon Town €60K €1K 1194 66 ... 31.0 28.0 38.0 29.0 45.0 42.0 45.0 44.0 32.0 45.0
17979 J. Lundstram 18 England 46 64 Crewe Alexandra €60K €1K 1302 57 ... 47.0 46.0 45.0 47.0 43.0 45.0 41.0 44.0 46.0 41.0
17980 L. Sackey 18 Ghana 46 64 Scunthorpe United €50K €1K 1031 48 ... 40.0 45.0 30.0 38.0 29.0 30.0 31.0 29.0 38.0 31.0

17981 rows × 71 columns

Which data types do we have in our dataset?

The next important step in a cleaning process is to examine which data types the dataset contains, and whether any fields are blank, i.e. None, Null or NaN. In pandas, the following simple commands can be used for this.


In [5]:
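# Group the column names by dtype, e.g. {'int64': [...], 'float64': [...], 'object': [...]}
g = df.columns.to_series().groupby(df.dtypes).groups
d = {key.name: list(val) for key, val in g.items()}
# Print each column's name, its number of non-null fields, and its dtype.
# The Danish labels read: column name, number of filled fields, data type.
for navn, antal, dtype in zip(df.columns, df.count().tolist(), df.dtypes.tolist()):
    print('Kolonnenavn: {:20s} antal fyldte felter: {:<9.0f} datatype: {}'.format(navn, antal, dtype))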


Kolonnenavn: Name                 antal fyldte felter: 17981     datatype: object
Kolonnenavn: Age                  antal fyldte felter: 17981     datatype: int64
Kolonnenavn: Nationality          antal fyldte felter: 17981     datatype: object
Kolonnenavn: Overall              antal fyldte felter: 17981     datatype: int64
Kolonnenavn: Potential            antal fyldte felter: 17981     datatype: int64
Kolonnenavn: Club                 antal fyldte felter: 17733     datatype: object
Kolonnenavn: Value                antal fyldte felter: 17981     datatype: object
Kolonnenavn: Wage                 antal fyldte felter: 17981     datatype: object
Kolonnenavn: Special              antal fyldte felter: 17981     datatype: int64
Kolonnenavn: Acceleration         antal fyldte felter: 17981     datatype: object
Kolonnenavn: Aggression           antal fyldte felter: 17981     datatype: object
Kolonnenavn: Agility              antal fyldte felter: 17981     datatype: object
Kolonnenavn: Balance              antal fyldte felter: 17981     datatype: object
Kolonnenavn: Ball control         antal fyldte felter: 17981     datatype: object
Kolonnenavn: Composure            antal fyldte felter: 17981     datatype: object
Kolonnenavn: Crossing             antal fyldte felter: 17981     datatype: object
Kolonnenavn: Curve                antal fyldte felter: 17981     datatype: object
Kolonnenavn: Dribbling            antal fyldte felter: 17981     datatype: object
Kolonnenavn: Finishing            antal fyldte felter: 17981     datatype: object
Kolonnenavn: Free kick accuracy   antal fyldte felter: 17981     datatype: object
Kolonnenavn: GK diving            antal fyldte felter: 17981     datatype: object
Kolonnenavn: GK handling          antal fyldte felter: 17981     datatype: object
Kolonnenavn: GK kicking           antal fyldte felter: 17981     datatype: object
Kolonnenavn: GK positioning       antal fyldte felter: 17981     datatype: object
Kolonnenavn: GK reflexes          antal fyldte felter: 17981     datatype: object
Kolonnenavn: Heading accuracy     antal fyldte felter: 17981     datatype: object
Kolonnenavn: Interceptions        antal fyldte felter: 17981     datatype: object
Kolonnenavn: Jumping              antal fyldte felter: 17981     datatype: object
Kolonnenavn: Long passing         antal fyldte felter: 17981     datatype: object
Kolonnenavn: Long shots           antal fyldte felter: 17981     datatype: object
Kolonnenavn: Marking              antal fyldte felter: 17981     datatype: object
Kolonnenavn: Penalties            antal fyldte felter: 17981     datatype: object
Kolonnenavn: Positioning          antal fyldte felter: 17981     datatype: object
Kolonnenavn: Reactions            antal fyldte felter: 17981     datatype: object
Kolonnenavn: Short passing        antal fyldte felter: 17981     datatype: object
Kolonnenavn: Shot power           antal fyldte felter: 17981     datatype: object
Kolonnenavn: Sliding tackle       antal fyldte felter: 17981     datatype: object
Kolonnenavn: Sprint speed         antal fyldte felter: 17981     datatype: object
Kolonnenavn: Stamina              antal fyldte felter: 17981     datatype: object
Kolonnenavn: Standing tackle      antal fyldte felter: 17981     datatype: object
Kolonnenavn: Strength             antal fyldte felter: 17981     datatype: object
Kolonnenavn: Vision               antal fyldte felter: 17981     datatype: object
Kolonnenavn: Volleys              antal fyldte felter: 17981     datatype: object
Kolonnenavn: CAM                  antal fyldte felter: 15952     datatype: float64
Kolonnenavn: CB                   antal fyldte felter: 15952     datatype: float64
Kolonnenavn: CDM                  antal fyldte felter: 15952     datatype: float64
Kolonnenavn: CF                   antal fyldte felter: 15952     datatype: float64
Kolonnenavn: CM                   antal fyldte felter: 15952     datatype: float64
Kolonnenavn: ID                   antal fyldte felter: 17981     datatype: int64
Kolonnenavn: LAM                  antal fyldte felter: 15952     datatype: float64
Kolonnenavn: LB                   antal fyldte felter: 15952     datatype: float64
Kolonnenavn: LCB                  antal fyldte felter: 15952     datatype: float64
Kolonnenavn: LCM                  antal fyldte felter: 15952     datatype: float64
Kolonnenavn: LDM                  antal fyldte felter: 15952     datatype: float64
Kolonnenavn: LF                   antal fyldte felter: 15952     datatype: float64
Kolonnenavn: LM                   antal fyldte felter: 15952     datatype: float64
Kolonnenavn: LS                   antal fyldte felter: 15952     datatype: float64
Kolonnenavn: LW                   antal fyldte felter: 15952     datatype: float64
Kolonnenavn: LWB                  antal fyldte felter: 15952     datatype: float64
Kolonnenavn: Preferred Positions  antal fyldte felter: 17981     datatype: object
Kolonnenavn: RAM                  antal fyldte felter: 15952     datatype: float64
Kolonnenavn: RB                   antal fyldte felter: 15952     datatype: float64
Kolonnenavn: RCB                  antal fyldte felter: 15952     datatype: float64
Kolonnenavn: RCM                  antal fyldte felter: 15952     datatype: float64
Kolonnenavn: RDM                  antal fyldte felter: 15952     datatype: float64
Kolonnenavn: RF                   antal fyldte felter: 15952     datatype: float64
Kolonnenavn: RM                   antal fyldte felter: 15952     datatype: float64
Kolonnenavn: RS                   antal fyldte felter: 15952     datatype: float64
Kolonnenavn: RW                   antal fyldte felter: 15952     datatype: float64
Kolonnenavn: RWB                  antal fyldte felter: 15952     datatype: float64
Kolonnenavn: ST                   antal fyldte felter: 15952     datatype: float64
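
A similar overview can be sketched with built-in pandas calls. The example below uses a made-up three-row frame (the values are assumptions for illustration only) to show how filled fields, missing fields, and non-numeric columns can be summarised.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame with one missing value in the 'CAM' column.
toy = pd.DataFrame({
    'Name': ['A', 'B', 'C'],
    'Age': [20, 25, 31],
    'CAM': [80.0, np.nan, 75.0],
})

filled_per_column = toy.count()        # non-null fields per column
missing_per_column = toy.isna().sum()  # null fields per column

# Columns that pandas does not treat as numeric are candidates for cleaning.
non_numeric = toy.select_dtypes(exclude='number').columns.tolist()

print(filled_per_column)
print(missing_per_column)
print(non_numeric)
```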

Some important takeaways from this analysis are:

  • We can say with great certainty that there are {{len(df)}} players in our dataset, but some columns have only {{df.CAM.count()}} filled fields.
  • Many columns are of type float64 or int64, which is good, but the following columns are of type object: {{', '.join(d['object'])}}

This is not so good, since we may want to use many of these columns, so we need to fix it. The first step is to identify which kinds of values occur in each of the object columns. A simple but effective way is to check whether each element in a column is a number or a string.


In [6]:
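def is_number_or_string(x):
    """Return 'number' if x can be converted to a float, otherwise 'string'."""
    try:
        float(x)  # note: NaN also converts, so missing values count as 'number'
        return 'number'
    except (TypeError, ValueError):
        return 'string'

# Classify every cell; on pandas >= 2.1, DataFrame.map is the preferred name for applymap
df_string_float = df.applymap(is_number_or_string)
dict_types = []
for name in d['object']:
    # Count how many cells in this column were classified as 'number' and as 'string'
    test_df = df_string_float.groupby(name)[name].count()
    dict_types = dict_types + list(zip([name]*2,test_df.keys(),test_df.values))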

In [7]:
data_frame_types = pd.DataFrame(dict_types,columns=['name','dtypes','count'])
list_of_types = (data_frame_types
                 .pivot(index='name', columns='dtypes', values='count')
                 .reset_index()
                 .fillna(0)
                 .sort_values(['number','string'],ascending=False))
list_of_types


Out[7]:
dtypes  name                  number   string
14      GK kicking           17962.0     19.0
12      GK diving            17955.0     26.0
15      GK positioning       17955.0     26.0
13      GK handling          17954.0     27.0
16      GK reflexes          17952.0     29.0
25      Penalties            17951.0     30.0
38      Volleys              17940.0     41.0
11      Free kick accuracy   17932.0     49.0
3       Balance              17924.0     57.0
1       Aggression           17913.0     68.0
19      Jumping              17911.0     70.0
2       Agility              17910.0     71.0
8       Curve                17908.0     73.0
30      Shot power           17908.0     73.0
17      Heading accuracy     17906.0     75.0
21      Long shots           17898.0     83.0
0       Acceleration         17897.0     84.0
6       Composure            17887.0     94.0
26      Positioning          17886.0     95.0
31      Sliding tackle       17886.0     95.0
7       Crossing             17885.0     96.0
18      Interceptions        17881.0    100.0
35      Strength             17877.0    104.0
37      Vision               17874.0    107.0
33      Stamina              17873.0    108.0
22      Marking              17869.0    112.0
10      Finishing            17867.0    114.0
32      Sprint speed         17867.0    114.0
28      Reactions            17866.0    115.0
20      Long passing         17860.0    121.0
34      Standing tackle      17857.0    124.0
9       Dribbling            17850.0    131.0
4       Ball control         17840.0    141.0
29      Short passing        17832.0    149.0
5       Club                   248.0  17733.0
23      Name                     0.0  17981.0
24      Nationality              0.0  17981.0
27      Preferred Positions      0.0  17981.0
36      Value                    0.0  17981.0
39      Wage                     0.0  17981.0
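
One row in this table looks odd: Club reports 248 "numbers". A likely explanation (an assumption, not verified against the full dataset here) is that these are missing values, since `float()` accepts NaN without raising. The toy example below, with made-up values, shows the effect:

```python
import numpy as np
import pandas as pd

def is_number_or_string(x):
    """Label a value 'number' if float() can parse it, otherwise 'string'."""
    try:
        float(x)
        return 'number'
    except (ValueError, TypeError):
        return 'string'

# A club name, a missing value, a rating with a modifier, and two clean ratings.
toy = pd.Series(['Real Madrid CF', np.nan, '73-1', '68', 70], dtype=object)

labels = toy.map(is_number_or_string)
counts = labels.value_counts().to_dict()
print(counts)
```

The missing value is labelled 'number' alongside `'68'` and `70`, while `'73-1'` is labelled 'string' because `float('73-1')` raises a `ValueError`.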

OK, quite a lot of the columns that represent the physical attributes contain junk. Let's take a closer look.


In [8]:
def contains_not_number(x):
    """Return True if x contains anything other than digits, dots and spaces."""
    return re.search(r'[^\d. ]', str(x)) is not None

In [9]:
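# Columns with at least one numeric value, i.e. the ones that mix numbers and strings.
mixed_type_cols = list(list_of_types.loc[list_of_types['number'] != 0.0, 'name'])
for col in mixed_type_cols:
    # Print the distinct non-numeric values found in each column.
    print((col, [x for x in df[col].unique() if contains_not_number(x)]))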


('GK kicking', ['73-1', '68-2', '67+4', '65+1', '61-3', '63-7', '59-1', '60+2', '62+2', '60-1', '61+2', '60+1', '54-1', '65+4', '55+2', '68+8', '57+2', '55-1', '56+4'])
('GK diving', ['81-2', '76+1', '76-1', '75+1', '78+3', '72-1', '75+4', '70-2', '73+2', '71-2', '65+2', '68-1', '67+2', '63+2', '62-1', '66+1', '63+1', '64-3', '62+1', '64+5', '54-3', '56+2', '55+5', '55+4'])
('GK positioning', ['71-2', '69+1', '66-2', '69+2', '64+1', '65+4', '60+3', '66-1', '65+1', '63-1', '70+2', '62-1', '64+2', '62+4', '61-1', '60+1', '58+2', '58+4', '59+2', '51+3', '45-1'])
('GK handling', ['78-2', '72-1', '78-1', '67-1', '69+1', '75-1', '73+1', '66+3', '65+1', '63+2', '64+3', '66+2', '63-1', '65+3', '59+3', '58-1', '60+2', '60-1', '57+2', '52+2', '56+4', '53-1', '47-1', '52+3', '55+2'])
('GK reflexes', ['86-2', '83-1', '85-1', '79+1', '74+1', '81-1', '83+3', '78+2', '73-1', '75+2', '70+1', '70-1', '69+2', '67+2', '62+3', '68+2', '65+1', '65-3', '65-1', '61-3', '67+8', '60+1', '55+5', '56-4', '53+3', '55+3', '57+5'])
('Penalties', ['81+7', '65+2', '70-5', '64+6', '70+4', '67+11', '55+1', '70+2', '70+3', '36-10', '53+2', '67+15', '69+2', '61+5', '51+1', '58+13', '51+5', '56+7', '66+2', '58-4', '69-1', '40-1', '49-1', '61-6', '52-2', '60-7', '60-3', '58+5', '46-1', '61-1'])
('Volleys', ['70+1', '69+3', '72+1', '71+1', '69+4', '66+1', '68+2', '64+1', '69+1', '61+5', '62-4', '49+2', '39+4', '59+1', '60+2', '71+8', '51+1', '49+6', '63+5', '32-1', '52+8', '65-2', '53-1', '62+4', '59+2', '40-12', '57+1', '55-4', '57+2', '29-1', '54+10', '51-1', '56+6', '51+7', '33-1', '52-1', '15+8'])
('Free kick accuracy', ['81+1', '66-5', '73+1', '57+5', '60-4', '56+4', '60+8', '53+11', '63-7', '77-3', '58+2', '69+1', '52+10', '62-3', '69-2', '65+1', '70+5', '39+2', '58+6', '62-4', '65+5', '61+10', '71+4', '56+11', '64+1', '39+10', '55+21', '64+8', '70+3', '68+4', '70+17', '69+12', '65+24', '52+9', '70+21', '39-9', '66+30', '64-3', '70+4', '67-1', '39-6', '65+29', '56-2', '64-2', '36+4', '53+1', '32-1', '45-1'])
('Balance', ['90+2', '71+2', '85+2', '89+2', '65+7', '85+1', '47-3', '70+3', '72+6', '80+6', '64+1', '64+4', '52-2', '61-1', '73+4', '70+7', '66+6', '33+4', '53+1', '66+2', '63+2', '82+2', '66+4', '69+3', '80-6', '56-7', '67-2', '49+2', '64+2', '86-3', '62+1', '73-1', '72-2', '63+5', '81-1', '53-13', '50-5', '70+1', '55-3', '70+2', '73+5', '81+1', '65+1', '79-1', '75+1', '77+2', '63-2', '60+3', '68-4', '73-3', '58-1', '71+1'])
('Aggression', ['58-10', '65+10', '57+5', '66+7', '77+5', '68+1', '68+3', '87+1', '82+1', '76+3', '72+3', '80+3', '65-2', '72+5', '42+7', '78-3', '74+7', '82+2', '81+2', '70-1', '75+1', '76+7', '67+4', '66+3', '78+3', '60+12', '71+1', '65-1', '69+10', '74-6', '23+3', '57+1', '67+6', '67+39', '67-2', '53+14', '33+2', '51+2', '56+1', '65+2', '72-2', '66-2', '82+10', '78+17', '72+14', '59+1', '64+3', '58-1', '53+4', '49+4', '67-3', '65+4', '59-7', '61+1', '68-3', '85-2', '53+5', '51+1', '58+7', '48+2', '57+9', '40+2', '30-1', '61+2', '60+8', '46+1'])
('Jumping', ['74+1', '76-4', '67+2', '65+2', '73-8', '59+1', '77-4', '78+1', '86+1', '59+2', '80-2', '68+1', '76+4', '68-1', '81+2', '76-5', '72+4', '63+2', '67-2', '75-6', '73-3', '70+14', '70-3', '66-1', '64+4', '82+6', '75-4', '74+4', '73-4', '78-2', '72-1', '68+7', '60+1', '77-2', '79+2', '64+2', '67+8', '74+7', '65+1', '68+6', '64-4', '59-1', '76-3', '66-3', '69+7', '63+1', '77+2', '57+4', '62+1', '70-2', '71+2', '67+16', '71+9', '87+2', '66+2', '76-2', '46+2', '70+1', '68+2', '39-2', '60+3', '73+5', '66+5', '63+7', '52-2', '60-2'])
('Agility', ['60+6', '78+1', '58-2', '71+1', '70+1', '69+1', '68+2', '74+4', '64-1', '75-1', '64-2', '43-3', '84+1', '58+4', '68-2', '77-8', '76-3', '50-8', '58-4', '70+14', '70-2', '70+3', '81-2', '63-2', '65+4', '55-4', '65+1', '66+3', '48-3', '46+13', '67+8', '83+6', '68+1', '62-3', '83+3', '85+2', '82-8', '82+3', '61-3', '72-1', '83+1', '66-3', '62+1', '67+2', '79+2', '55+1', '71-4', '72+1', '59+4', '67-2', '60+7', '36+2', '62+3', '48+3', '35-2', '68+10', '53+2', '49+1', '63+3', '75+1', '50-1', '77+2', '46+1'])
('Curve', ['81+2', '77+2', '78+1', '60+7', '75-2', '74+2', '71+1', '76+1', '68+4', '39+7', '67+3', '79+1', '73-2', '75+10', '61+2', '67+4', '59+24', '70+1', '58+10', '72-3', '69+7', '65+3', '64+1', '61+5', '52-6', '66+3', '73+2', '73+1', '52+4', '39+8', '53+1', '64+2', '59+1', '68+2', '72+2', '58+3', '62+4', '66+5', '63+11', '71+3', '62+1', '63-2', '69+15', '55+22', '57+4', '58+21', '69+8', '68-3', '50-3', '63+19', '43+3', '45+5', '57-1', '64+9', '34+3', '37-1', '42-3', '28+5', '36+6', '56+2', '53+10', '46+12', '31-1', '56+1', '45-1', '41+5', '27+2'])
('Shot power', ['81+1', '73+3', '85+6', '83+3', '77+3', '78-1', '75+4', '74-1', '70+28', '73+2', '70+13', '75+3', '73+1', '68-1', '58+9', '66+1', '80+6', '63+2', '73-5', '56+2', '63-3', '64+29', '79-3', '67+2', '76+1', '74-2', '70+1', '54+1', '71-6', '71+4', '68+2', '62-1', '71+1', '64+10', '63+1', '67+1', '69+2', '58+5', '75+1', '60+5', '65+2', '69+4', '57+3', '81-4', '74+2', '65+3', '55+1', '64-1', '55-4', '43-4', '66-2', '56-1', '61+3', '57-1', '60-2', '58-3', '52+3', '67+10', '50+4', '54+2', '43+10', '45-1', '36+13', '56-2', '21+1'])
('Heading accuracy', ['85-1', '73+1', '76+2', '74+1', '72+3', '78+1', '78-1', '75+2', '75-1', '65+2', '68-7', '42-7', '69+2', '70+2', '56+2', '46+8', '74+2', '64+4', '72-2', '72+2', '60+2', '64-1', '55+7', '59+2', '68-4', '43+3', '69-1', '74-2', '61+1', '61+2', '63+4', '57+5', '71-1', '59-5', '65-1', '59-1', '62-1', '72+1', '62+3', '64-3', '63-3', '56-6', '53+6', '64+1', '60+4', '62+4', '65+1', '60-5', '50-10', '54+3', '48-2', '69-3', '63+1', '64+2', '52+5', '57+2', '37+2', '57-1', '60-1', '70+4', '48+1', '43-1', '60+6', '52-11', '47+7', '43+2', '41+2'])
('Long shots', ['80+3', '73+2', '72+2', '79+3', '62+2', '74+1', '73+1', '78+2', '64+2', '73-1', '66+2', '60+6', '70+5', '77-2', '68-2', '65+1', '69+1', '70-4', '67-1', '68-4', '55+5', '64+3', '55+25', '75+1', '52-3', '63+3', '68+1', '75+6', '66+4', '43+1', '53-14', '52+1', '72+5', '63+1', '59+2', '59-1', '69+4', '66+1', '53+9', '64-1', '59+1', '61+3', '54+8', '71+2', '62-1', '62+1', '60+7', '61+1', '41-4', '58+3', '66-2', '43-1', '60+4', '38+10', '59+20', '60-2', '63+10', '27+3', '42+20', '56+5', '57+1', '23-1', '50+1', '39-1', '35+5', '47+3', '17+10'])
('Acceleration', ['70+9', '80+1', '49-1', '67+2', '79-2', '65-2', '91-2', '74-3', '75+1', '41-6', '74+1', '70+3', '75+5', '74+2', '71+2', '68+1', '71+4', '89-2', '58-10', '78+1', '86+1', '66-1', '66+1', '74+4', '71-3', '80+2', '64-2', '57-4', '78+3', '73+9', '82-3', '68+3', '68+2', '55-8', '55-1', '43-2', '77+3', '82+10', '49-10', '72+1', '61+1', '79+8', '70-2', '60-2', '86+7', '81+4', '69+3', '65-10', '64-3', '73+4', '75-6', '64+5', '33+10', '92+2', '76-1', '62+2', '65+7', '58+8', '44-2', '77+1', '82+3', '68-1', '61+3', '73+3', '59+1', '64+12', '85-1', '78+14', '59-1', '75+4', '73+10', '71+6', '77+13', '62+1', '64-5'])
('Composure', ['79+1', '74+2', '70-1', '75+1', '75+2', '66+4', '74+1', '65-8', '68-3', '82+18', '72+6', '67-1', '70+3', '78+10', '72-2', '66+2', '64+4', '74-5', '65+2', '72-1', '68+2', '67+1', '68+3', '65+5', '64-3', '60-3', '69+2', '61+2', '69+1', '58-1', '61+3', '57-3', '59+2', '55-3', '61-1', '68-2', '68-1', '28+4', '63-1', '50+4', '60-1', '62+3', '59-2', '64+24', '62+1', '63+2', '63+3', '58+5', '62-2', '56+2', '52-3', '62-1', '70+5', '55+4', '52+1', '65-1', '64-1', '58-2', '54-4', '58+2', '55+3', '65-2', '56-1', '51+5', '51+3', '51-4', '56+7', '44+1', '56+1', '52-2', '43+1', '42+1', '40+1', '45+2', '44+5', '35+2'])
('Positioning', ['80+1', '80+3', '76-2', '76+1', '80+2', '72+2', '72+4', '73-2', '75+1', '68+2', '58-1', '74+1', '74+3', '56+4', '71-3', '57+2', '70+3', '68-1', '77+1', '72+8', '66+2', '66-2', '62-3', '69+2', '65+2', '63-3', '65+1', '74+2', '70+1', '58+10', '66+1', '68+6', '70-2', '63+4', '66-8', '61+2', '62-2', '64+1', '70+2', '70+9', '69-1', '68+4', '66+5', '67+4', '10-3', '62-1', '67+3', '67+1', '51+2', '55+3', '60+2', '67+2', '56+29', '69+3', '54+1', '51-4', '61+1', '46+4', '52-7', '40+14', '58+2', '56+3', '39+2', '60-2', '50+2', '30-1', '56-4', '53+1', '52+1', '50+7', '51-1', '47+6', '47+1', '56+7', '17+10', '27+3', '42+1', '52-1'])
('Sliding tackle', ['71+4', '82-1', '77-1', '31+8', '73+1', '79-2', '70+3', '78-1', '77+1', '58+37', '77+3', '78+1', '66+2', '72+1', '73+2', '70+1', '34+14', '65-1', '74+2', '73+3', '42+9', '32-4', '70+2', '80+5', '58+2', '66-3', '67-2', '65-2', '16+4', '68-1', '68+1', '69-1', '67-1', '68+3', '70-1', '69+2', '71-3', '18+3', '20+2', '29-9', '66-1', '52+12', '67-3', '48+6', '55+1', '65+1', '62+3', '44+26', '62-2', '63+5', '61+2', '64+1', '53-3', '59+2', '60+1', '61+5', '63+14', '62-4', '64+4', '60-2', '60-4', '65-3', '21+10', '67+2', '63+1', '31+10', '61-2', '23-2', '59+3', '51+3', '68-5', '34-1', '52+2', '66+1', '54+1', '52+1', '57+4', '56+2', '41+1', '54+3', '50-1'])
('Crossing', ['79+2', '85+1', '63+2', '79+4', '70+2', '74-5', '80+2', '68-8', '72+1', '74+2', '61+1', '71+2', '70-3', '65-3', '72+4', '66-3', '72-8', '79+3', '73+3', '56-3', '61-4', '67+1', '66-2', '60+2', '61-3', '67-2', '41+5', '62+1', '65-1', '66+2', '65+2', '64+2', '37-6', '66+1', '57-10', '66+7', '68+4', '36+10', '63+5', '67-1', '50+1', '62-3', '69-1', '60-1', '46+1', '71+1', '64-1', '58+2', '67+9', '53+1', '68+2', '59+2', '65+3', '54+10', '73-1', '64-5', '53+2', '54+2', '50+8', '58+1', '51+8', '60+8', '50+3', '65+6', '51-3', '54+7', '55+3', '61+29', '60+5', '48-7', '59+9', '55-2', '42+3', '58+11', '56-1', '38-9', '52-6', '56+3', '32-1', '45+9', '55+1', '65+1', '63-1', '36+6', '64+37', '48+7', '45+1', '33+3', '32+2'])
('Interceptions', ['37+1', '43+4', '81+2', '76+2', '64+13', '76+1', '69+1', '71+3', '73+1', '75-1', '74+1', '64+1', '72+1', '49+13', '49+17', '48-11', '36-10', '75-2', '57-2', '21-11', '66+1', '69-4', '67-2', '68-3', '67+1', '68+2', '66+2', '68+1', '68-1', '68-4', '67+3', '66-2', '61+1', '74+7', '17+1', '67-4', '62-1', '68+4', '64-1', '60+10', '67-1', '10-11', '45+6', '65+1', '64+12', '69+5', '59+4', '64+2', '36+24', '63+29', '39+10', '49+15', '59+2', '54+2', '64+4', '48+2', '56+4', '62+5', '23+10', '58+4', '58-6', '55+2', '63+2', '55+5', '57+3', '58-3', '18-6', '60+2', '58-4', '63-2', '48+1', '59+1', '57+1', '44+3', '56+1', '48+24', '24-1', '57-1', '50+5', '15-11', '34+5', '27+20', '49+2', '52+2', '31+1'])
('Strength', ['70+1', '79+1', '66+1', '34+3', '65-2', '76+2', '71+3', '68+1', '47+3', '52-5', '80+1', '68+6', '85-1', '75+5', '87-4', '75+2', '83+5', '77-1', '56-5', '85+1', '68+4', '76+1', '78+2', '65-1', '83-1', '54+5', '84+6', '60+2', '74+1', '81-3', '51-7', '68+2', '75+1', '57-3', '78-6', '69-3', '69+1', '65+3', '85-5', '78-2', '73-1', '70-1', '67-5', '74+2', '78+1', '72-2', '83+4', '74-1', '57-10', '84-4', '73+3', '48-2', '80-1', '66+5', '57+1', '40+10', '68-1', '55-6', '69+11', '79-1', '71-1', '54-7', '54+2', '82-5', '50+3', '73-2', '59-19', '57+4', '68+3', '74+4', '33+2', '87-3', '58+10', '70+3', '37+5', '71-4', '32+2', '55-1', '69+8', '74+8', '66+6', '65+2', '70+6', '45-10', '67+7', '47+9', '48+12', '72+3', '70+15'])
('Vision', ['72+2', '79+1', '77+1', '77-4', '73-2', '57-1', '73+1', '78+3', '78+1', '78+2', '66+6', '74+2', '67-3', '74+1', '71-2', '62+8', '67-1', '70+2', '66-2', '38-10', '70-4', '73-3', '68+3', '66+1', '68-3', '62+1', '63-4', '69+10', '36-10', '74-2', '69+1', '72+1', '52+5', '68+1', '61+2', '48+2', '58+5', '67+3', '57-4', '69+2', '54-4', '70-2', '66+10', '72+4', '60-5', '70-1', '52-1', '67+1', '65+2', '42-5', '65+1', '65+5', '63+2', '71+4', '53+1', '64+1', '29+15', '71+1', '71-1', '66+2', '57+8', '62+11', '60-3', '52+4', '60+8', '61+11', '55-2', '56+3', '54+1', '63+7', '66-5', '45+24', '52+1', '49+14', '60+4', '59+5', '52+3', '49+1', '61-1', '44+1', '31+3', '37+1', '58+3', '55+8', '51-2', '44+3', '30+1', '47+1', '50+1'])
('Stamina', ['68+2', '82+1', '85+2', '73+3', '75+1', '77+2', '84+7', '75+19', '58-2', '70+2', '74+2', '66+4', '79+2', '41+20', '72-3', '72+2', '72+1', '78-2', '84-3', '56+3', '86-3', '78-1', '80+2', '60+2', '69-1', '85+1', '62-4', '70-1', '63+2', '64+7', '76+3', '68-2', '66-8', '87+3', '85+3', '62-1', '60-4', '81-4', '78+3', '74+9', '74+1', '65-4', '72-1', '76+6', '67+2', '43-2', '73-2', '67-1', '65-1', '72-10', '65+4', '29-11', '59+3', '79-3', '72-2', '69+3', '65+1', '74+4', '64-3', '79+8', '68-1', '64-2', '64+2', '69+1', '73+10', '58-10', '67-2', '66+9', '80-2', '52+7', '61-22', '66+2', '75+20', '70-3', '51-8', '67+1', '74-1', '90+7', '61+30', '54+12', '60+4', '62+1', '73+22', '44-3', '48-1', '63+4', '40+1', '70+6', '69-7', '78+38', '29-9', '62+3', '58+2', '61+6', '53+2', '60-2'])
('Marking', ['84-1', '65+1', '77-2', '78-2', '73+2', '68+2', '78+1', '64+22', '63+1', '62+4', '25+4', '70-1', '74+1', '70+1', '42+21', '73-1', '72+1', '57+1', '79+4', '72+3', '41+7', '75-3', '70+2', '24-7', '67-4', '57+2', '68-1', '59+2', '71-2', '69-2', '74-4', '70-2', '65+2', '62-1', '69+1', '66+2', '59-3', '65+4', '66-2', '61+2', '68-3', '20+3', '70+4', '24+4', '64-3', '66+1', '34-3', '67-6', '55+16', '65-1', '72-1', '43+3', '60-2', '66+9', '60+1', '63+2', '62+6', '64+1', '68+4', '61+1', '56-5', '52+3', '63+4', '64+6', '31-1', '50-4', '62-3', '54+4', '60+2', '60-1', '57+5', '63-1', '64-4', '59+4', '58-2', '57-6', '61+3', '62-4', '22-2', '14-19', '55-1', '54+3', '59-2', '57+3', '40+6', '56-1', '23-1', '25-1', '56+1', '54+6', '15+10', '46+2', '50+1', '45+1', '47+3'])
('Finishing', ['52+2', '81+1', '79+1', '60+2', '79+3', '70+1', '69+4', '74+1', '72-1', '78+1', '69+1', '74+2', '64-3', '45+1', '73+1', '71+1', '80-3', '74+3', '75+1', '65-1', '71-1', '73-1', '77+1', '77-1', '67+3', '70-1', '73+2', '58-2', '58-1', '65+2', '57-3', '71-2', '47+10', '59+2', '65+3', '69+2', '68+2', '62+6', '52+1', '53+4', '69+5', '65-2', '70+2', '44+4', '66+3', '63-1', '58+3', '59+1', '66+1', '50+2', '36-12', '64-2', '68+4', '45+2', '52-3', '58+7', '53-2', '63-3', '33+3', '65+1', '58-4', '58+6', '55+3', '42+3', '64+2', '62-1', '61+1', '61+2', '40-1', '67+2', '38-1', '69-2', '61+3', '62+2', '57-5', '64+5', '51-1', '37+4', '68-3', '63-2', '32-8', '35+3', '49+2', '38+10', '46+1', '26-1', '53-4', '69+3', '54-1', '57-2', '49+1', '48+3', '13+7', '29+13', '56-1'])
('Sprint speed', ['73+7', '83+1', '53-1', '69+1', '84+1', '69+2', '89+1', '58+3', '80-3', '77-4', '73+1', '90+1', '67+3', '89+3', '70-9', '74+1', '38-3', '70+1', '49+8', '82-3', '71+1', '76+5', '78+2', '64+1', '69+3', '75-3', '85-4', '77+2', '63+4', '68+2', '88-2', '64+3', '55-9', '68-1', '77+4', '71-2', '85+1', '68+1', '72-2', '66+4', '80+2', '80-2', '65-3', '67-2', '80+3', '69+6', '79-2', '78+5', '47-2', '83-2', '57+1', '52-14', '73+6', '37-10', '86-1', '58+4', '54+11', '84+5', '77+3', '66+5', '76+1', '55-10', '69-1', '83+11', '68-4', '77+7', '68+6', '71+5', '70-2', '80-5', '68+7', '85+5', '74+4', '73+2', '71-5', '66-10', '90-2', '60-2', '70-10', '67+12', '55+17', '32-1', '87-2', '72-1', '66+6', '64+2', '81+2', '70+5', '66+8', '46-3', '75+3', '70+15', '71+3', '65+3', '71+39', '70+3', '74+2', '61+1', '68+4', '95+2', '65+2', '75+8', '58-2', '73-3', '72+12', '79+17', '60+1', '63-11'])
('Reactions', ['79+2', '78+2', '73+1', '74-2', '76-1', '74+6', '74+4', '77+2', '76+3', '71-2', '72+1', '79+4', '77+1', '75-2', '71+1', '73+3', '71-1', '69-1', '70-1', '72+4', '73+7', '63-2', '66-1', '76+1', '68+1', '66-3', '67+1', '59+4', '61+2', '68+3', '62-3', '75+1', '66+2', '64+3', '76+2', '56+4', '62-2', '69-2', '61-2', '60-2', '65+1', '69+6', '62+1', '55-1', '66+13', '62+2', '58-2', '61+1', '63+4', '53-1', '65-1', '59-3', '64+5', '59+2', '65-2', '67+2', '58-1', '60-5', '62+6', '68+2', '57-2', '63-9', '62+9', '63-1', '59+1', '52-1', '57-1', '59-1', '59+10', '55+2', '56+1', '61-1', '56-1', '54+6', '56-2', '56+3', '57+3', '51+1', '52+7', '49+1', '51+2', '57+5', '55-3', '43-7', '54-2', '48+1', '45+1', '49+5', '52+1', '39-6', '54+3'])
('Long passing', ['80+1', '70+3', '72+2', '81+1', '70+1', '76+1', '82-1', '65+10', '55+24', '77+1', '73-1', '74+2', '61+1', '75-2', '76-1', '74+6', '62+13', '62+4', '72+4', '55-2', '66+6', '71+3', '52+10', '60-2', '62+2', '56-10', '70-2', '53-3', '65+6', '66+2', '70-1', '66+1', '53+3', '61+2', '59-1', '56-2', '74+4', '65+4', '67+5', '71+1', '74-5', '65-2', '61-1', '67+1', '65+1', '44+5', '54+5', '65+3', '59+8', '47+3', '67-1', '53-2', '57+6', '64-3', '75+1', '57+1', '52+3', '60-1', '69+2', '47+4', '43+3', '67+4', '62+1', '65+2', '55-3', '58+4', '57+2', '65+8', '57+21', '54+10', '44+4', '47-4', '48+8', '64+2', '57-1', '56+2', '46-6', '63-3', '45-10', '55+11', '51-3', '51+2', '38+3', '60-3', '40+12', '48+2', '61+4', '50+6', '38+1', '26+16', '26+4', '58+2', '34-1', '62+20', '49-5', '49-1', '38+6', '36+1', '48+10', '59+6', '34+10', '30+6', '25+4'])
('Standing tackle', ['82-1', '72-5', '83-1', '34+8', '76-2', '80-2', '69+3', '80+2', '64+35', '76+2', '46+10', '72+1', '71+2', '39+5', '77+3', '75-2', '75+1', '73+2', '35+6', '71-1', '74+1', '66-3', '72-1', '76+3', '41+10', '70+3', '73+1', '34-5', '68-1', '67-1', '59+2', '69-2', '40+10', '70-1', '65+3', '71-2', '72-4', '67+2', '66-2', '71+1', '60-3', '66+3', '44+10', '64-1', '72-2', '63+4', '73-2', '60+5', '17+2', '27+4', '65-2', '37-3', '67+1', '67-2', '58+10', '69-3', '70+1', '49+4', '69-1', '64+1', '69+4', '67+4', '66+4', '63+1', '51-9', '65+1', '68+1', '68+2', '62+1', '60+2', '62+2', '48+4', '45+2', '68+15', '65-3', '63-2', '63+2', '57-4', '63-1', '13-2', '61-1', '60-2', '37+10', '20-1', '30-2', '64+2', '59+7', '60+1', '55+2', '33+2', '73+5', '26-1', '52+4', '57+1', '54+7', '60+4', '54+4', '58+2', '47+1'])
('Dribbling', ['78+3', '87+1', '84+1', '68+3', '77+1', '80+1', '76+1', '76-2', '74+2', '71-3', '73+2', '71+2', '77+3', '74+4', '76-1', '75-2', '75+1', '72+4', '73-2', '69-5', '72-1', '72+3', '72+2', '74+1', '70+1', '74-1', '78-1', '67-1', '72+1', '38+2', '66-2', '59-2', '64-1', '71+4', '67+2', '56+1', '73+1', '60-1', '66-1', '68+1', '79+1', '66+2', '67+1', '48+15', '69-1', '71+1', '58+4', '42+2', '72-2', '64+3', '63-2', '65+1', '68+2', '36+2', '66+1', '64+1', '66+4', '65-2', '59+10', '63-1', '60-2', '54+8', '62-2', '66+3', '65-1', '57-1', '54-4', '54+10', '62-1', '65+2', '55-2', '59+5', '60+5', '34+3', '58-10', '64+4', '54+1', '49+1', '64-2', '47-2', '65+3', '63+2', '35+7', '49+3', '52+4', '37-1', '58-1', '59-1', '37+7', '31+2', '47+4', '62+5', '50+6', '53+2', '11+6', '36+4', '55-1', '59+1', '41+1', '53+1', '43-1'])
('Ball control', ['83+2', '77+1', '79+1', '85+1', '80+1', '70+1', '76+1', '82+6', '68+6', '74+1', '66+1', '74+5', '63-3', '75-2', '71-2', '74+3', '72-2', '74+2', '59-3', '70-1', '71-1', '68+1', '75+1', '38+3', '67+2', '48+3', '56-2', '70-3', '72+3', '62+2', '61-2', '67+1', '64+1', '72-1', '72-3', '56+2', '55-2', '72+2', '66-1', '63-1', '71+2', '63+1', '58-2', '65-1', '65+5', '59-2', '62-1', '66+3', '67-1', '68+4', '55-4', '71+1', '63+3', '61-3', '63-2', '65+2', '60-1', '61+2', '66+2', '68-1', '67+3', '72+1', '60+2', '69+1', '65+1', '67+6', '61-1', '56+14', '48+5', '50+4', '60-4', '61+4', '60-2', '58+1', '61+1', '57+1', '43+6', '51+2', '64+3', '55+2', '50+2', '58+4', '39-1', '60+5', '64+4', '64-1', '43+7', '48+1', '56+5', '47+6', '30+19', '41+11', '40+1', '48+4', '14-2', '38+1'])
('Short passing', ['79+2', '73+3', '84+1', '82+1', '78+2', '80+1', '81+3', '67-5', '76+3', '67+1', '78+1', '75+1', '72-2', '77-2', '77-1', '68-2', '65+2', '59-6', '74-2', '71+1', '66+1', '69-3', '79-1', '68+1', '67+6', '58+1', '64-2', '70+1', '74+3', '74+2', '55-10', '70-2', '67+7', '67-1', '72+2', '56-1', '64+4', '73+1', '72+3', '69+4', '65+5', '64-1', '73+2', '63-1', '69+1', '66+4', '67+2', '61+4', '62+1', '70-3', '73+4', '67-2', '66+2', '68-1', '64+3', '65+6', '57+1', '60-3', '67+3', '63+2', '54+5', '75+4', '66-1', '65-1', '59-2', '54-1', '65+1', '59+4', '69-1', '61+6', '62-2', '70+3', '62-5', '64+1', '63+4', '74+5', '67+13', '69+3', '62+4', '62+2', '63-2', '63+3', '60+13', '68+15', '60+6', '66-3', '65+9', '60+4', '68+2', '54+1', '61+3', '45+4', '64+2', '58-2', '54+6', '65-3', '50-10', '59+8', '62+7', '54-2', '60+1', '49+1', '46+10', '54+2', '53+6', '49+4', '18+11', '42+1', '37-1', '58+12', '61+2', '52-6', '62-1', '45+5', '60+3', '58+8', '58+4', '40+10', '44+3', '38+8', '46+1', '33+3', '38+1', '56+1', '31+3', '42-2'])
('Club', ['Real Madrid CF', 'FC Barcelona', 'Paris Saint-Germain', 'FC Bayern Munich', 'Manchester United', 'Chelsea', 'Juventus', 'Manchester City', 'Arsenal', 'Atlético Madrid', 'Borussia Dortmund', 'Milan', 'Tottenham Hotspur', 'Napoli', 'Inter', 'Liverpool', 'Roma', 'Beşiktaş JK', 'AS Monaco', 'Bayer 04 Leverkusen', 'AS Saint-Étienne', 'Athletic Club de Bilbao', '1. FC Köln', 'Villarreal CF', 'FC Schalke 04', 'Olympique de Marseille', 'Atalanta', 'RB Leipzig', 'Real Sociedad', 'Torino', 'Sporting CP', 'Leicester City', 'Southampton', 'FC Porto', 'UD Las Palmas', 'Olympique Lyonnais', 'Lazio', 'Genoa', 'Everton', 'RC Celta de Vigo', 'Valencia CF', nan, 'Sevilla FC', 'Toronto FC', 'Borussia Mönchengladbach', 'SL Benfica', 'RCD Espanyol', 'OGC Nice', 'Spartak Moscow', 'Swansea City', 'Sassuolo', 'TSG 1899 Hoffenheim', 'Stoke City', 'Shakhtar Donetsk', 'West Ham United', 'SV Werder Bremen', 'Watford', 'Galatasaray SK', 'Lokomotiv Moscow', 'Zenit St. Petersburg', 'Bournemouth', 'Sampdoria', 'Antalyaspor', 'Girondins de Bordeaux', 'VfL Wolfsburg', 'New York City Football Club', 'Hertha BSC Berlin', 'SD Eibar', 'Ajax', 'RC Deportivo de La Coruña', 'Crystal Palace', 'West Bromwich Albion', 'CSKA Moscow', 'Eintracht Frankfurt', 'Real Betis Balompié', 'Fenerbahçe SK', 'Fiorentina', 'Burnley', 'Tigres U.A.N.L.', 'San Lorenzo de Almagro', 'Chicago Fire Soccer Club', 'Feyenoord', 'FC Krasnodar', 'Angers SCO', 'U.N.A.M.', 'Montreal Impact', 'Chievo Verona', 'LA Galaxy', 'Vissel Kobe', 'Bologna', 'LOSC Lille', 'Orlando City Soccer Club', 'Atlanta United FC', 'Independiente', 'Club Atlético Lanús', 'RSC Anderlecht', 'İstanbul Başakşehir FK', 'Hannover 96', 'Newcastle United', 'Málaga CF', 'Trabzonspor', 'PSV', 'FC Augsburg', 'Club Tijuana', 'VfB Stuttgart', 'Hamburger SV', 'CD Leganés', 'Getafe CF', 'Deportivo Alavés', 'Portland Timbers', 'Kayserispor', 'Udinese', 'Standard de Liège', 'Alanyaspor', '1. 
FSV Mainz 05', 'Sparta Praha', 'FC Nantes', 'Al Ahli', 'SC Braga', 'Brighton & Hove Albion', 'Levante UD', 'Boca Juniors', 'Columbus Crew SC', 'Querétaro', 'Dijon FCO', 'Olympiakos CFP', 'KRC Genk', 'FC Basel', 'Club América', 'Montpellier Hérault SC', 'Monterrey', 'Seattle Sounders FC', 'River Plate', 'CPD Junior Barranquilla', 'Racing Club de Avellaneda', 'En Avant de Guingamp', 'Rangers', 'Colorado Rapids', 'Necaxa', 'Aston Villa', 'KV Oostende', 'Akhisar Belediyespor', 'Rubin Kazan', 'Rosario Central', 'Ferrara (SPAL)', 'Wolverhampton Wanderers', 'Stade Rennais FC', 'Pachuca', 'Granada CF', 'Colo-Colo', 'KAA Gent', 'Middlesbrough', 'Toulouse FC', 'BSC Young Boys', 'FC St. Gallen', 'Santos Futebol Clube', 'Cagliari', 'Rio Ave FC', 'Norwich City', 'Grêmio Foot-Ball Porto Alegrense', 'Estudiantes de La Plata', 'Cruzeiro', 'Universidad de Chile', 'Atletico Nacional Medellin', 'FC Ufa', 'Girona CF', 'SM Caen', 'Independiente Medellín', 'Djurgårdens IF', 'Vitória Guimarães', 'São Paulo Futebol Clube', 'Royal Antwerp FC', 'Kaizer Chiefs', 'Fulham', 'Sunderland', 'Huddersfield Town', 'PAOK Thessaloniki', 'Fluminense Football Club', 'Leeds United', 'Derby County', 'AC Ajaccio', 'Club León', 'Sociedade Esportiva Palmeiras', 'Hellas Verona', 'Rayo Vallecano', 'Club Brugge KV', 'Real Sporting de Gijón', 'Celtic', 'Hull City', 'Universidad Católica', "CD O'Higgins", 'Club Atlas', 'SC Freiburg', 'Melbourne City', 'Grupo Desportivo de Chaves', 'Al Hilal', 'CA Osasuna', 'Perth Glory', 'Panathinaikos FC', 'FC Utrecht', 'Club Atlético Huracán', 'Birmingham City', 'Botafogo de Futebol e Regatas', 'Deportivo Toluca', 'Amiens SC Football', 'Clube Atlético Paranaense', 'Göztepe', 'FC Groningen', 'FC Ingolstadt 04', 'New England Revolution', 'FC Red Bull Salzburg', 'Clube Atlético Mineiro', 'Crotone', 'Banfield', 'Grasshopper Club Zürich', 'Vitesse', 'AZ Alkmaar', 'SV Darmstadt 98', 'Avaí Futebol Clube', 'Coritiba Foot Ball Club', 'Legia Warszawa', 'RC Strasbourg', 'Brescia', 
'Vancouver Whitecaps FC', 'Western Sydney Wanderers', 'Guadalajara', 'Cerezo Osaka', 'CS Marítimo', 'Osmanlıspor', 'Santos Laguna', 'Dinamo Moscow', 'Monarcas Morelia', 'Frosinone', 'Cruz Azul', 'Real Zaragoza', 'Deportes Iquique', 'Sydney FC', 'Brisbane Roar', 'AEK Athens', 'Colon de Santa Fe', 'Atiker Konyaspor', 'Unión Española', 'Reading', 'Asociacion Deportivo Cali', 'Palermo', 'FC Metz', 'Sporting Kansas City', 'FC Seoul', 'Bursaspor', 'Benevento Calcio', 'FC Rostov', 'Associação Atlética Ponte Preta', 'Al Nassr', 'San Jose Earthquakes', 'Nîmes Olympique', 'Al Ittihad', 'Godoy Cruz', 'Gimnàstic de Tarragona', 'Philadelphia Union', 'Sport Club do Recife', 'CD Universidad de Concepción', 'FC Paços de Ferreira', 'New York Red Bulls', 'Club Atlético Tigre', 'FC Lorient', 'Belgrano de Córdoba', 'Sheffield Wednesday', 'CD Tenerife', 'SC Heerenveen', 'Independiente Santa Fe', 'Puebla', 'Vitória Setúbal', 'FC København', 'Kalmar FF', 'CD Aves', "Newell's Old Boys", 'Terek Grozny', '1. FC Union Berlin', 'Real Valladolid', 'ADO Den Haag', 'Santiago Wanderers', 'Bari', 'Ulsan Hyundai Horang-i', '1. FC Heidenheim', 'Argentinos Juniors', 'Brentford', 'Gençlerbirliği SK', 'FC St. Pauli', 'Melbourne Victory', 'VfL Bochum', 'Córdoba CF', 'Atlético Clube Goianiense', 'Kardemir Karabükspor', 'AJ Auxerre', 'Real Oviedo', 'Vitória ', 'Arsenal Tula', 'RC Lens', 'CF Os Belenenses', 'SK Rapid Wien', 'Everton de Viña del Mar', 'CD Los Millionarios Bogota', 'GwangJu FC', 'Sangju Sangmu FC', 'Jeonbuk Hyundai Motors', 'FC Dallas', 'ES Troyes AC', 'F.B.C. Unione Venezia', 'Al Qadisiyah', 'Fortuna Düsseldorf', 'Boavista FC', 'D.C. United', 'Medicana Sivasspor', 'CD Once Caldas Manizales', 'FC Ural', 'Kashiwa Reysol', 'Malmö FF', 'Cádiz C.F.', 'Portimonense SC', 'Urawa Red Diamonds', 'UD Almería', 'AIK Solna', 'CD Huachipato', 'Estoril Praia', '1. 
FC Nürnberg', 'Pescara', 'Stade de Reims', 'Moreirense FC', 'Empoli', 'Orlando Pirates', 'FC Anzhi Makhachkala', 'Heracles Almelo', 'Eintracht Braunschweig', 'Cremonese', 'CD Antofagasta', 'AS Nancy Lorraine', 'Lobos de la BUAP', 'Suwon Samsung Bluewings', 'Defensa y Justicia', 'Ipswich Town', 'Rosenborg BK', 'KAS Eupen', 'FC Sion', 'Adelaide United', 'Real Salt Lake', 'FC Midtjylland', 'Minnesota Thunder', 'AD Alcorcón', 'Al Faisaly', 'FC Tosno', 'FC Twente', 'F.C. Tokyo', 'IF Elfsborg', 'Carpi', 'Parma', 'IFK Norrköping', 'Unión de Santa Fe', 'Corporación Club Deportivo Tuluá', 'Kawasaki Frontale', 'Associação Chapecoense de Futebol', '1. FC Kaiserslautern', 'Preston North End', 'Molde FK', 'Bristol City', 'Brøndby IF', 'Albacete Balompié', 'GFC Ajaccio', 'KV Kortrijk', 'Kashima Antlers', 'CD Feirense', 'Oxford United', 'Talleres de Cordoba', 'FC Lugano', 'SV Zulte-Waregem', 'Deportes Tolima', 'Amkar Perm', 'Kasimpaşa SK', 'Gimnasia y Esgrima La Plata', 'Queens Park Rangers', 'Shimizu S-Pulse', 'Aalborg BK', 'Águilas Doradas', 'FC Lausanne-Sports', 'Tiburones Rojos de Veracruz', 'Perugia', 'La Spezia', 'Pohang Steelers', 'FC Barcelona B', 'Evkur Yeni Malatyaspor', 'Willem II', 'Novara', 'Sporting Charleroi', 'FK Austria Wien', 'Gamba Osaka', 'DSC Arminia Bielefeld', 'Nottingham Forest', 'Salernitana', 'Vercelli', 'Śląsk Wrocław', 'Sagan Tosu', 'CD Numancia', 'Strømsgodset IF', 'Yokohama F. 
Marinos', 'Cardiff City', 'FC Sochaux-Montbéliard', 'Jeju United FC', 'Lechia Gdańsk', 'Piast Gliwice', 'SD Huesca', 'FC Zürich', 'Heart of Midlothian', 'Atlético Tucumán', 'BK Häcken', 'Audax Italiano', 'Club de Deportes Temuco', 'San Luis de Quillota', 'Al Taawoun', 'Lech Poznań', 'FC Luzern', 'Cesena', 'Chacarita Juniors', 'Aberdeen', 'Excelsior', 'SpVgg Greuther Fürth', 'San Martín de San Juan', 'Wisła Kraków', 'Hammarby IF', 'Virtus Entella', 'Vegalta Sendai', 'Alianza Petrolera', 'CD Palestino', 'KV Mechelen', 'Central Coast Mariners', 'La Equidad', 'Avellino', 'Omiya Ardija', 'Cracovia', 'Curicó Unido', 'CF Reus Deportiu', 'SK Brann', 'C.D. Leonesa S.A.D.', 'US Quevilly-Rouen', 'Sporting Lokeren', 'Houston Dynamo', 'Termalica Bruk-Bet Nieciecza', 'Tondela', 'SV Sandhausen', 'Al Fayha', 'Jeonnam Dragons', 'Sandefjord Fotball', 'Pogoń Szczecin', 'Östersunds FK', 'Burton Albion', 'Al Raed', 'Jagiellonia Białystok', 'Bahía Blanca', 'Ventforet Kofu', 'Stade Brestois 29', 'CD Lugo', 'Korona Kielce', 'PEC Zwolle', 'MSV Duisburg', 'AFC Eskilstuna', 'Waasland-Beveren', 'Club Atlético Patronato', 'La Berrichonne de Châteauroux', 'SK Sturm Graz', 'Royal Excel Mouscron', 'Júbilo Iwata', 'SønderjyskE', 'Al Shabab', 'US Orléans Loiret Football', 'Bourg en Bresse Péronnas 01', 'FC Nordsjælland', 'FC Thun', 'CD America de Cali', 'VVV-Venlo', 'Charlton Athletic', 'Sevilla Atlético', 'Ternana', 'Gangwon FC', 'IFK Göteborg', 'Odense Boldklub', 'Arsenal de Sarandí', 'SG Dynamo Dresden', 'Bolton Wanderers', 'Plymouth Argyle', 'Southend United', 'Vélez Sarsfield', 'Daegu FC', 'Sanfrecce Hiroshima', 'Odds BK', 'Aarhus GF', 'Blackburn Rovers', 'Lyngby BK', 'FC Erzgebirge Aue', 'Sint-Truidense VV', 'Karlsruher SC', 'Jaguares Fútbol Club', 'Sarpsborg 08 FF', 'Temperley', ' SSV Jahn Regensburg', 'Patriotas Boyacá FC', 'Barnsley', 'Sandecja Nowy Sącz', 'Górnik Zabrze', 'VfL Osnabrück', 'Valenciennes FC', 'Tours FC', 'Lorca Deportiva CF', 'Albirex Niigata', 'Cittadella', 'Aalesunds FK', 
'Deportivo Pasto', 'Roda JC Kerkrade', 'Motherwell', 'Holstein Kiel', 'Scunthorpe United', 'Atlético Huila', 'Wisła Płock', 'Atlético Bucaramanga', 'Sparta Rotterdam', 'Stabæk Fotball', 'Würzburger FV', 'VfR Aalen', 'Clermont Foot 63', 'Randers FC', 'Vålerenga Fotball', 'Incheon United FC', 'Hibernian', 'SCR Altach', 'Sheffield United', 'Le Havre AC', 'Walsall', 'Chamois Niortais FC', 'Sportfreunde Lotte', 'Millwall', 'SpVgg Unterhaching', 'FC SKA-Energiya Khabarovsk', 'Hansa Rostock', 'Wellington Phoenix', 'Hobro IK', 'Wigan Athletic', 'Peterborough United', 'Chemnitzer FC', 'Grimsby Town', 'Ascoli', 'Al Fateh', 'Foggia', 'Bristol Rovers', '1. FC Magdeburg', 'SG Sonnenhof Großaspach', 'Bury', 'Lillestrøm SK', 'Arka Gdynia', 'Ross County FC', 'Paris FC', 'Bradford City', 'Hamilton Academical FC', 'Luton Town', 'NAC Breda', 'Portsmouth', 'Zagłębie Lubin', 'Kilmarnock', 'Hokkaido Consadole Sapporo', 'Örebro SK', 'SV Wehen Wiesbaden', 'Coventry City', 'Milton Keynes Dons', 'Rochdale', 'FK Haugesund', 'Dundee FC', 'Oldham Athletic', 'Ohod Club', 'Rotherham United', 'HJK Helsinki', 'LASK Linz', 'AC Horsens', 'SC Fortuna Köln', 'St. Johnstone FC', 'Sogndal', 'Envigado FC', 'Blackpool', 'SC Preußen Münster', 'Hallescher FC', 'Ettifaq FC', 'Crawley Town', 'Exeter City', 'AFC Wimbledon', 'Doncaster Rovers', 'IK Sirius', 'Jönköpings Södra IF', 'Rot-Weiß Erfurt', 'Notts County', 'SV Mattersburg', 'Viking FK', 'Cambridge United', 'Tigres FC', 'Mansfield Town', 'SV Meppen', 'Tromsø IL', 'Swindon Town', 'Werder Bremen II', 'FC Admira Wacker Mödling', 'SC Paderborn 07', 'Partick Thistle F.C.', 'Silkeborg IF', 'Al Batin', 'Colchester United', 'FSV Zwickau', 'Crewe Alexandra', 'Northampton Town', 'Kristiansund BK', 'Lincoln City', 'Dundalk', 'Wycombe Wanderers', 'Yeovil Town', 'GIF Sundsvall', 'FC Carl Zeiss Jena', "St. Patrick's Athletic", 'Fleetwood Town', 'Wolfsberger AC', 'Cork City', 'Gillingham', 'SKN St. 
Pölten', 'Carlisle United', 'Chesterfield', 'Newcastle Jets', 'Morecambe', 'Port Vale', 'Newport County', 'Shrewsbury', 'Accrington Stanley', 'Forest Green', 'Bohemian FC', 'Cheltenham Town', 'Barnet', 'Halmstads BK', 'Shamrock Rovers', 'Derry City', 'Stevenage', 'Finn Harps', 'FC Helsingør', 'Sligo Rovers', 'Bray Wanderers', 'Limerick FC', 'Galway United', 'Drogheda United'])

OK, our strange values turn out to be caused by FIFA storing bonus modifiers on some of the attributes. For simplicity, we just strip these extra elements. Another approach would be to drop the rows that contain the "bad" data. This only applies to the physical attributes, though.


In [10]:
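def convert_columns(data_frame, mixed_cols):
    """Convert mixed-type attribute columns to float, dropping modifiers such as '+9'."""
    for col in mixed_cols:
        try:
            # Keep only the leading digits, so '70+9' becomes 70.0 and '7+2' becomes 7.0.
            # astype(str) first, so values already parsed as numbers are not lost.
            base = data_frame[col].astype(str).str.extract(r'^(\d+)', expand=False)
            data_frame[col] = base.astype(np.float64)
        except Exception as e:
            # Report the failing column and continue with the remaining ones.
            print(col, e)
    return data_frame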
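
The alternative mentioned above, dropping the rows that contain the "bad" data, could look roughly like the sketch below. It runs on a small made-up frame and is not the notebook's own method; the name `toy` and its values are assumptions for illustration only.

```python
import pandas as pd

# Made-up example data; the real columns come from the FIFA 18 dataset.
toy = pd.DataFrame({
    'Name': ['A', 'B', 'C'],
    'Acceleration': ['89', '70+9', '65-3'],
})

# A value that is not purely digits carries a modifier such as '+9' or '-3'.
has_modifier = ~toy['Acceleration'].astype(str).str.match(r'^\d+$')

# Alternative 1: drop the rows with modifiers, then convert what is left.
dropped = toy[~has_modifier].copy()
dropped['Acceleration'] = dropped['Acceleration'].astype(float)

# Alternative 2 (the approach chosen in the notebook): keep every row
# and use only the base value in front of the modifier.
stripped = toy.copy()
stripped['Acceleration'] = (stripped['Acceleration'].astype(str)
                            .str.extract(r'^(\d+)', expand=False)
                            .astype(float))

print(len(dropped), len(stripped))
```

Dropping rows is the stricter choice, but it throws away players whose remaining columns are perfectly usable, which is one reason to prefer stripping here.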

In [11]:
df = convert_columns(df,mixed_type_cols[:-1])
df.dtypes


/usr/local/lib/python3.5/dist-packages/ipykernel_launcher.py:4: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  after removing the cwd from sys.path.
Out[11]:
Name                    object
Age                      int64
Nationality             object
Overall                  int64
Potential                int64
Club                    object
Value                   object
Wage                    object
Special                  int64
Acceleration           float64
Aggression             float64
Agility                float64
Balance                float64
Ball control           float64
Composure              float64
Crossing               float64
Curve                  float64
Dribbling              float64
Finishing              float64
Free kick accuracy     float64
GK diving              float64
GK handling            float64
GK kicking             float64
GK positioning         float64
GK reflexes            float64
Heading accuracy       float64
Interceptions          float64
Jumping                float64
Long passing           float64
Long shots             float64
                        ...   
Vision                 float64
Volleys                float64
CAM                    float64
CB                     float64
CDM                    float64
CF                     float64
CM                     float64
ID                       int64
LAM                    float64
LB                     float64
LCB                    float64
LCM                    float64
LDM                    float64
LF                     float64
LM                     float64
LS                     float64
LW                     float64
LWB                    float64
Preferred Positions     object
RAM                    float64
RB                     float64
RCB                    float64
RCM                    float64
RDM                    float64
RF                     float64
RM                     float64
RS                     float64
RW                     float64
RWB                    float64
ST                     float64
dtype: object

Fantastic! You have now converted the physical attributes. You noticed earlier that the Club column was also flagged as having mixed types. This is fairly easy to investigate:


In [12]:
df['Club'].sort_values(ascending=False, na_position='first')


Out[12]:
162                       NaN
167                       NaN
274                       NaN
472                       NaN
488                       NaN
658                       NaN
925                       NaN
944                       NaN
1019                      NaN
1273                      NaN
1351                      NaN
1402                      NaN
1549                      NaN
1597                      NaN
1802                      NaN
1819                      NaN
1826                      NaN
1888                      NaN
1899                      NaN
2000                      NaN
2090                      NaN
2241                      NaN
2246                      NaN
2281                      NaN
2350                      NaN
2424                      NaN
2483                      NaN
2518                      NaN
2552                      NaN
2553                      NaN
                 ...         
13392        1. FC Heidenheim
7396         1. FC Heidenheim
12298     SSV Jahn Regensburg
12213     SSV Jahn Regensburg
13582     SSV Jahn Regensburg
17513     SSV Jahn Regensburg
16587     SSV Jahn Regensburg
7268      SSV Jahn Regensburg
9125      SSV Jahn Regensburg
13873     SSV Jahn Regensburg
13763     SSV Jahn Regensburg
6972      SSV Jahn Regensburg
8931      SSV Jahn Regensburg
9135      SSV Jahn Regensburg
9358      SSV Jahn Regensburg
7160      SSV Jahn Regensburg
8761      SSV Jahn Regensburg
9955      SSV Jahn Regensburg
9969      SSV Jahn Regensburg
14786     SSV Jahn Regensburg
10072     SSV Jahn Regensburg
5056      SSV Jahn Regensburg
16548     SSV Jahn Regensburg
15198     SSV Jahn Regensburg
10699     SSV Jahn Regensburg
6266      SSV Jahn Regensburg
12540     SSV Jahn Regensburg
8562      SSV Jahn Regensburg
15226     SSV Jahn Regensburg
10154     SSV Jahn Regensburg
Name: Club, dtype: object

Your gut feeling was right! Some of the players have no club, i.e. they are unemployed. This is no disaster. We can either remove them (they must simply be so bad that nobody wants to hire them), or we can leave them in (they deserve a chance). In this exercise we choose the latter option.
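
As a rough sketch of the two options, assuming a tiny made-up frame (`toy`) and a hypothetical placeholder name `'No club'` that does not appear in the notebook:

```python
import numpy as np
import pandas as pd

# Made-up example data; one player has no club.
toy = pd.DataFrame({
    'Name': ['A', 'B', 'C'],
    'Club': ['FC Barcelona', np.nan, 'Aarhus GF'],
})

# Count the players without a club.
n_without_club = int(toy['Club'].isna().sum())

# Option 1: remove the players without a club.
with_club_only = toy.dropna(subset=['Club'])

# Option 2: keep them, but give the missing club an explicit placeholder.
# 'No club' is a hypothetical label chosen for this example.
kept = toy.assign(Club=toy['Club'].fillna('No club'))

print(n_without_club, len(with_club_only), len(kept))
```

A string placeholder keeps the Club column a pure text column, which can make later grouping by club easier to read.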

You also noticed that Wage and Value are object types. If we look closer at these columns, we see that this is partly because the amounts are given in € (euro), and partly because the data collectors were kind enough to replace the trailing zeros with K for 000 and M for 000000.


In [13]:
df[['Wage','Value']]


Out[13]:
Wage Value
0 €565K €95.5M
1 €565K €105M
2 €280K €123M
3 €510K €97M
4 €230K €61M
5 €355K €92M
6 €215K €64.5M
7 €295K €90.5M
8 €340K €79M
9 €275K €77M
10 €310K €52M
11 €285K €83M
12 €190K €59M
13 €265K €67.5M
14 €340K €57M
15 €370K €69.5M
16 €325K €66.5M
17 €225K €38M
18 €110K €4.5M
19 €215K €79M
20 €82K €57M
21 €150K €75M
22 €225K €70.5M
23 €165K €61M
24 €210K €44M
25 €215K €48M
26 €125K €40M
27 €215K €48M
28 €265K €60M
29 €165K €38M
... ... ...
17951 €1K €40K
17952 €1K €60K
17953 €1K €60K
17954 €1K €60K
17955 €1K €60K
17956 €1K €50K
17957 €1K €50K
17958 €1K €40K
17959 €1K €60K
17960 €1K €70K
17961 €5K €60K
17962 €1K €50K
17963 €1K €50K
17964 €1K €60K
17965 €1K €60K
17966 €1K €60K
17967 €1K €60K
17968 €1K €60K
17969 €1K €60K
17970 €1K €40K
17971 €2K €60K
17972 €1K €60K
17973 €1K €50K
17974 €1K €40K
17975 €1K €30K
17976 €1K €50K
17977 €1K €0
17978 €1K €60K
17979 €1K €60K
17980 €1K €50K

17981 rows × 2 columns


In [14]:
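def parse_of_wage(val):
    """Convert a string such as '€95.5M' or '€565K' to a float amount in euros."""
    val = val.replace('€', '')
    # An empty suffix means the number is already in euros, so its multiplier is 1.
    valdict = {'M': 1000000, 'K': 1000, '': 1}
    match = re.match(r'(\d+(?:\.\d+)?)([MK]?)', val)
    if match is None:
        # Unparseable input: report it and return NaN instead of failing.
        print(val)
        return np.nan
    return float(match.group(1)) * valdict[match.group(2)]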
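
For comparison, the same conversion can be sketched with vectorised pandas string methods instead of a per-value function. This is an illustrative alternative on made-up values, not the notebook's method; the names `toy`, `text`, `multiplier` and `amount` are assumptions.

```python
import pandas as pd

# Made-up example values in the same format as the Wage and Value columns.
toy = pd.Series(['€95.5M', '€565K', '€0', '€105M'])

# Remove the currency symbol.
text = toy.str.replace('€', '', regex=False)

# The last character is the suffix (M or K) when there is one.
# Anything else maps to NaN and is treated as a multiplier of 1.
multiplier = text.str[-1].map({'M': 1000000, 'K': 1000}).fillna(1)

# Strip the suffix and convert the remaining number.
number = text.str.rstrip('MK').astype(float)

amount = number * multiplier
print(amount.tolist())
```

On a full column this avoids calling a Python function once per row, at the cost of being less explicit about malformed input.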

In [15]:
df['Value'] = df['Value'].apply(parse_of_wage)
df['Wage'] = df['Wage'].apply(parse_of_wage)
# Note: fillna(0.0) fills every column, so the missing Club names also become 0.0.
df = df.fillna(value=0.0)


/usr/local/lib/python3.5/dist-packages/ipykernel_launcher.py:1: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  """Entry point for launching an IPython kernel.
/usr/local/lib/python3.5/dist-packages/ipykernel_launcher.py:2: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  

In [16]:
g = df.columns.to_series().groupby(df.dtypes).groups
d = {key.name: list(val) for key, val in g.items()}
for key, val in df.dtypes.items():
    print('Variable name: {:20s} Variable type: {}'.format(key, val))


Variable name: Name                 Variable type: object
Variable name: Age                  Variable type: int64
Variable name: Nationality          Variable type: object
Variable name: Overall              Variable type: int64
Variable name: Potential            Variable type: int64
Variable name: Club                 Variable type: object
Variable name: Value                Variable type: float64
Variable name: Wage                 Variable type: float64
Variable name: Special              Variable type: int64
Variable name: Acceleration         Variable type: float64
Variable name: Aggression           Variable type: float64
Variable name: Agility              Variable type: float64
Variable name: Balance              Variable type: float64
Variable name: Ball control         Variable type: float64
Variable name: Composure            Variable type: float64
Variable name: Crossing             Variable type: float64
Variable name: Curve                Variable type: float64
Variable name: Dribbling            Variable type: float64
Variable name: Finishing            Variable type: float64
Variable name: Free kick accuracy   Variable type: float64
Variable name: GK diving            Variable type: float64
Variable name: GK handling          Variable type: float64
Variable name: GK kicking           Variable type: float64
Variable name: GK positioning       Variable type: float64
Variable name: GK reflexes          Variable type: float64
Variable name: Heading accuracy     Variable type: float64
Variable name: Interceptions        Variable type: float64
Variable name: Jumping              Variable type: float64
Variable name: Long passing         Variable type: float64
Variable name: Long shots           Variable type: float64
Variable name: Marking              Variable type: float64
Variable name: Penalties            Variable type: float64
Variable name: Positioning          Variable type: float64
Variable name: Reactions            Variable type: float64
Variable name: Short passing        Variable type: float64
Variable name: Shot power           Variable type: float64
Variable name: Sliding tackle       Variable type: float64
Variable name: Sprint speed         Variable type: float64
Variable name: Stamina              Variable type: float64
Variable name: Standing tackle      Variable type: float64
Variable name: Strength             Variable type: float64
Variable name: Vision               Variable type: float64
Variable name: Volleys              Variable type: float64
Variable name: CAM                  Variable type: float64
Variable name: CB                   Variable type: float64
Variable name: CDM                  Variable type: float64
Variable name: CF                   Variable type: float64
Variable name: CM                   Variable type: float64
Variable name: ID                   Variable type: int64
Variable name: LAM                  Variable type: float64
Variable name: LB                   Variable type: float64
Variable name: LCB                  Variable type: float64
Variable name: LCM                  Variable type: float64
Variable name: LDM                  Variable type: float64
Variable name: LF                   Variable type: float64
Variable name: LM                   Variable type: float64
Variable name: LS                   Variable type: float64
Variable name: LW                   Variable type: float64
Variable name: LWB                  Variable type: float64
Variable name: Preferred Positions  Variable type: object
Variable name: RAM                  Variable type: float64
Variable name: RB                   Variable type: float64
Variable name: RCB                  Variable type: float64
Variable name: RCM                  Variable type: float64
Variable name: RDM                  Variable type: float64
Variable name: RF                   Variable type: float64
Variable name: RM                   Variable type: float64
Variable name: RS                   Variable type: float64
Variable name: RW                   Variable type: float64
Variable name: RWB                  Variable type: float64
Variable name: ST                   Variable type: float64

Great! You can now confirm that the only object-type variables are: {{', '.join(d['object'][:-1])}} and {{d['object'][-1]}}
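
A shorter way to list the object-type columns is pandas' `select_dtypes`. The sketch below uses a small made-up frame (`toy`) with an explicit object dtype for the text columns, so that it mirrors the dtypes printed above; it is an illustration, not the notebook's own code.

```python
import pandas as pd

# Made-up example data; text columns are given object dtype explicitly
# to match the dtypes shown in the notebook output.
toy = pd.DataFrame({
    'Name': pd.Series(['A', 'B'], dtype=object),
    'Age': [25, 31],
    'Value': [1500000.0, 200000.0],
    'Club': pd.Series(['Aarhus GF', 'Lyngby BK'], dtype=object),
})

# Select only the columns whose dtype is object and list their names.
object_cols = toy.select_dtypes(include='object').columns.tolist()
print(object_cols)
```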

Introducing labels in the dataset

To tell which players could be candidates, we need to assign a label to every player. Since our clientele is the top clubs in Europe, we take the players of those clubs as the basis for label 1. For our model to tell these players apart from other players, it also needs players who have nothing to do with these clubs. Those players get label 0.
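
The labelling idea can be sketched as follows, assuming a made-up frame (`toy`) and a hypothetical list `top_clubs_example`; the column name `label` is also an assumption, since the notebook does not create it in this section.

```python
import pandas as pd

# Made-up example data.
toy = pd.DataFrame({
    'Name': ['A', 'B', 'C', 'D'],
    'Club': ['FC Barcelona', 'Aarhus GF', 'Real Madrid CF', 'Lyngby BK'],
})

# Hypothetical list of top clubs, for illustration only.
top_clubs_example = ['FC Barcelona', 'Real Madrid CF']

# Label 1 for players at a top club, label 0 for everyone else.
toy['label'] = toy['Club'].isin(top_clubs_example).astype(int)
print(toy)
```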

Your customers are: Barcelona, Real Madrid, Juventus, AC Milan, Bayern München, Arsenal and Manchester City.

Exercise 1: Choose the top clubs

As a nerdy data scientist, however, you want to make sure you get the best possible result, so you reason that we need a more general population of clubs that can be classified as top clubs. We start by looking at how the clubs' overall performance, given by Overall, is distributed.

Your task is to:

  • Play around with the graph below and choose a suitable number of bins, so that we get an accurate picture of which clubs are the cream of the crop in Europe.
  • Based on your analysis, choose the value that separates the top clubs from all the others.
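
To get a feel for what the number of bins does before touching the plot, the counts can be inspected numerically with `np.histogram`. The sketch below uses synthetic club averages (`club_means`, drawn from a normal distribution purely as an assumption), not the real data.

```python
import numpy as np

# Synthetic club averages; the mean, spread and size are made up.
rng = np.random.RandomState(0)
club_means = rng.normal(loc=66, scale=4, size=600)

bin_widths = {}
for n_bins in (2, 10, 30):
    # counts[i] is the number of clubs between edges[i] and edges[i + 1].
    counts, edges = np.histogram(club_means, bins=n_bins)
    bin_widths[n_bins] = edges[1] - edges[0]
    print('bins={:2d}  bin width={:.2f}  largest bin={}'.format(
        n_bins, bin_widths[n_bins], counts.max()))
```

With very few bins every club lands in one of two wide buckets and the upper tail is invisible; with more bins the tail of high-scoring clubs starts to separate from the bulk.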

In [17]:
import matplotlib.pyplot as plt
import seaborn as sb

antal_bins = 2  # CHANGE ME TO SOMETHING SENSIBLE!

# Mean Overall score per club, sorted from best to worst.
overall_performance = (df
                       .groupby('Club', as_index=False)['Overall'].mean()
                       .sort_values(by='Overall', ascending=False))

# Note: distplot is deprecated in seaborn >= 0.11; sb.histplot(..., bins=antal_bins) replaces it.
ax = sb.distplot(overall_performance['Overall'], bins=antal_bins, kde=False)
ax.set_xlabel('Mean overall performance')
ax.set_ylabel('Number of clubs')
plt.show()



In [26]:
top_klub_ratio = None  # Remove None and FILL ME IN

top_clubs = overall_performance[overall_performance['Overall'] >= top_klub_ratio]['Club']
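
One possible way to pick the cut-off, besides reading it off the histogram, is to use a quantile of the club averages. The sketch below is only an illustration on made-up numbers; the frame `toy_performance`, the 0.8 quantile and the name `threshold` are assumptions, and the exercise still expects you to choose your own value.

```python
import pandas as pd

# Made-up club averages in the same shape as overall_performance.
toy_performance = pd.DataFrame({
    'Club': ['A', 'B', 'C', 'D', 'E'],
    'Overall': [82.0, 75.0, 70.0, 66.0, 60.0],
})

# Use the 80th percentile as the cut-off (an arbitrary choice for this example).
threshold = toy_performance['Overall'].quantile(0.8)

# Clubs at or above the cut-off count as top clubs.
toy_top = toy_performance[toy_performance['Overall'] >= threshold]['Club']
print(threshold, toy_top.tolist())
```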

There you go! All that remains is to build the actual datasets we will use to train our machine learning algorithm.

We create three sets:

  • dansker_set. A dataset with all Danish players, which we will use at the end to make our recommendations to the customer.
  • topklub_set. A dataset with only the players from the top clubs.
  • overall_set. A dataset that combines the players from the selected top clubs with an equally sized sample of players from non-top clubs. This is our training set.

In [25]:
from sklearn.model_selection import train_test_split

dansker_set = df[df['Nationality'] == 'Denmark']
topklub_set = df[df['Club'].isin(top_clubs)]
ikke_topklub_set = df[(~df['Club'].isin(top_clubs)) & (df['Nationality'] != 'Denmark')].sample(len(topklub_set))
overall_set = pd.concat([topklub_set, ikke_topklub_set])

print('Training set size: {}'.format(len(overall_set)))


Training set size: 1272
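
The construction of the balanced training set can be sketched on a small made-up frame as below. The `label` column and the fixed `random_state` are assumptions added for illustration (the notebook samples without a seed and does not add labels in this section); a fixed seed simply makes the sample reproducible.

```python
import pandas as pd

# Made-up example data; 'Top FC' plays the role of a top club.
toy = pd.DataFrame({
    'Name': list('ABCDEFGH'),
    'Club': ['Top FC', 'Top FC', 'Other 1', 'Other 2',
             'Other 3', 'Other 4', 'Other 5', 'Other 6'],
    'Nationality': ['Spain', 'Brazil', 'Denmark', 'Norway',
                    'Sweden', 'Denmark', 'France', 'Italy'],
})

is_top = toy['Club'].isin(['Top FC'])
is_danish = toy['Nationality'] == 'Denmark'

# Players at top clubs get label 1.
top_set = toy[is_top].assign(label=1)

# Draw an equally sized sample of non-Danish players from other clubs, label 0.
other_set = (toy[~is_top & ~is_danish]
             .sample(len(top_set), random_state=42)
             .assign(label=0))

training_set = pd.concat([top_set, other_set], ignore_index=True)
print('Training set size: {}'.format(len(training_set)))
```

Sampling as many non-top players as there are top players keeps the two classes balanced, which is why the training set is exactly twice the size of the top-club set.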

You have reached the end. The next set of exercises is about supervised learning.

