Shelter Animal Outcomes 1

Data visualization


In [1]:
%matplotlib inline
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

In [2]:
df = pd.read_csv('train.csv')

In [3]:
df.head()


Out[3]:
AnimalID Name DateTime OutcomeType OutcomeSubtype AnimalType SexuponOutcome AgeuponOutcome Breed Color
0 A671945 Hambone 2014-02-12 18:22:00 Return_to_owner NaN Dog Neutered Male 1 year Shetland Sheepdog Mix Brown/White
1 A656520 Emily 2013-10-13 12:44:00 Euthanasia Suffering Cat Spayed Female 1 year Domestic Shorthair Mix Cream Tabby
2 A686464 Pearce 2015-01-31 12:28:00 Adoption Foster Dog Neutered Male 2 years Pit Bull Mix Blue/White
3 A683430 NaN 2014-07-11 19:09:00 Transfer Partner Cat Intact Male 3 weeks Domestic Shorthair Mix Blue Cream
4 A667013 NaN 2013-11-15 12:52:00 Transfer Partner Dog Neutered Male 2 years Lhasa Apso/Miniature Poodle Tan

In [4]:
df['AnimalType'].unique()


Out[4]:
array(['Dog', 'Cat'], dtype=object)

In [5]:
# Count the cats with a boolean mask instead of building a groupby
(df['AnimalType'] == 'Cat').sum()


Out[5]:
11134

In [6]:
# Count the dogs the same way
(df['AnimalType'] == 'Dog').sum()


Out[6]:
15595

In [7]:
df['OutcomeType'].unique()


Out[7]:
array(['Return_to_owner', 'Euthanasia', 'Adoption', 'Transfer', 'Died'], dtype=object)

In [8]:
f, (ax1, ax2) = plt.subplots(1, 2, figsize=(16, 4))
sns.countplot(x="OutcomeType", data=df,  ax=ax1)
sns.countplot(x="AnimalType", hue="OutcomeType", data=df,  ax=ax2)


Out[8]:
<matplotlib.axes._subplots.AxesSubplot at 0x7f9ad5cbdc50>

Overall, very few animals have the outcome Died, so it seems not many died of natural causes.

Cats don't seem to have nine lives, unfortunately. They are more likely to get transferred, probably because of their bad attitude and general evilness. Dogs get returned to their owners more often, having tricked their masters with their sad puppy faces. They are also said to be more loyal.
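Raw counts are hard to compare because there are more dogs than cats, so a table of per-species outcome shares can back up the reading of the plots above. This is only a sketch: the helper name `outcome_shares` is made up for illustration, and the demo frame contains just the five rows printed by `df.head()`, not the full training set.

```python
import pandas as pd

def outcome_shares(frame, by="AnimalType", outcome="OutcomeType"):
    """Return outcome proportions per group; each row sums to 1."""
    return pd.crosstab(frame[by], frame[outcome], normalize="index")

# Demo frame: only the five rows shown by df.head() above.
sample = pd.DataFrame({
    "AnimalType": ["Dog", "Cat", "Dog", "Cat", "Dog"],
    "OutcomeType": ["Return_to_owner", "Euthanasia", "Adoption",
                    "Transfer", "Transfer"],
})

shares = outcome_shares(sample)
print(shares)
```

In the notebook itself, `outcome_shares(df)` would give the same table for the whole training set.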


In [9]:
sns.countplot(x="SexuponOutcome", hue="OutcomeType", data=df)


Out[9]:
<matplotlib.axes._subplots.AxesSubplot at 0x7f9ad5f73ad0>

Overall, sex itself likely does not play a big role in the outcome. The spayed/neutered population is larger, however, and these animals are more likely to get adopted.
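Since `SexuponOutcome` mixes two facts (sex and whether the animal was spayed/neutered), one way to check the statement above is to split it into two columns. The sketch below assumes the values look like 'Neutered Male', 'Spayed Female' or 'Intact Male', and that anything else (including missing values) can be labelled 'Unknown'. The helper name `add_sex_features` and the new column names are hypothetical.

```python
import pandas as pd

def add_sex_features(frame, column="SexuponOutcome"):
    """Add 'Sex' and 'Sterilized' columns derived from the combined string."""
    values = frame[column].fillna("Unknown")
    out = frame.copy()
    # 'Female' does not contain 'Male' (capital M), so the order is safe.
    out["Sex"] = values.str.extract(r"(Male|Female)", expand=False).fillna("Unknown")
    out["Sterilized"] = "Unknown"
    out.loc[values.str.contains("Neutered|Spayed"), "Sterilized"] = "Yes"
    out.loc[values.str.contains("Intact"), "Sterilized"] = "No"
    return out

# Demo frame: only the five rows shown by df.head() above.
sample = pd.DataFrame({
    "SexuponOutcome": ["Neutered Male", "Spayed Female", "Neutered Male",
                       "Intact Male", "Neutered Male"],
    "OutcomeType": ["Return_to_owner", "Euthanasia", "Adoption",
                    "Transfer", "Transfer"],
})

features = add_sex_features(sample)
# Outcome shares for sterilized vs. intact animals (rows sum to 1).
rates = pd.crosstab(features["Sterilized"], features["OutcomeType"],
                    normalize="index")
print(rates)
```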


In [10]:
# Split the data by species with boolean masks
dfCat = df[df['AnimalType'] == 'Cat']
dfDog = df[df['AnimalType'] == 'Dog']

In [11]:
f, (ax1, ax2) = plt.subplots(1, 2, figsize=(16, 4))
sns.countplot(x="SexuponOutcome", hue="OutcomeType", data=dfCat, ax=ax1)
sns.countplot(x="SexuponOutcome", hue="OutcomeType", data=dfDog, ax=ax2)


Out[11]:
<matplotlib.axes._subplots.AxesSubplot at 0x7f9ad5924290>

Cats and dogs have different probability distributions for outcome


In [12]:
dfCat['Color'].describe()


Out[12]:
count           11134
unique            146
top       Brown Tabby
freq             1635
Name: Color, dtype: object

In [13]:
dfDog['Color'].describe()


Out[13]:
count           15595
unique            262
top       Black/White
freq             1730
Name: Color, dtype: object

As expected, there are too many colors to visualize properly without discarding the majority of them. On reflection, it is more likely the combination of color and breed that makes a pet appealing/attractive.
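One possible way to make the colors plottable without dropping rows is to keep the most frequent values and lump everything else into a single bucket. This is a sketch, not what the notebook does: `lump_rare`, `top_n` and the 'Other' label are hypothetical, and the demo counts below are made up (only the color names come from the data).

```python
import pandas as pd

def lump_rare(series, top_n=10, other="Other"):
    """Keep the top_n most frequent values and replace the rest with `other`.

    Missing values are not among the top values, so they also become `other`.
    """
    top = series.value_counts().nlargest(top_n).index
    return series.where(series.isin(top), other)

# Made-up counts, purely to show the behavior.
colors = pd.Series(["Brown Tabby"] * 3 + ["Black/White"] * 2 + ["Tan"])
lumped = lump_rare(colors, top_n=2)
print(lumped.value_counts())
```

In the notebook this could be applied as `lump_rare(dfCat['Color'])` and the result passed to `sns.countplot`. The same helper would work on a combined breed and color string.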


In [14]:
df['AgeuponOutcome'].unique()


Out[14]:
array(['1 year', '2 years', '3 weeks', '1 month', '5 months', '4 years',
       '3 months', '2 weeks', '2 months', '10 months', '6 months',
       '5 years', '7 years', '3 years', '4 months', '12 years', '9 years',
       '6 years', '1 weeks', '11 years', '4 weeks', '7 months', '8 years',
       '11 months', '4 days', '9 months', '8 months', '15 years',
       '10 years', '1 week', '0 years', '14 years', '3 days', '6 days',
       '5 days', '5 weeks', '2 days', '16 years', '1 day', '13 years', nan,
       '17 years', '18 years', '19 years', '20 years'], dtype=object)

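As expected, the animals span a wide range of ages. Age should play a major role in deciding the outcome. Note that the raw strings are inconsistent (both '1 week' and '1 weeks' occur) and that some values are missing.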
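To plot or model age, the strings need to become numbers first. A minimal sketch, assuming every non-missing value has the form '<number> <unit>' as in the output above, and using rough conversions (30 days per month, 365 per year) that are my assumption rather than anything the data defines. The names `DAYS_PER_UNIT` and `age_to_days` are hypothetical.

```python
import numpy as np
import pandas as pd

# Approximate conversion factors (assumption: 30-day months, 365-day years).
DAYS_PER_UNIT = {"day": 1, "week": 7, "month": 30, "year": 365}

def age_to_days(age):
    """Convert strings such as '3 weeks' or '1 year' to a number of days."""
    if pd.isna(age):
        return np.nan
    number, unit = age.split()
    # Strip the plural 's' so 'weeks' and 'week' map to the same key.
    return int(number) * DAYS_PER_UNIT[unit.rstrip("s")]

# A few values taken from the unique() output above.
ages = pd.Series(["1 year", "3 weeks", "1 weeks", "0 years", np.nan])
age_in_days = ages.map(age_to_days)
print(age_in_days)
```

In the notebook, `df['AgeuponOutcome'].map(age_to_days)` would produce a numeric column for the full data set.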


In [15]:
# True when the animal has a name (notnull, not isnull)
df['NameIsPresent'] = df['Name'].notnull()

In [16]:
sns.countplot(x="NameIsPresent", hue="OutcomeType", data=df)


Out[16]:
<matplotlib.axes._subplots.AxesSubplot at 0x7f9ad5857a90>

The graph above shows that animals without names (or whose names were lost) have a very different outcome distribution from named animals. Named animals seem to be more popular for adoption. A name could mean that the animal had a previous owner and possibly a story.


In [17]:
df[df['NameIsPresent']].shape[0]


Out[17]:
19038

In [18]:
df[~df['NameIsPresent']].shape[0]


Out[18]:
7691

We can see that more than 2/3 of the animals in the training set had names (19038 of 26729), and roughly half of those got adopted.
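The "roughly half got adopted" figure can be computed directly instead of being read off the plot. The sketch below derives name presence from the `Name` column itself, so it does not depend on any helper column. The function name `adoption_rate_by_name` and the 'named'/'unnamed' labels are hypothetical, and the demo frame holds only the five rows printed by `df.head()`.

```python
import pandas as pd

def adoption_rate_by_name(frame):
    """Return the share of adopted animals among named and unnamed animals."""
    group = frame["Name"].notnull().map({True: "named", False: "unnamed"})
    adopted = frame["OutcomeType"] == "Adoption"
    # The mean of a boolean Series is the fraction of True values.
    return adopted.groupby(group).mean()

# Demo frame: only the five rows shown by df.head() above.
sample = pd.DataFrame({
    "Name": ["Hambone", "Emily", "Pearce", None, None],
    "OutcomeType": ["Return_to_owner", "Euthanasia", "Adoption",
                    "Transfer", "Transfer"],
})

rates = adoption_rate_by_name(sample)
print(rates)
```

Calling `adoption_rate_by_name(df)` in the notebook would give the rates for the full training set.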


In [19]:
df['OutcomeSubtype'].unique()


Out[19]:
array([nan, 'Suffering', 'Foster', 'Partner', 'Offsite', 'SCRP',
       'Aggressive', 'Behavior', 'Rabies Risk', 'Medical', 'In Kennel',
       'In Foster', 'Barn', 'Court/Investigation', 'Enroute', 'At Vet',
       'In Surgery'], dtype=object)

In [20]:
sns.set_context("poster")
sns.countplot(x="OutcomeSubtype", hue="AnimalType", data=df)


Out[20]:
<matplotlib.axes._subplots.AxesSubplot at 0x7f9ad568c050>

In [25]:
df['DateTime']


Out[25]:
0        2014-02-12 18:22:00
1        2013-10-13 12:44:00
2        2015-01-31 12:28:00
3        2014-07-11 19:09:00
4        2013-11-15 12:52:00
5        2014-04-25 13:04:00
6        2015-03-28 13:11:00
7        2015-04-30 17:02:00
8        2014-02-04 17:17:00
9        2014-05-03 07:48:00
10       2013-12-05 15:50:00
11       2013-11-04 14:48:00
12       2016-02-03 11:27:00
13       2015-06-08 16:30:00
14       2015-11-25 15:00:00
15       2014-07-12 12:10:00
16       2014-05-03 16:15:00
17       2014-06-07 12:54:00
18       2014-05-17 11:32:00
19       2014-07-30 17:34:00
20       2014-01-19 15:03:00
21       2015-09-18 15:19:00
22       2015-08-15 14:22:00
23       2013-10-28 16:32:00
24       2014-04-09 17:44:00
25       2015-10-03 15:44:00
26       2016-01-15 17:31:00
27       2015-03-25 18:50:00
28       2015-11-21 13:01:00
29       2015-07-30 14:30:00
                ...         
26699    2014-04-21 14:01:00
26700    2015-06-15 19:28:00
26701    2014-06-15 17:41:00
26702    2015-10-11 09:42:00
26703    2015-12-04 12:22:00
26704    2015-11-17 17:17:00
26705    2013-10-19 15:34:00
26706    2014-10-19 13:29:00
26707    2014-07-01 17:06:00
26708    2013-11-13 17:32:00
26709    2015-10-24 00:00:00
26710    2014-11-24 17:21:00
26711    2013-10-30 18:32:00
26712    2015-04-20 16:04:00
26713    2014-01-20 17:37:00
26714    2014-05-31 16:11:00
26715    2015-08-05 17:03:00
26716    2015-05-02 21:04:00
26717    2014-06-30 17:34:00
26718    2015-04-28 14:26:00
26719    2015-07-20 09:00:00
26720    2015-07-18 14:08:00
26721    2014-07-17 09:43:00
26722    2014-08-31 09:00:00
26723    2016-01-29 18:52:00
26724    2015-05-14 11:56:00
26725    2016-01-20 18:59:00
26726    2015-03-09 13:33:00
26727    2014-04-27 12:22:00
26728    2015-07-02 09:00:00
Name: DateTime, dtype: object