These are the packages that I used:

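In [14]:
```python
import datetime
import random

import numpy as np
import pandas as pd
# sklearn.lda was removed from scikit-learn; LDA now lives in discriminant_analysis
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis as LDA
```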

Loading the data:

In [2]:
```python
# specify my working directory (raw string, so the backslashes are taken literally):
path = r"C:\mycomputer\Bill\{}"
# I put it in this way so that I could just add CSV files as they come in.
myfiles = ["transactions2014.csv",
           "transactions2015.csv"]
frames = [pd.read_csv(path.format(x)) for x in myfiles]
df = pd.concat(frames)
```
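Since the point of the file list is to pick up new CSV files as they arrive, the list could also be built by scanning the folder. This is only a minimal sketch: the helper name `find_transaction_files` is hypothetical, and it assumes every file follows the `transactions<year>.csv` naming used above.

```python
from pathlib import Path

def find_transaction_files(folder):
    """Return the transactions*.csv files found in `folder`, sorted by name."""
    return sorted(Path(folder).glob("transactions*.csv"))

# Possible usage with the loading step above (not run here):
# frames = [pd.read_csv(p) for p in find_transaction_files(r"C:\mycomputer\Bill")]
```

With this, dropping a new yearly file into the folder would be enough; the hard-coded list would not need editing.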

Some quick cleaning:

In [4]:
```python
keep_cats = ['Arts&Crafts', 'Coffee', 'Eating at Home', 'Eating out', 'Education', 'Gaming', 'Grocery',
             'Lunch', 'Moving', 'Music', 'Out for drinks', 'Out of town',
             'Technology', 'Transportation', 'Uncategorized']
# Flag the rows whose category is one I want to keep, then drop the rest
df['filters'] = df['Category'].isin(keep_cats)
df = df[df['filters']]
df = df.dropna().reset_index(drop=True)
```

In [5]:
```python
df.head()
```


Adding a day of the week:

In [6]:
```python
def get_day_of_week(x):
    """Return the weekday name (e.g. 'Saturday') for a date string."""
    try:
        mydate = datetime.datetime.strptime(x, '%m/%d/%Y')
    except ValueError:
        # I was inconsistent with my datestrings
        # Why didn't I just use ISO format!
        mydate = datetime.datetime.strptime(x, '%m/%d/%y')
    return mydate.strftime('%A')

df['dayOfWeek'] = df['Date of pull'].apply(get_day_of_week)
```

Then I use the "distance from Saturday" as a numeric proxy for the categorical day-of-week value:

In [8]:
```python
def dist_from_sat(x):
    """Number of days between weekday name x and the nearest Saturday."""
    myvalues = {'Friday': 1,
                'Monday': 2,
                'Saturday': 0,
                'Sunday': 1,
                'Thursday': 2,
                'Tuesday': 3,
                'Wednesday': 3}
    return myvalues[x]

df['distFromSat'] = df['dayOfWeek'].apply(dist_from_sat)
```
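The same distances can be computed with modular arithmetic rather than a lookup table. This is a sketch, not what I actually ran: the name `dist_from_sat_numeric` is made up for illustration, and it takes a `datetime.date` instead of a weekday name.

```python
import datetime

SATURDAY = 5  # date.weekday() numbers Monday as 0, so Saturday is 5

def dist_from_sat_numeric(d):
    """Days from date d to the nearest Saturday, wrapping around the week."""
    wd = d.weekday()
    # Distance going backwards vs. forwards around the 7-day cycle
    return min((wd - SATURDAY) % 7, (SATURDAY - wd) % 7)

# 2015-01-03 was a Saturday; walk through one full week from there
week = [datetime.date(2015, 1, 3) + datetime.timedelta(days=i) for i in range(7)]
distances = [dist_from_sat_numeric(d) for d in week]
```

For Saturday through Friday this gives 0, 1, 2, 3, 3, 2, 1, which matches the values in the dictionary.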

In [15]:
```python
# Spot-check 10 random rows (.ix has been removed from pandas; .sample does the job)
df[['distFromSat', 'dayOfWeek']].sample(10)
```


Transform for my model:

In [16]:
```python
X = df.loc[:, ['distFromSat', 'Hour', 'Amount']].values
y = df.loc[:, 'Category'].values
```

Now I can run my model:

In [17]:
```python
clf = LDA()
clf.fit(X, y)
```


Now see if my model has any validity:

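In [19]:
```python
df['predictions'] = clf.predict(X)
# Fraction of rows where the prediction matches the recorded category
# (measured on the same rows the model was fit on)
accuracy_of_model = (df['predictions'] == df['Category']).mean()
# Baseline: picking one of the categories uniformly at random
accuracy_of_random_guess = 1. / len(np.unique(y))
```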

In [20]:
```python
accuracy_of_model
```
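One caveat: this accuracy is measured on the same rows the model was fit on, which tends to flatter the model. A held-out estimate can be sketched with scikit-learn's `cross_val_score`. The data here is synthetic stand-in data, not my transactions, and the names `X_demo`, `y_demo` and `scores` are made up for illustration.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import cross_val_score

# Synthetic stand-in data (NOT the transaction data): 3 classes, 3 features each
rng = np.random.default_rng(0)
X_demo = np.vstack([rng.normal(loc=center, scale=1.0, size=(50, 3))
                    for center in (0.0, 5.0, 10.0)])
y_demo = np.repeat(['a', 'b', 'c'], 50)

# 5-fold cross-validation: each fold is scored on rows the model did not see
scores = cross_val_score(LinearDiscriminantAnalysis(), X_demo, y_demo, cv=5)
mean_held_out_accuracy = scores.mean()
```

On the real data the call would presumably be `cross_val_score(LDA(), X, y, cv=5)`, provided every category has at least five rows.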


In [21]:
```python
accuracy_of_random_guess
```
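The uniform random guess is only a fair baseline if the categories are about equally common. A stricter comparison is to always predict the most frequent category. The sketch below uses hypothetical labels, and the helper `majority_class_accuracy` is not part of the analysis above.

```python
from collections import Counter

def majority_class_accuracy(labels):
    """Accuracy obtained by always predicting the most frequent label."""
    counts = Counter(labels)
    return max(counts.values()) / len(labels)

# Hypothetical labels, for illustration only
example_labels = ['Coffee', 'Coffee', 'Coffee', 'Grocery', 'Lunch']
baseline = majority_class_accuracy(example_labels)     # 3 of 5 labels are 'Coffee'
uniform_guess = 1. / len(set(example_labels))          # 1 of 3 distinct labels
```

With skewed labels like these, the majority-class baseline is higher than the uniform guess, so it is the harder number for the model to beat.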
