As before, we'll start by importing the MovieLens 100K data set into a pandas DataFrame:

In [1]:
```
import pandas as pd
r_cols = ['user_id', 'movie_id', 'rating']
ratings = pd.read_csv('e:/sundog-consult/udemy/datascience/ml-100k/u.data', sep='\t', names=r_cols, usecols=range(3), encoding="ISO-8859-1")
m_cols = ['movie_id', 'title']
movies = pd.read_csv('e:/sundog-consult/udemy/datascience/ml-100k/u.item', sep='|', names=m_cols, usecols=range(2), encoding="ISO-8859-1")
ratings = pd.merge(movies, ratings)
ratings.head()
```
Out[1]:
In [2]:
```
userRatings = ratings.pivot_table(index=['user_id'], columns=['title'], values='rating')
userRatings.head()
```
Out[2]:
In [3]:
```
corrMatrix = userRatings.corr()
corrMatrix.head()
```
Out[3]:
In [4]:
```
corrMatrix = userRatings.corr(method='pearson', min_periods=100)
corrMatrix.head()
```
Out[4]:
In [5]:
```
myRatings = userRatings.loc[0].dropna()
myRatings
```
Out[5]:

Now, let's go through each movie I rated one at a time, and build up a list of possible recommendations based on the movies similar to the ones I rated.

So for each movie I rated, I'll retrieve the list of similar movies from our correlation matrix. I'll then scale those correlation scores by how well I rated the movie they are similar to, so movies similar to ones I liked count more than movies similar to ones I hated:

In [6]:
```
simCandidates = pd.Series(dtype='float64')
for i in range(0, len(myRatings.index)):
    print("Adding sims for " + myRatings.index[i] + "...")
    # Retrieve similar movies to this one that I rated
    sims = corrMatrix[myRatings.index[i]].dropna()
    # Now scale its similarity by how well I rated this movie
    sims = sims.map(lambda x: x * myRatings.iloc[i])
    # Add the score to the list of similarity candidates
    simCandidates = pd.concat([simCandidates, sims])

# Glance at our results so far:
print("sorting...")
simCandidates.sort_values(inplace=True, ascending=False)
print(simCandidates.head(10))
```
In [7]:
```
simCandidates = simCandidates.groupby(simCandidates.index).sum()
```
In [8]:
```
simCandidates.sort_values(inplace=True, ascending=False)
simCandidates.head(10)
```
Out[8]:
In [9]:
```
filteredSims = simCandidates.drop(myRatings.index)
filteredSims.head(10)
```
Out[9]:

There we have it!

Can you improve on these results? Perhaps a different method or min_periods value on the correlation computation would produce more interesting results.
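As a starting point for that experiment, here's a self-contained sketch of how `method` and `min_periods` change the correlation matrix. The tiny ratings table, the choice of Spearman, and the `min_periods` values below are illustrative assumptions, not the MovieLens data or a recommended setting:

```python
import pandas as pd
import numpy as np

# A tiny made-up user x movie ratings table, just to illustrate the parameters
userRatings = pd.DataFrame({
    'Movie A': [5.0, 4.0, 1.0, np.nan],
    'Movie B': [4.0, 5.0, 2.0, 1.0],
    'Movie C': [1.0, 2.0, 5.0, 4.0],
})

# Pearson (the default) vs. Spearman rank correlation
pearson = userRatings.corr(method='pearson')
spearman = userRatings.corr(method='spearman')

# min_periods drops pairs with too few overlapping raters: Movie A overlaps
# the others on only 3 users, so min_periods=4 yields NaN for those pairs
strict = userRatings.corr(method='pearson', min_periods=4)
print(spearman.round(2))
print(strict.round(2))
```

On real data, raising `min_periods` trades coverage (more NaN pairs) for reliability, so it's worth sweeping a few values and eyeballing the resulting recommendations.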

Also, it looks like some movies similar to Gone with the Wind - which I hated - made it through to the final list of recommendations. Perhaps movies similar to ones the user rated poorly should actually be penalized, instead of just scaled down?
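One way to sketch that penalty (my assumption, not the approach used above) is to center each rating on the midpoint of the 1-5 scale before scaling, so a 1-star rating contributes a negative weight instead of a small positive one. All the movie names and correlation scores here are made up:

```python
import pandas as pd

# Hypothetical correlation scores for movies similar to two that I rated
sims_to_liked = pd.Series({'Candidate X': 0.9, 'Candidate Y': 0.5})
sims_to_hated = pd.Series({'Candidate X': 0.8, 'Candidate Z': 0.7})
myRatings = pd.Series({'Liked Movie': 5.0, 'Hated Movie': 1.0})

def weight(rating):
    # Center on the scale midpoint (3 on a 1-5 scale): 5 -> +2, 1 -> -2
    return rating - 3.0

scored = pd.concat([
    sims_to_liked * weight(myRatings['Liked Movie']),
    sims_to_hated * weight(myRatings['Hated Movie']),
])
scored = scored.groupby(scored.index).sum()
print(scored.sort_values(ascending=False))
```

With this weighting, a candidate strongly similar to a hated movie ends up with a negative score and naturally falls to the bottom of the list.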

There are also probably some outliers in the user rating data set - some users may have rated a huge number of movies and have a disproportionate effect on the results. Go back to earlier lectures to learn how to identify these outliers, and see if removing them improves things.
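A minimal sketch of dropping heavy raters, using a toy stand-in for the ratings frame. The cutoff below is an arbitrary placeholder - a real threshold should come from inspecting the distribution of ratings per user (e.g. with `describe()`), as covered in the outlier lecture:

```python
import pandas as pd

# Toy ratings frame standing in for the MovieLens data
ratings = pd.DataFrame({
    'user_id':  [1, 1, 1, 1, 1, 2, 2, 3],
    'movie_id': [10, 11, 12, 13, 14, 10, 11, 12],
    'rating':   [5, 4, 3, 5, 2, 4, 5, 3],
})

# Count ratings per user, then keep only users at or under a cutoff
ratingsByUser = ratings.groupby('user_id')['rating'].count()
maxRatings = 3  # placeholder threshold; tune against the real distribution
keep = ratingsByUser[ratingsByUser <= maxRatings].index
filtered = ratings[ratings['user_id'].isin(keep)]
print(filtered)
```

The filtered frame would then feed into the same `pivot_table` and `corr` steps as before.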

For an even bigger project: we're evaluating the result qualitatively here, but we could actually apply train/test and measure our ability to predict user ratings for movies they've already watched. Whether that's actually a measure of a "good" recommendation is debatable, though!
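The evaluation loop for that bigger project might look roughly like this. This is only a sketch under stated assumptions: the data is a toy stand-in, the split is not randomized, and the per-user mean is a naive baseline predictor standing in for the similarity-based scores:

```python
import pandas as pd

# Toy ratings frame; in practice this would be the full MovieLens set
ratings = pd.DataFrame({
    'user_id':  [1, 1, 1, 2, 2, 2, 3, 3],
    'movie_id': [10, 11, 12, 10, 11, 13, 10, 12],
    'rating':   [5.0, 4.0, 3.0, 2.0, 1.0, 2.0, 4.0, 5.0],
})

# Hold out every 4th row as a test set (a real split should be randomized)
test = ratings.iloc[::4]
train = ratings.drop(test.index)

# Naive baseline: predict each user's mean training rating.
# A similarity-based predictor would slot in here instead.
userMeans = train.groupby('user_id')['rating'].mean()
predictions = test['user_id'].map(userMeans)

# Mean squared error against the held-out ratings
mse = ((predictions - test['rating']) ** 2).mean()
print("MSE:", mse)
```

Any real recommender would need to beat this kind of trivial baseline to justify its complexity.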
