This is the second installment of a sequence on recommendation systems. The first is on smoothed (dampened) average ratings and can be found here.
How do we compare a movie with a single 5-star rating to one with many 4 and 5 ratings?
Say we want to take user ratings of movies and rank them. A simple approach would be to just look at the average rating of each movie, but that doesn’t feel right: different movies have different numbers of ratings, and we don’t want small samples to skew our estimate of the “true” rating that would arise if the whole population rated that movie. As suggested by Evan Miller, ranking by the lower confidence bound of each movie’s rating is one way to handle this. This is another non-personalized recommendation system. Let’s start by looking at the highest-rated movies from the MovieLens dataset on Kaggle.
Code
import pandas as pd

# import the data
# ratings doesn't include movie names so merge with ids to get names
ratings = pd.read_csv("archive/rating.csv", parse_dates=['timestamp'])
ids = pd.read_csv("archive/movie.csv")
ratings = pd.merge(ratings, ids, on='movieId', how='left')

# Find each movie's mean rating
avg_ratings = ratings.groupby(['movieId', 'title'])['rating'].agg(
    avg_rating='mean', rating_count='count')
avg_ratings.sort_values(by='avg_rating', ascending=False, inplace=True)
avg_ratings.head()
| movieId | Title | Mean Rating | Number of Ratings |
|---|---|---|---|
| 117314 | Neurons to Nirvana (2013) | 5.0 | 1 |
| 117418 | Victor and the Secret of Crocodile Mansion (2012) | 5.0 | 1 |
| 117061 | The Green (2011) | 5.0 | 1 |
| 109571 | Into the Middle of Nowhere (2010) | 5.0 | 1 |
| 109715 | Inquire Within (2012) | 5.0 | 1 |
Not quite the blockbusters or classics we were expecting. It turns out there are a bunch of movies with just a single 5-star rating. We already did a touch of EDA in the previous post, so let’s dive right into the rating method.
Miller’s method is meant for binary (upvote/downvote) data, but we can adapt the technique by converting our ratings (which range from 0.5 to 5 stars) into fractional amounts of positive and negative votes. Specifically, a 0.5-star rating counts as 1 negative vote and no positive votes, a 1-star rating as 1/9 of a positive vote and 8/9 of a negative vote, and so on.
| Rating (Stars) | Positive votes | Negative votes |
|---|---|---|
| 0.5 | 0 | 1 |
| 1 | 0.111 | 0.889 |
| 1.5 | 0.222 | 0.778 |
| 2 | 0.333 | 0.667 |
| 2.5 | 0.444 | 0.556 |
| 3 | 0.556 | 0.444 |
| 3.5 | 0.667 | 0.333 |
| 4 | 0.778 | 0.222 |
| 4.5 | 0.889 | 0.111 |
| 5 | 1 | 0 |
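In code, the conversion in the table above is a single linear map; here is a minimal sketch (the function name is mine, not from the original code):

```python
# Map a star rating in [0.5, 5] to fractional (positive, negative) votes:
# 0.5 stars -> all negative, 5 stars -> all positive.
def to_fractional_votes(rating: float) -> tuple[float, float]:
    positive = (rating - 0.5) / 4.5
    return positive, 1.0 - positive

# e.g. a 1-star rating is 1/9 of a positive vote and 8/9 of a negative one
pos, neg = to_fractional_votes(1.0)
```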
Bernoulli Confidence Interval
The idea here is that instead of taking the rating to be the mean over the \(N_j\) ratings of movie \(j\) (i.e. \(r_j = \sum_i X_i/N_j\)), we can compute a confidence interval at confidence level \(1-\alpha\) and take its lower bound to be our “rating.” Let

- \(N_j\) be the number of ratings of movie \(j\),
- \(\bar{X} = \frac{1}{N_j} \sum_i X_i\) be their mean,
- \(s_j\) be their sample standard deviation, and
- \(t_{\alpha/2}\) be the \(1-\alpha/2\) quantile of the Student’s t-distribution with \(N_j - 1\) degrees of freedom.

Our rating for movie \(j\) is then the lower confidence bound

\[ r_j^{\text{LCB}} = \bar{X} - t_{\alpha/2} \frac{s_j}{\sqrt{N_j}}. \]
This approach mitigates the issue of over-rated movies being recommended. The equation above is a biased estimator of the true underlying rating: it under-rates every movie, with the bias less severe for movies with more reviews. Movies with few reviews have wide confidence intervals, so their ratings are pulled down; as a movie accumulates ratings, its interval narrows and the lower bound approaches the true rating. We’ll apply the method with \(\alpha = 0.05\), though this is a hyperparameter that can be tuned.
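To see the shrinking penalty concretely, here is a small self-contained sketch (the helper name and the toy ratings are invented for illustration) comparing two hypothetical movies with the same 4.0 average:

```python
import numpy as np
from scipy import stats

def t_lower_bound(ratings, confidence=0.95):
    """Lower bound of the t confidence interval for the mean rating."""
    ratings = np.asarray(ratings, dtype=float)
    n = len(ratings)
    mean = ratings.mean()
    if n < 2:
        return mean  # with one rating the interval collapses to a point
    half_width = stats.sem(ratings) * stats.t.ppf((1 + confidence) / 2, n - 1)
    return mean - half_width

few = [3.5, 4.0, 4.5]          # 3 ratings, mean 4.0
many = [3.5, 4.0, 4.5] * 100   # 300 ratings, same mean and spread
# The movie with more ratings is penalized far less.
```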
Code
from scipy import stats

# Step 1: Define a function to compute the CI
def compute_confidence_interval(group, confidence=0.95):
    n = group.count()
    mean = group.mean()
    std_err = stats.sem(group, nan_policy='omit')
    h = std_err * stats.t.ppf((1 + confidence) / 2., n - 1) if n > 1 else 0
    lower = mean - h
    upper = mean + h
    return pd.Series({'avg_rating': mean, 'rating_count': n,
                      'ci_lower': lower, 'ci_upper': upper})
Code
# Apply the function after grouping
CI_ratings = ratings.groupby(['movieId', 'title'])['rating'].apply(
    compute_confidence_interval).unstack().reset_index()
Code
# Sort by the ratings and take a look at the top ranked movies
CI_ratings.sort_values(by='avg_rating', ascending=False, inplace=True)
CI_ratings.head()
| Title | Mean Rating | Number of Ratings | CI Lower Bound | CI Upper Bound |
|---|---|---|---|---|
| Black Girl (La noire de…) (1966) | 5.0 | 1.0 | 5.0 | 5.0 |
| Hatful of Rain, A (1957) | 5.0 | 1.0 | 5.0 | 5.0 |
| Perfect Plan, A (Plan parfait, Un) (2012) | 5.0 | 1.0 | 5.0 | 5.0 |
| Rebels of the Neon God (Qing shao nian nuo zha) (1992) | 5.0 | 1.0 | 5.0 | 5.0 |
| Examined Life (2008) | 5.0 | 1.0 | 5.0 | 5.0 |
Hmm, still not what we might expect from the “top” movies. Keen observers will notice that when \(N_j = 1\), the confidence interval reduces to a point, so movies with a single 5-star rating keep a lower confidence bound of 5. The same thing happens whenever all of a movie’s ratings are identical, since the sample standard deviation is then zero. We can try to work around this by only looking at movies with at least a certain number of ratings, or movies with at least two different rating values. When we require at least two ratings, we still have the same issue:
| Title | Mean Rating | Number of Ratings | CI Lower Bound | CI Upper Bound |
|---|---|---|---|---|
| Consuming Kids: The Commercialization of Childhood (2008) | 5.0 | 2.0 | 5.0 | 5.0 |
| Catastroika (2012) | 5.0 | 2.0 | 5.0 | 5.0 |
| Memorial Day (2011) | 4.5 | 2.0 | 4.5 | 4.5 |
| Orderers (Les ordres) (1974) | 4.5 | 2.0 | 4.5 | 4.5 |
| Ankur (The Seedling) (1974) | 4.5 | 3.0 | 4.5 | 4.5 |
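The cutoff itself is just a filter on the rating count before sorting; a sketch using a tiny mock of the `CI_ratings` frame (the titles and numbers here are invented):

```python
import pandas as pd

# A tiny stand-in for the CI_ratings frame built above.
ci_ratings = pd.DataFrame({
    'title': ['One Hit Wonder', 'Well Reviewed'],
    'avg_rating': [5.0, 4.4],
    'rating_count': [1, 120],
    'ci_lower': [5.0, 4.3],
})

# Keep only movies with at least 5 ratings, then rank by the lower bound.
MIN_RATINGS = 5
filtered = ci_ratings[ci_ratings['rating_count'] >= MIN_RATINGS]
filtered = filtered.sort_values(by='ci_lower', ascending=False)
```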
Once we raise the minimum number of ratings to 5, the issue goes away:
| Title | Mean Rating | Number of Ratings | CI Lower Bound | CI Upper Bound |
|---|---|---|---|---|
| Shawshank Redemption, The (1994) | 4.447 | 63366.0 | 4.441 | 4.453 |
| Godfather, The (1972) | 4.365 | 41355.0 | 4.357 | 4.373 |
| Usual Suspects, The (1995) | 4.334 | 47006.0 | 4.328 | 4.341 |
| Schindler's List (1993) | 4.310 | 50054.0 | 4.303 | 4.317 |
| Godfather: Part II, The (1974) | 4.276 | 27398.0 | 4.265 | 4.286 |
| Rear Window (1954) | 4.271 | 17449.0 | 4.260 | 4.283 |
| Seven Samurai (Shichinin no samurai) (1954) | 4.274 | 11611.0 | 4.259 | 4.289 |
| Casablanca (1942) | 4.258 | 24349.0 | 4.247 | 4.269 |
| One Flew Over the Cuckoo's Nest (1975) | 4.248 | 29932.0 | 4.239 | 4.257 |
| Sunset Blvd. (a.k.a. Sunset Boulevard) (1950) | 4.257 | 6525.0 | 4.237 | 4.277 |
Wilson Score Interval
Using the t-distribution has other technical issues (our fractional votes are far from normally distributed, especially for small \(N_j\)), and the “correct” way of computing the interval for vote-style data is the Wilson score interval. Again, the idea is that instead of taking the rating to be the mean over the \(N_j\) ratings of movie \(j\) (i.e. \(r_j = \sum_i X_i/N_j\)), we compute a confidence interval at confidence level \(1-\alpha\) and take its lower bound to be our “rating.” Let
- \(N_j\) be the number of ratings of movie \(j\),
- \(\hat{p} = \frac{1}{N_j} \sum_i X_i\) be the mean fraction of positive votes, and
- \(z_{\alpha/2}\) be the \(1-\alpha/2\) quantile of the standard normal distribution.

The Wilson lower bound is then

\[ r_j^{\text{LCB}} = \frac{\hat{p} + \frac{z_{\alpha/2}^2}{2N_j} - z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{N_j} + \frac{z_{\alpha/2}^2}{4N_j^2}}}{1 + \frac{z_{\alpha/2}^2}{N_j}}. \]
As before, this lower bound under-rates every movie, with the penalty shrinking as \(N_j\) grows, so over-rated small-sample movies are pushed down the list. We again apply the method with \(\alpha = 0.05\).
Code
import numpy as np
from scipy.stats import norm

# Convert ratings to fractional positive (0 to 1 scale)
ratings['fractional_positive'] = (ratings['rating'] - 0.5) / 4.5

def wilson_score_interval(pos, n, confidence=0.95):
    if n == 0:
        return (0.0, 0.0)
    z = norm.ppf(1 - (1 - confidence) / 2)
    phat = pos / n
    denominator = 1 + z**2 / n
    centre = phat + z**2 / (2 * n)
    margin = z * np.sqrt((phat * (1 - phat) + z**2 / (4 * n)) / n)
    lower = (centre - margin) / denominator
    upper = (centre + margin) / denominator
    return lower, upper

def compute_fractional_wilson(group):
    n = group['fractional_positive'].count()
    pos = group['fractional_positive'].sum()
    lower, upper = wilson_score_interval(pos, n)
    return pd.Series({'fractional_positive_mean': pos / n if n > 0 else np.nan,
                      'rating_count': n,
                      'wilson_lower': lower,
                      'wilson_upper': upper})

# Group by movieId and name
wilson_CI_ratings = ratings.groupby(['movieId', 'title']).apply(
    compute_fractional_wilson).reset_index()
Code
wilson_CI_ratings.head()
Code
# Our top movies
wilson_CI_ratings.sort_values(by='wilson_lower', ascending=False, inplace=True)
wilson_CI_ratings.head(10)
| Title | LCB Rating | Rating Count | Fractional Positive Mean | Wilson Lower | Wilson Upper |
|---|---|---|---|---|---|
| Shawshank Redemption, The (1994) | 4.435 | 63366.0 | 0.877 | 0.875 | 0.880 |
| Godfather, The (1972) | 4.349 | 41355.0 | 0.859 | 0.855 | 0.862 |
| Usual Suspects, The (1995) | 4.320 | 47006.0 | 0.852 | 0.849 | 0.855 |
| Schindler's List (1993) | 4.296 | 50054.0 | 0.847 | 0.844 | 0.850 |
| Godfather: Part II, The (1974) | 4.256 | 27398.0 | 0.839 | 0.835 | 0.843 |
| Rear Window (1954) | 4.246 | 17449.0 | 0.838 | 0.833 | 0.843 |
| Seven Samurai (Shichinin no samurai) (1954) | 4.244 | 11611.0 | 0.839 | 0.832 | 0.845 |
| Casablanca (1942) | 4.237 | 24349.0 | 0.835 | 0.830 | 0.840 |
| One Flew Over the Cuckoo's Nest (1975) | 4.229 | 29932.0 | 0.833 | 0.829 | 0.837 |
| Dr. Strangelove or: How I Learned to Stop Worrying and Love the Bomb (1964) | 4.225 | 23220.0 | 0.833 | 0.828 | 0.837 |
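As a sanity check, a hand-rolled Wilson interval like the one above can be compared against SciPy’s built-in implementation. Note that `scipy.stats.binomtest` only accepts integer counts, so this sketch uses whole votes rather than our fractional ones:

```python
import numpy as np
from scipy.stats import binomtest, norm

def wilson_lower(pos, n, confidence=0.95):
    # Same Wilson lower bound formula as in the code above.
    if n == 0:
        return 0.0
    z = norm.ppf(1 - (1 - confidence) / 2)
    phat = pos / n
    centre = phat + z**2 / (2 * n)
    margin = z * np.sqrt((phat * (1 - phat) + z**2 / (4 * n)) / n)
    return (centre - margin) / (1 + z**2 / n)

# 80 positive votes out of 100: compare our bound with scipy's.
ours = wilson_lower(80, 100)
scipys = binomtest(80, 100).proportion_ci(method='wilson').low
```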
The top 5 using this method are the same as the top 5 we saw from the dampened mean approach at higher values of \(\lambda\), but the next 5 differ: Band of Brothers and Sunset Blvd. drop out in favor of One Flew Over the Cuckoo’s Nest and Dr. Strangelove. The dampened mean approach’s top 10 is the table below:
| Title | Dampened Mean | Ratings Sum | Number of Ratings |
|---|---|---|---|
| Shawshank Redemption, The (1994) | 4.447 | 281788.0 | 63366 |
| Godfather, The (1972) | 4.365 | 180503.5 | 41355 |
| Usual Suspects, The (1995) | 4.334 | 203741.5 | 47006 |
| Schindler’s List (1993) | 4.310 | 215741.5 | 50054 |
| Godfather: Part II, The (1974) | 4.275 | 117144.0 | 27398 |
| Seven Samurai (Shichinin no samurai) (1954) | 4.274 | 49627.5 | 11611 |
| Rear Window (1954) | 4.271 | 74530.5 | 17449 |
| Band of Brothers (2001) | 4.262 | 18353.0 | 4305 |
| Casablanca (1942) | 4.258 | 103686.0 | 24349 |
| Sunset Blvd. (a.k.a. Sunset Boulevard) (1950) | 4.256 | 27776.5 | 6525 |
Finally, here’s a visualization with both the original rating (blue) and the lower confidence bound (LCB) rating (orange) with \(\alpha = .05\), as well as the difference between them (gray line). Only a sample of 50 movies is shown so the plot doesn’t get too cluttered. Movies with fewer ratings are generally pulled further to the left, while those with more ratings barely budge, but this pattern isn’t guaranteed: a movie with a few consistent ratings can have a narrower confidence interval than a movie with many inconsistent ones.
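A plot along these lines can be sketched as follows; the data here is a synthetic stand-in, not the actual MovieLens sample from the post:

```python
import matplotlib
matplotlib.use('Agg')  # headless backend so this runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Synthetic 50-movie sample: a mean rating and a lower LCB rating for each.
n_movies = 50
avg = rng.uniform(2.5, 5.0, n_movies)
lcb = avg - rng.uniform(0.05, 1.5, n_movies)
sample = pd.DataFrame({'avg_rating': avg, 'ci_lower': lcb}).sort_values('ci_lower')

fig, ax = plt.subplots(figsize=(6, 10))
y = np.arange(n_movies)
# Gray line for the gap, blue dots for the mean, orange dots for the LCB.
ax.hlines(y, sample['ci_lower'], sample['avg_rating'], color='gray', alpha=0.5)
ax.scatter(sample['avg_rating'], y, color='tab:blue', label='Mean rating')
ax.scatter(sample['ci_lower'], y, color='tab:orange', label='LCB rating')
ax.set_xlabel('Rating')
ax.legend()
fig.savefig('lcb_ratings.png')
```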
Further approaches and questions
Stay tuned for future posts on other recommender systems where I’ll answer questions such as
How does one select the hyperparameter \(\alpha\)?