Tuesday, March 29, 2016

Who Rates the Raters?


An interesting question that arises in many fields is the comparison of multiple ratings or rankings of the same items. Imagine that several real estate appraisers assess a number of properties. Questions of interest include: How similar are the ratings, across appraisers? On which items are the ratings most different? Why? Is there some way to consolidate differing ratings? Such questions arise when considering such diverse raters as movie critics, medical doctors, and investment advisors, and a variety of quantitative techniques have been developed to address these questions.

In his Mar-16-2016 posting, "The 25 Best Teams In The Tournament (Say Stat Geeks)", to the TeamRankings Web site, Seth Trachtman explores some inter-rater issues surrounding college basketball rankings. In the table at the bottom of his post, he includes rankings from 5 different basketball experts for the 25 highest-ranked teams (out of a field of many more teams).

While there are quite a few correlation measures we might apply to pairs of raters, this posting examines the provided data using techniques described in my previous postings, Principal Components Analysis and Putting PCA to Work.


The table from Trachtman's article contains 5 columns, representing 5 different basketball raters, and 25 rows, representing 25 different NCAA teams. The raters are Trachtman's own TeamRankings.com, Ken Pomeroy, ESPN's Basketball Power Index, the ominously named Prediction Machine and LRMC from Georgia Tech.

After standardizing the original data, principal components analysis (PCA) reveals 5 new variables (the principal components), which are transformed versions of the original rankings. These new variables exhibit the following cumulative variances:

PCs   Cumulative Variance
1         0.8816
2         0.9522
3         0.9820
4         0.9920
5         1.0000

We interpret this summary to mean that the first principal component (note: a single variable) contains 88% of the total statistical variance in the original table. Clearly, there is substantial inter-rater agreement about these 25 teams. Further, the first two principal components together contain 95% of the total variance, the first three contain 98%, and so on.
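The cumulative variance summary above can be computed with a few lines of Python. Since the actual rankings aren't reproduced here, the sketch below uses a hypothetical stand-in: 5 simulated raters who each perturb a common ordering of 25 teams, which mimics the strong agreement seen in the real table.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Hypothetical stand-in for the 25-team x 5-rater table: each "rater"
# ranks the teams after adding noise to a shared underlying ordering.
# (The real data is in Trachtman's article.)
rng = np.random.default_rng(0)
true_order = np.arange(25)
ranks = np.column_stack([
    np.argsort(np.argsort(true_order + rng.normal(0, 2, 25))) + 1
    for _ in range(5)
])

# Standardize each rater's column, then fit PCA on the result
X = StandardScaler().fit_transform(ranks)
pca = PCA().fit(X)

# Cumulative proportion of variance captured by the first k components
cum_var = np.cumsum(pca.explained_variance_ratio_)
for k, v in enumerate(cum_var, start=1):
    print(k, round(v, 4))
```

With real rankings in place of the simulated ones, `cum_var` would reproduce the table above.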

Reconstructing the rankings using only 3 principal components yields a new table in which 86% of rankings are within 1 (in either direction) of the originals, and the maximum error (which occurs only 3 times in the entire table) is a rank off by 3 from the original. What this means is that the original data could be pretty closely approximated by a mixture of 3 artificial raters (the first 3 principal components).
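The reconstruction step works by projecting the standardized rankings onto the first 3 components and mapping back. A minimal sketch, again on hypothetical stand-in data since the real table isn't reproduced here:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Hypothetical 25-team x 5-rater rankings (see caveat above)
rng = np.random.default_rng(0)
true_order = np.arange(25)
ranks = np.column_stack([
    np.argsort(np.argsort(true_order + rng.normal(0, 2, 25))) + 1
    for _ in range(5)
])

scaler = StandardScaler()
X = scaler.fit_transform(ranks)

# Keep only the first 3 components, then map back to the original space
pca = PCA(n_components=3).fit(X)
X_hat = pca.inverse_transform(pca.transform(X))

# Undo the standardization and round back to whole ranks for comparison
approx = np.rint(scaler.inverse_transform(X_hat)).astype(int)
errors = np.abs(approx - ranks)
print("share within 1 of original:", (errors <= 1).mean())
print("max error:", errors.max())
```

On the real data, these two printed quantities correspond to the 86% and the maximum error of 3 reported above.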

It's interesting to note that the first principal component comes up with almost exactly the same ordering as Trachtman's Consensus, which is the mean ranking across all 5 raters (Miami and Texas A&M, which are tied in the Consensus, are given a definite order by the first principal component). In situations like this one, the first principal component often ends up as a weighted average of the original data. In this case, the first principal component gives slightly more weight to Ken Pomeroy and LRMC. Also of note, the second principal component treats TeamRankings, ESPN and LRMC as one group, and Ken Pomeroy and the Prediction Machine as another, based on the signs of its coefficients.
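The comparison between the first principal component and a consensus mean ranking can be sketched as follows (again on hypothetical stand-in data; note that PCA component signs are arbitrary, so PC1 must be sign-aligned with the consensus before comparing orderings):

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Hypothetical 25-team x 5-rater rankings (see caveat above)
rng = np.random.default_rng(0)
true_order = np.arange(25)
ranks = np.column_stack([
    np.argsort(np.argsort(true_order + rng.normal(0, 2, 25))) + 1
    for _ in range(5)
])

X = StandardScaler().fit_transform(ranks)
pc1 = PCA(n_components=1).fit_transform(X).ravel()

# Consensus = mean rank across raters; flip PC1's sign if needed
consensus = ranks.mean(axis=1)
if np.corrcoef(pc1, consensus)[0, 1] < 0:
    pc1 = -pc1

# Compare the two orderings; ties in the consensus get broken by PC1
order_pc1 = np.argsort(pc1)
order_consensus = np.argsort(consensus)
print("positions ordered identically:", (order_pc1 == order_consensus).mean())

# The rater weights in PC1 are the first row of pca.components_ in a
# full fit; inspecting their relative sizes shows which raters it favors.
```

Inspecting the component loadings (rather than just the ordering) is what reveals the extra weight given to particular raters, and the sign pattern of the second component's loadings is what splits the raters into two groups.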


While these findings are notable, the table provided on TeamRankings gives a limited window into their total data set. It'd be interesting to repeat the above analysis with at least the 50 best-ranked teams. Likewise, adding more raters might produce more surprising results from using the first principal component as an alternative to the Consensus.
