(Originally posted on 1UP.com, April 12, 2006.
Sadly, we’re still dealing with this stuff. As far as I can tell, pretty much no one in the games media likes review scores, but Metacritic traffic is so important they can’t afford not to use them.
My opinion has shifted a bit since I wrote this. First, I’m not sure that aiming for a “modal score” is any better than aiming for an objective one; I think the best thing a reviewer can do is just give their honest, personal [and subjective] opinion of the game. Second, it really seems like metareview sites do more harm than good to everyone, so I no longer suggest using those for much of anything.)
You may have seen this post by Andrew Pfister. If not, go read it. Basically, 1up has removed the 0.5 increments from its rating scale—games are now rated from 0-10, no decimals. I wholeheartedly support this move, and most reactions I’ve seen are positive. However, there are some people in the comments section on that post who aren’t quite sold. The basic complaint is that the new scale is less accurate, that the reviewers are losing the ability to make meaningful distinctions between levels of quality. Today, I’ve taken it upon myself to respond, as this subject has got me thinking about what exactly a review score is supposed to be.
The complaint is expressed a few ways: If a game that’s really 7.5 gets rounded up to 8, how do you tell it apart from a real 8? How can a reviewer express that a game is a bit better than a typical 9? Ultimately, though, I believe all concerns of this nature come down to two assumptions: First, that each game has a true, objective quality rating—a sort of inherent score that, if known, will tell you whether it is strictly better or worse than any other game. Second, that it’s the job of a reviewer to tell you what this score is.
I absolutely believe that both assumptions are incorrect, but I think it’s interesting to consider what happens if the first is true. Let’s say that each game really does have an objective level of quality, which I’ll call a “true score.” If you knew the true score of every game, you could create a ranking of all games, from best to worst, that no one could realistically question. Everyone has their own tastes and preferences, but despite this, some games are simply better than others in an objective sense. Now, should the score in a game review match this true score? Ideally yes, because then the review would be perfectly accurate. The only problem is that this is impossible. As much as a game reviewer may strive to be objective, every human is bound by biases and predispositions built up over their lifetime. No one can really push aside all personal feelings and judge a game in a manner that everyone will agree is fair, no matter how hard he or she may try. Thus, the best a reviewer can do is estimate the true score, making as good a guess as the subjective mind will allow. There’s always error inherent in an estimate, and as a basic science class will tell you, there’s little use for precision without accuracy. In other words, it makes no difference to rate a game 7.5 instead of 8 when this estimate could be off from the true score by a whole point or more. A rating scale with fewer levels recognizes that scores are just educated guesses. No one can say for sure whether one game rated 8 is really better than another, so trying to distinguish them with decimals doesn’t add any benefit.
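The precision-without-accuracy point can be sketched with a quick simulation. Everything here is hypothetical: the “true score” of 7.6 and the one-point spread of reviewer bias are my own assumptions for illustration, not anything a real outlet publishes.

```python
import random

random.seed(42)

TRUE_SCORE = 7.6       # hypothetical "true score" of a game
REVIEWER_ERROR = 1.0   # assumed spread of personal bias: +/- 1 point

def review(true_score):
    """One reviewer's estimate: the true score plus personal bias."""
    return true_score + random.uniform(-REVIEWER_ERROR, REVIEWER_ERROR)

# Ten reviewers score the same game.
scores = [review(TRUE_SCORE) for _ in range(10)]

# Reporting decimals implies precision the estimate doesn't have:
# the same game draws scores scattered across more than a full point.
print([round(s, 1) for s in scores])
print(round(max(scores) - min(scores), 2))
```

When the scatter between honest reviewers is this wide, the difference between a 7.5 and an 8 is lost in the noise, which is exactly why a coarser scale gives up nothing real.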
That’s all assuming that a true score exists. As I said, I don’t think this is true—it’s an interesting theoretical concept, but I believe the variance of opinion among gamers means it can’t actually exist in any meaningful sense. When it comes to art (or media, or entertainment, if you find the A-word too contentious), you can’t say that one thing is objectively better than another, because value is determined on an individual basis by those observing it. From this perspective, the job of the reviewer is different. Instead of trying to guess at the true score, he or she needs to come up with something like a modal score (in the mathematical sense)—one that will seem about right to the greatest number of readers. Personal opinions will always vary, of course, but a modal score is useful because it predicts the quality that the largest share of readers will find in the game. But this, too, is an estimate—one can’t say for sure how other people will react to a game, and there are again personal biases at play. As before, a broader rating scale is still better, because it accounts for the error in estimating this score.
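The “modal” idea really is the statistical mode: the single value the largest number of readers would pick. A minimal illustration, with made-up reader ratings standing in for actual audience data:

```python
from statistics import mode

# Hypothetical ratings the same game would get from individual readers.
reader_scores = [7, 8, 8, 9, 8, 6, 8, 7, 9, 8]

# The modal score is the single value most readers would agree with,
# even though plenty of individuals land elsewhere.
print(mode(reader_scores))  # → 8
```

Half of these hypothetical readers say 8, yet the other half don’t—which is the reviewer’s bind: even a perfect modal score is “wrong” for a large chunk of the audience.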
This is why a scale with fewer positions is better—the error in rating a game is simply too great for finer grading to give any meaningful information. In other words, 1up’s new scale doesn’t round 7.5 to 8—it says that there is no 7.5, and all games rated 8 are roughly equal. You may like some better than others, but that doesn’t mean this is the objective truth, or that everyone else will agree. Really, I think the scale could stand to be even simpler, perhaps 1 to 5 stars with no halves. Even now I feel that the distinction between, say, 7 and 8 isn’t terribly meaningful, and the scale should be broken down into levels so broad that the jump between each represents a truly noticeable change in quality. I would only caution against making them too broad, such as with a 2-point scale—this essentially gives you ratings of “good” and “bad,” which looks too much like the objectivity I already argued was impossible to achieve. There will never be a rating that everyone agrees on, and it’s not helpful if this disagreement always swings the score from one end of the scale to the other.
If you’re like me, reading all that may make you realize that the whole concept of scoring games is kind of ridiculous. As I said, everyone has different tastes, and different people will get radically different things out of a game. Really, attempting to sum up a game in a single number is doing a great disservice to the game and the readers, because this gives absolutely no indication of why the game is good or bad. The only way to figure out whether you’ll really enjoy a game is to look at the other part of the review—you know, the text, which can actually explain why the game may or may not suit your preferences. All a score does is distract people from the rest of the review, the part that really matters.
Unfortunately, Andrew clearly states in his post that we’re not going to see scores go away anytime soon, even though he too would like to be rid of them. Part of the reason is sites like Game Rankings, which average the scores from a variety of sources to give an overall ranking for a game. The funny thing is that, as much as I’d like to see scores gone from individual reviews, I actually think Game Rankings is pretty useful. By aggregating a number of scores, it essentially smooths out the biases of the reviewers. This gives you something closer to the true score, if you believe in that, or at least a better idea of how people in general may feel about the game. This is still no replacement for reading the actual reviews, since one’s own opinion may diverge wildly from the average, but it’s a good place to start. Game Rankings gives you a good estimate of the game’s “objective” quality, better than any individual review—you just need to read the individual reviews afterward to determine how the game may appeal to your own subjective tastes.
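The smoothing effect of aggregation is just the statistics of averaging: individual biases partly cancel each other out, so the mean of many scores tends to sit closer to the underlying value than any single score does. A toy sketch—the underlying quality of 8.2, the thirty outlets, and the one-point bias spread are all my own assumptions:

```python
import random

random.seed(7)

TRUE_SCORE = 8.2    # hypothetical underlying quality
N_REVIEWERS = 30    # roughly the number of outlets an aggregator might average

# Each reviewer's score is the true value plus personal bias (assumed +/- 1).
scores = [TRUE_SCORE + random.uniform(-1.0, 1.0) for _ in range(N_REVIEWERS)]

average = sum(scores) / len(scores)

# The average lands much closer to TRUE_SCORE than the worst single score.
worst_single_error = max(abs(s - TRUE_SCORE) for s in scores)
average_error = abs(average - TRUE_SCORE)
print(f"average = {average:.2f}, error = {average_error:.2f}")
print(f"worst single-review error = {worst_single_error:.2f}")
```

The standard statistical result is that the error of the mean shrinks roughly with the square root of the number of reviews averaged—which is why an aggregate of thirty scores is a far better guess at consensus quality than any one review, even a careful one.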
What I’d really like—and I recognize this is in no way realistic—is for 1up to print no score in its reviews, but report a number directly to Game Rankings to be recorded in the average. Since things don’t work that way, I’m going to keep doing what I’ve been doing, which means reading the review text, paying little attention to the score, and turning to Game Rankings when I really want an overall number. The great thing about scores, even as they are now, is that you don’t have to care about them if you don’t want to. Personally, I recommend that you don’t.
Not really a sidebar: Subjective reviews
Inherent in all this talk of scores is the assumption that reviewers are trying to be as objective as they can, and come up with a score that makes sense to as many readers as possible. A reviewer could instead create an intentionally subjective review, giving only his or her own opinion with no concern for how others may perceive the game—I guess this is what they call New Games Journalism. In this case, I figure the reviewer can come up with whatever scoring system he or she wants, since any score isn’t necessarily supposed to apply to the readers anyway. This isn’t really relevant to the main point of this post, but I thought I should mention it in case anyone was wondering about it.