by Marshall Flores
Welcome back to Awards Daily’s Statsgasm. This is the 5th and last of the episodes I have planned for Statsgasm’s first run. But don’t be sad, friends – more entries will pop up if the occasion calls for it! I will certainly be reporting the results of my number crunching in future posts as we approach Oscars night on March 2nd.
Before we begin, I invite you to review Episode 3 (http://www.awardsdaily.com/blog/statsgasm-week-three/) to refresh your memory about the basics of logistic regression. This episode will not be as technically involved as Episode 3 was, but everything in here builds on that post (as well as my supplementary writing in its comments section). If your memory is still foggy from a holiday-induced hangover, it’s in your best interest to take a gander back in time. Otherwise, you may get irrevocably lost in a statistical blizzard, and we certainly don’t want that!
Today, I will discuss two issues I’ve encountered when building AD’s prediction models. By doing so, I hope to shed light on some of the dilemmas any would-be Nate Silver will encounter, and demonstrate what I’ve always maintained since the beginning – statistical prediction involves personal art as well as science. Along with this discussion, I will also formally introduce one of AD’s forecasting models in its *entirety.* So stick around, friends, because this will be the one time I will reveal everything that’s inside one of the magical mystery boxes.
As we learned in Episode 3, since we can model Oscar outcomes as binary variables, logistic regression is an ideal starting point for creating an Oscar forecasting model. I introduced the most common form of logistic regression, the logit model, in that episode. Under certain conditions, e.g. a large enough sample size, the logit model works perfectly fine.
In general, we prefer to run regression models on larger samples rather than smaller ones – larger samples lead to more accurate estimates. But sometimes a large sample size is a luxury we simply cannot have. For instance, if we wish to include the BAFTA as a predictor in one of our models (and the BAFTA does feature as a significant predictor in 16 of AD’s models), our sample size is limited to just 13 years’ worth of data. Why? Because 2000 was the year the BAFTAs were moved up to take place *before* the Oscars. Before the date change, the BAFTAs always took place a few weeks *after* the Oscars had concluded. In other words, the BAFTAs can only be considered an Oscar precursor since 2000.
Small sample sizes cause a few headaches for the standard logit model. The biggest is that the logit model tends to overestimate the effects of the predictor variables; in other words, its estimates are biased, pushed too far away from zero. Biased estimates impair our ability to make accurate predictions.
Fortunately, people way smarter than I am have invented alternative methods of logistic regression that mitigate the effects of a small sample size. The specific variant that AD uses in its prediction models is called penalized likelihood logistic regression, also known as Firth’s method. Firth’s method adds a penalty to the likelihood that pulls the estimates back toward zero, so it makes more conservative estimates than the standard logit model. That makes it very suitable for small samples.
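For the curious, here is a minimal NumPy sketch of Firth’s penalized likelihood for logistic regression. It maximizes the log-likelihood plus half the log-determinant of the Fisher information, which works out to adding a correction term h·(1/2 - p) to the usual score, where h is the diagonal of the hat matrix. This is a bare-bones illustration (no step-halving, no standard errors), not the software AD actually uses; the function name firth_logit and the toy data are hypothetical. On the toy data a single 0/1 predictor perfectly separates winners from losers, a case where an ordinary logit fit has no finite answer, yet Firth’s method still returns finite estimates.

```python
import numpy as np

def firth_logit(X, y, max_iter=100, tol=1e-8):
    """Firth penalized-likelihood logistic regression (bare-bones sketch).

    X: (n, k) design matrix whose first column is all ones (intercept).
    y: length-n array of 0/1 outcomes.
    Returns the k estimated coefficients on the log-odds scale.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))    # fitted probabilities
        w = p * (1.0 - p)                       # logistic variance weights
        info = X.T @ (X * w[:, None])           # Fisher information X'WX
        info_inv = np.linalg.inv(info)
        # Diagonal of the hat matrix W^1/2 X (X'WX)^-1 X' W^1/2
        h = w * np.einsum("ij,jk,ik->i", X, info_inv, X)
        # Firth-modified score: ordinary residual y - p plus h * (1/2 - p)
        score = X.T @ (y - p + h * (0.5 - p))
        step = info_inv @ score
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta

# Made-up toy data: the predictor perfectly separates y = 1 from y = 0.
X = np.column_stack([np.ones(6), [0, 0, 0, 1, 1, 1]])
y = np.array([0, 0, 0, 1, 1, 1])
beta = firth_logit(X, y)
print(beta)              # approximately [-1.946, 3.892], i.e. [-ln 7, 2 ln 7]
print(np.exp(beta[1]))   # estimated odds ratio, approximately 49
```

For a single 0/1 predictor like this, Firth’s correction is equivalent to adding half a success and half a failure to each group, which is why the estimate lands at exactly ln 7 on either side of zero instead of running off to infinity.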
So let’s take an in-depth look at one of the models AD will be using this season. It took me a while to decide exactly which model to present; I wanted the model to be fairly simple, but also to be for a category that I think people would be especially interested in cracking. Ultimately, as a musician who considers himself a *huge* film score buff, I’ve decided to unveil AD’s Original Score model!
Our Original Score model consists of four binary predictor variables. They are:
first_nom – indicates if this is the nominated composer’s first Score nomination. 8 of the 13 Original Score winners since 2000 have been, ahem, Oscar virgins. To me, this is a very interesting trend: the Academy’s music branch is notoriously insular, but AMPAS as a whole seems to like awarding first-timers in this category.
BP_nom – indicates if the nominee’s film is also up for Best Picture. 12 of the past 13 winners were scores in BP-nominated films.
Globe – indicates if the nominee won the Golden Globe for Best Original Score. This is the strongest predictor for the Oscar – the Globe and Oscar have now matched 6 years straight in this category, and the Globe was the only precursor that foreshadowed Elliot Goldenthal’s win for Frida in 2002.
BAFTA – indicates if the nominee won the BAFTA Anthony Asquith Award for Film Music. It is the weakest of the four predictors, but still significant. Gustavo Santaolalla’s BAFTA win in 2006 for Babel heralded his second consecutive Oscar win.
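To make the setup concrete, here is a minimal Python sketch of how one year’s lineup might be encoded: one row per nominee, with the four 0/1 flags above. It assumes the model also includes an intercept, as logistic regressions normally do. The Nominee class and the nominee names are hypothetical illustrations, not AD’s actual code or data; only the field names are borrowed from the variable names above.

```python
from dataclasses import dataclass

@dataclass
class Nominee:
    """One Original Score nominee with the model's four 0/1 predictors."""
    name: str
    first_nom: int  # 1 if this is the composer's first Score nomination
    bp_nom: int     # 1 if the film is also nominated for Best Picture
    globe: int      # 1 if the score won the Golden Globe
    bafta: int      # 1 if the score won the BAFTA Anthony Asquith Award

    def __post_init__(self):
        # Every predictor is binary, so reject anything other than 0 or 1.
        for flag in (self.first_nom, self.bp_nom, self.globe, self.bafta):
            if flag not in (0, 1):
                raise ValueError(f"{self.name}: predictors must be 0 or 1")

    def to_row(self):
        """Design-matrix row: a leading 1 for the intercept, then the predictors."""
        return [1, self.first_nom, self.bp_nom, self.globe, self.bafta]

# Hypothetical lineup, for illustration only.
lineup = [
    Nominee("Score A", first_nom=1, bp_nom=1, globe=1, bafta=0),
    Nominee("Score B", first_nom=0, bp_nom=1, globe=0, bafta=1),
    Nominee("Score C", first_nom=0, bp_nom=0, globe=0, bafta=0),
]
X = [nominee.to_row() for nominee in lineup]
print(X)  # [[1, 1, 1, 1, 0], [1, 0, 1, 0, 1], [1, 0, 0, 0, 0]]
```

For fitting, the historical data would stack rows like these for every nominee across all 13 years, with an outcome of 1 for the eventual winner and 0 for everyone else.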
The regression table for our model:
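Again, we interpret the approximate weights of the predictors in terms of odds. Winning the Globe, for example, is estimated to multiply a nominee’s odds of winning the Original Score Oscar by about 48, holding the other predictors fixed. Note that this means 48 times the *odds*, not 48 times the probability.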
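Because logistic regression coefficients live on the log-odds scale, an odds ratio of 48 corresponds to a coefficient of ln(48), roughly 3.87. Here is a small Python sketch of what multiplying the odds by 48 does to a probability. The baseline probabilities (5%, 10%, 50%) are arbitrary illustrations, not values from the model, and the function names are hypothetical.

```python
import math

def logistic(z):
    """Convert log-odds z into a probability."""
    return 1.0 / (1.0 + math.exp(-z))

def apply_odds_ratio(p_base, odds_ratio):
    """Probability after multiplying the odds implied by p_base by odds_ratio."""
    odds = p_base / (1.0 - p_base)
    new_odds = odds * odds_ratio
    return new_odds / (1.0 + new_odds)

GLOBE_ODDS_RATIO = 48
globe_coef = math.log(GLOBE_ODDS_RATIO)  # about 3.87 on the log-odds scale

for p_base in (0.05, 0.10, 0.50):
    via_odds = apply_odds_ratio(p_base, GLOBE_ODDS_RATIO)
    # Same answer by adding the coefficient to the baseline log-odds.
    via_logit = logistic(math.log(p_base / (1.0 - p_base)) + globe_coef)
    assert abs(via_odds - via_logit) < 1e-12
    print(f"baseline {p_base:.0%} -> {via_odds:.1%} after a Globe win")
# baseline 5% -> 71.6% after a Globe win
# baseline 10% -> 84.2% after a Globe win
# baseline 50% -> 98.0% after a Globe win
```

A Globe win multiplies the odds by 48 in every case, but the jump in probability depends on where the nominee starts; a 50% nominee obviously can’t become 48 times more likely to win.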
Before I show how well the Original Score model performs with past data, I will now talk about the 2nd of the two issues I’ve encountered in building AD’s prediction models. Logistic regression assumes each observation is independent when making estimates. In this context, that means the model doesn’t recognize that there are 4 other nominees in the Score lineup, or that (barring a rare tie) there can *only* be one winner. As a result, the model can assign win probabilities of 50% or more to several nominees, and since every nominee at or above 50% counts as a predicted winner, it would predict multiple winners. Conversely, the model can also estimate win probabilities below 50% for every nominee, resulting in *no* predicted winner.
To address this statistical kink, I manually re-normalize the raw probabilities the model computes, dividing each nominee’s raw probability by the sum of the raw probabilities for the entire lineup. Gibberish, right? Well, here’s a straightforward example: let’s say there were only two nominees – one estimated to have a 90% chance to win the Oscar, the other a 60% chance (so both are predicted by the model to be winners). After adjustment, the nominee with the higher raw probability is rated to have a 0.9 / (0.9 + 0.6) = 60% chance of winning, while the other has a 0.6 / (0.9 + 0.6) = 40% chance. I then select the nominee with the highest adjusted probability as the model’s predicted winner.
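The adjustment takes only a few lines of Python. This sketch reuses the two-nominee example above, plus a made-up lineup in which nobody reaches 50%, to show both failure modes of the naive 50% cutoff and how re-normalizing fixes them. The function names are hypothetical, not AD’s actual code.

```python
def predicted_by_threshold(raw, cutoff=0.5):
    """Naive reading of raw logit output: every nominee at or above the cutoff 'wins'."""
    return [name for name, p in raw.items() if p >= cutoff]

def renormalize(raw):
    """Divide each nominee's raw probability by the lineup total so shares sum to 1."""
    total = sum(raw.values())
    return {name: p / total for name, p in raw.items()}

def predicted_winner(adjusted):
    """The nominee with the highest adjusted probability."""
    return max(adjusted, key=adjusted.get)

# The two-nominee example from the text: both clear 50%, so both "win".
raw = {"Nominee A": 0.9, "Nominee B": 0.6}
print(predicted_by_threshold(raw))   # ['Nominee A', 'Nominee B']
adjusted = renormalize(raw)
for name, p in adjusted.items():
    print(f"{name}: {p:.0%}")        # Nominee A: 60%, then Nominee B: 40%
print(predicted_winner(adjusted))    # Nominee A

# The opposite problem (made-up numbers): nobody clears 50%.
raw_low = {"Nominee A": 0.30, "Nominee B": 0.20, "Nominee C": 0.10}
print(predicted_by_threshold(raw_low))          # []
print(predicted_winner(renormalize(raw_low)))   # Nominee A
```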
Here is how AD’s Original Score model performs over the past 13 years:
(One minor note: another adjustment I perform is to ensure every nominee has, at minimum, a 1% probability of winning the Oscar. The model often estimates that a nominee has an essentially 0% chance of winning, but given that we don’t have the best idea of how AMPAS voters actually behave, that can’t really be the case. There’s *always* the potential for surprises.)
Hopefully the chart is fairly self-explanatory. In the adjusted win probabilities column, the cells highlighted in green display the model’s predicted winners, the yellow cells indicate the nominees that have the potential to upset (basically, I’ll highlight anything that has an adjusted probability of at least 10%), and the red cells indicate when the model did not correctly predict the winner. In the end, our Original Score model correctly predicts 11 of the past 13 winners, missing in 2002 and 2006 (although in both cases the nominee with the 2nd highest adjusted probability ended up winning).
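The 1% floor and the chart’s color coding can also be sketched in Python. The post doesn’t say exactly how the floor is applied, so this sketch assumes one plausible approach: apply it after re-normalizing, lifting any share below 1% up to 1% and scaling the other shares down so the lineup still sums to 100%. The 10% upset cutoff comes from the text; the function names and the probabilities are made up for illustration.

```python
def apply_floor(adjusted, floor=0.01):
    """Raise any share below `floor` up to `floor`, shrinking the others
    proportionally so the lineup still sums to 1.
    Assumption: a single pass, so the shrunken shares must stay above the floor
    (and at least one nominee must start above it)."""
    low = {name for name, p in adjusted.items() if p < floor}
    rest = sum(p for name, p in adjusted.items() if name not in low)
    budget = 1.0 - floor * len(low)
    return {name: floor if name in low else p * budget / rest
            for name, p in adjusted.items()}

def classify(adjusted, upset_cutoff=0.10):
    """Label nominees the way the chart is colored: 'predicted' (green) for the
    top nominee, 'possible upset' (yellow) for anyone else at or above the
    cutoff, and '' for everyone else."""
    winner = max(adjusted, key=adjusted.get)
    labels = {}
    for name, p in adjusted.items():
        if name == winner:
            labels[name] = "predicted"
        elif p >= upset_cutoff:
            labels[name] = "possible upset"
        else:
            labels[name] = ""
    return labels

# Made-up adjusted probabilities for a five-nominee lineup.
adjusted = {"A": 0.70, "B": 0.25, "C": 0.045, "D": 0.005, "E": 0.0}
floored = apply_floor(adjusted)
labels = classify(floored)
for name, p in floored.items():
    print(f"{name}: {p:.1%} {labels[name]}")
```

Here D and E are lifted to 1% each, A, B and C give up that 2% in proportion to their size, A comes out as the predicted winner, and B (about 24.6%) is flagged as a possible upset.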
And there you have it, AD’s Original Score model in all of its uninhibited, birthday suit glory, ready to be fed new data in order to make its forecasts. AD’s prediction models for other categories are similar in construction. Some are simpler, others are more complex.
With that I conclude Episode 5 and the Statsgasm mini-series. I will be very happy to receive your questions and feedback in comments below, by email (marshall(dot)flores(at)gmail(dot)com), or on Twitter at @IPreferPi314. Statsgasm will return on January 16th to crunch some initial predictions right after Oscar nominations are announced.
Ta-ta for now!