# Statsgasm Episode 5: The Art of Prediction

# by Marshall Flores

*Welcome back* to Awards Daily's Statsgasm. This is the 5th and last of the planned episodes I have for Statsgasm's first run. But don't be sad, friends – more entries will pop up if the occasion calls for it! I will certainly be reporting the results of my number crunching in future posts as we approach Oscars night on March 2nd.

Before we begin, I invite you to review Episode 3 (http://www.awardsdaily.com/blog/statsgasm-week-three/) to refresh your memory about the basics of logistic regression. This episode will not be as technically involved as Episode 3 was, but everything here builds on the contents of that post (as well as my supplementary writing in the comments section). If your memory is still foggy from a holidays-induced hangover, it's in your best interest to take a gander back in time. Otherwise, you may get irrevocably lost in a statistical blizzard, and we certainly don't want that!

Today, I will discuss two issues I've encountered when building AD's prediction models. By doing so, I hope to shed light on some of the dilemmas any would-be Nate Silver will encounter, and demonstrate what I've maintained from the beginning – statistical prediction involves personal art as well as science. Along with this discussion, I will also formally introduce one of AD's forecasting models in its *entirety.* So stick around, friends, because this will be the one time I will reveal everything that's inside one of the magical mystery boxes.

As we learned in Episode 3, since we can model Oscar outcomes as binary variables, logistic regression is an ideal starting point for creating an Oscar forecasting model. I introduced the most common form of logistic regression, the **logit** model, in that episode. Under certain conditions, e.g. a large enough sample size, the logit model works perfectly fine.
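For anyone who wants to follow along at home, here is a minimal sketch (not AD's actual code) of fitting a plain logit model in Python with the statsmodels library. The outcome and precursor values below are made-up placeholders, not real Oscar data.

```python
import numpy as np
import statsmodels.api as sm

# Hypothetical data: y = 1 if the nominee won the Oscar, 0 otherwise;
# "precursor" = 1 if the nominee won some earlier award.
y = np.array([1, 0, 0, 0, 0, 1, 0, 0, 0, 0])
precursor = np.array([1, 0, 0, 1, 0, 0, 0, 1, 0, 0])

X = sm.add_constant(precursor)   # add an intercept column
fit = sm.Logit(y, X).fit()       # standard (unpenalized) logit via maximum likelihood
print(fit.params)                # coefficients on the log-odds scale
```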

In general, we prefer to run regression models with larger sample sizes rather than smaller ones – larger samples lead to more accurate estimates. But sometimes a large sample size is a luxury we simply cannot have. For instance, if we wish to include the BAFTA as a predictor in one of our models (and the BAFTA does feature as a significant predictor in **16** of AD's models), our sample size is limited to just **13 years'** worth of data. Why? Because 2000 was the year when the BAFTAs were moved up to occur *before* the Oscars. Before the date change, the BAFTAs always took place a few weeks *after* the Oscars had concluded. In other words, the BAFTAs can only be considered an Oscars precursor from 2000 onward.

Small sample sizes cause a few headaches for the standard logit model, the biggest being that it tends to overestimate the effects of predictor variables, i.e., its coefficient estimates are biased away from zero. Biased estimates impair our ability to make accurate predictions.

Fortunately, people way smarter than I am have invented alternative forms of logistic regression that mitigate the effects of a small sample size. The specific variant that AD uses in its prediction models is the **penalized likelihood model**, also known as **Firth's Method**. Firth's method adds a penalty (based on the Jeffreys prior) to the likelihood, which shrinks the estimates and makes them more conservative than the standard logit model's, making it well suited to small samples.
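To make the idea a bit more concrete, here is a bare-bones sketch of how Firth's penalized-likelihood estimates can be computed with Newton-Raphson iteration. This is an illustration in Python/numpy, not AD's actual code; in practice you would likely reach for an existing implementation (R's logistf package is the usual choice).

```python
import numpy as np

def firth_logit(X, y, n_iter=100, tol=1e-8):
    """Logistic regression with Firth's bias-reducing (Jeffreys prior) penalty.

    X: (n, p) design matrix (include a column of ones for the intercept).
    y: (n,) array of 0/1 outcomes.
    Returns the estimated coefficient vector beta (log-odds scale).
    """
    n, p = X.shape
    beta = np.zeros(p)
    for _ in range(n_iter):
        pi = 1.0 / (1.0 + np.exp(-(X @ beta)))   # fitted win probabilities
        W = pi * (1.0 - pi)                      # logistic weights
        info = X.T @ (W[:, None] * X)            # Fisher information X'WX
        info_inv = np.linalg.inv(info)
        # Leverages: diagonal of the hat matrix W^1/2 X (X'WX)^-1 X' W^1/2
        h = W * np.einsum("ij,jk,ik->i", X, info_inv, X)
        # Firth-modified score: ordinary score plus the bias-correction term
        score = X.T @ (y - pi + h * (0.5 - pi))
        step = info_inv @ score
        beta += step
        if np.max(np.abs(step)) < tol:
            break
    return beta

# Usage (hypothetical): X = np.column_stack([np.ones(len(y)), precursor_columns...])
# beta = firth_logit(X, y)
```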

So let's take an in-depth look at one of the models AD will be using this season. It took me a while to figure out exactly which model I wanted to present; I wanted the model to be fairly simple, but also for a category that I think people would be especially interested in cracking. Ultimately, as a musician who considers himself a *huge* film score buff, I've decided to unveil AD's Original Score model!

Our Original Score model consists of four binary predictor variables. They are:

*first_nom* – indicates if this is the nominated composer's first Score nomination. **8 of the past 13** Original Score winners since 2000 have been, ahem, Oscar virgins. To me, this is a very interesting trend, as the Academy's composing branch is notoriously insular, but it seems AMPAS as a whole likes to award first-timers in this category.

*BP_nom* – indicates if the nominee’s film is also up for Best Picture. **12 of the past 13** winners were scores in BP-nominated films.

*Globe* – indicates if the nominee won the Golden Globe for Best Original Score. This is the strongest predictor for the Oscar – the Globe and Oscar have now matched **6 years straight** in this category, and the Globe was the only precursor that foreshadowed Elliot Goldenthal’s win for Frida in 2002.

*BAFTA* – indicates if the nominee won the BAFTA Anthony Asquith Award for Film Music. The weakest of the predictors for the Oscar, but still significant. Gustavo Santaolalla's win here in 2006 for Babel heralded his 2nd, back-to-back Oscar win. (A small sketch of how these predictors might be encoded follows below.)
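As promised, here is a small sketch of how a single year's lineup might be encoded for the model. The nominee labels and 0/1 values are purely hypothetical placeholders, not real precursor results.

```python
import pandas as pd

# Hypothetical Original Score lineup: one row per nominee, one column per binary predictor.
lineup = pd.DataFrame(
    {
        "first_nom": [1, 0, 0, 1, 0],  # composer's first Score nomination?
        "BP_nom":    [1, 1, 0, 0, 1],  # film also nominated for Best Picture?
        "Globe":     [0, 1, 0, 0, 0],  # won the Golden Globe for Original Score?
        "BAFTA":     [0, 1, 0, 0, 0],  # won the BAFTA Anthony Asquith Award?
    },
    index=["Nominee A", "Nominee B", "Nominee C", "Nominee D", "Nominee E"],
)
print(lineup)
```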

The regression table for our model:

Again, we interpret the approximate weights of the predictors in terms of odds. Winning the Globe, for example, is estimated (on average) to multiply a nominee's odds of winning the Original Score Oscar by roughly **48**.
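For the curious: the model's raw coefficients live on a log-odds scale, so converting one to an odds multiplier is just exponentiation. The 3.87 below is simply the natural log of 48, back-solved from the figure quoted above rather than read off the actual regression table.

```python
import numpy as np

globe_coef = 3.87          # hypothetical log-odds coefficient for the Globe predictor
print(np.exp(globe_coef))  # ~48: winning the Globe multiplies the odds of the Oscar by ~48
```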

Before I show how well the Original Score model performs on past data, let me talk about the 2nd of the two issues I've encountered in building AD's prediction models. Logistic regression assumes each observation is independent when making estimates. In this context, that means the model doesn't recognize that there are 4 other nominees in the Score lineup and that there can *only* be one winner (in general). As a result, the model could assign predicted win probabilities of at least 50% to multiple nominees, and *every* nominee clearing 50% will be predicted to win. Conversely, the model could also estimate win probabilities of less than 50% for all nominees, resulting in *no* predicted winner.

To address this statistical kink, I manually re-normalize the raw probabilities the model computes by dividing each nominee's raw probability by the sum of the raw probabilities across the entire lineup. Gibberish, right? Well, here's a straightforward example: let's say there were only two nominees – one estimated to have a **90%** chance to win the Oscar, the other a **60%** chance (so both are predicted by the model to be winners). After adjustment, the nominee with the higher raw probability is now rated to have a 0.9 / (0.9 + 0.6) = **60%** chance of winning, while the other would have a 0.6 / (0.9 + 0.6) = **40%** chance. I then select the nominee with the **highest adjusted probability** as the model's predicted winner.
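In code, the adjustment is essentially a one-liner; this sketch just reproduces the hypothetical two-nominee example above.

```python
import numpy as np

raw = np.array([0.90, 0.60])     # raw win probabilities from the model
adjusted = raw / raw.sum()       # divide each by the lineup's cumulative probability
print(adjusted)                  # [0.6 0.4] -> 60% and 40%
print(int(np.argmax(adjusted)))  # index of the predicted winner (highest adjusted probability)
```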

Here is how AD’s Original Score model performs over the past 13 years:

(One minor note: another adjustment I perform is to ensure all nominees have, at minimum, a 1% probability of winning the Oscar. Many times the model will estimate a nominee has a 0% chance of winning, but given that we don't have the best idea of how AMPAS voters actually behave, this can't be the case. There's *always* the potential for surprises.)
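Sketching that floor on top of the previous snippet (whether the floor is applied before or after re-normalizing is my assumption; here it comes last):

```python
import numpy as np

raw = np.array([0.90, 0.60, 0.00])        # one nominee the model rates at a 0% chance
adjusted = raw / raw.sum()                # re-normalize as before
adjusted = np.clip(adjusted, 0.01, None)  # enforce the 1% floor: surprises are always possible
print(adjusted.round(3))                  # [0.6  0.4  0.01]
```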

Hopefully the chart is fairly self-explanatory. In the adjusted win probabilities column, the cells highlighted in green display the model’s predicted winners, the yellow cells indicate the nominees that have the potential to upset (basically, I’ll highlight anything that has an adjusted probability of **at least 10%**), and the red cells indicate when the model did not correctly predict the winner. In the end, our Original Score model correctly predicts **11 of the past 13 winners**, missing in 2002 and 2006 (although in both cases the nominee with the 2nd highest adjusted probability ended up winning).

And there you have it, AD’s Original Score model in all of its uninhibited, birthday suit glory, ready to be fed new data in order to make its forecasts. AD’s prediction models for other categories are similar in construction. Some are simpler, others are more complex.

With that I conclude Episode 5 and the Statsgasm mini-series. I will be very happy to receive your questions and feedback in the comments below, by email (marshall(dot)flores(at)gmail(dot)com), or on Twitter at @IPreferPi314. Statsgasm will return on January 16th to crunch some initial predictions right after Oscar nominations are announced.

Ta-ta for now!

Slightly to the point, I have no idea what's winning Score this year. It's such a rarity to have no strong frontrunner.

Off-topic:

An Open Letter from Martin Scorsese to his Daughter

http://espresso.repubblica.it/visioni/2014/01/02/news/martin-scorsese-a-letter-to-my-daughter-1.147512

“Slightly to the point, I have no idea what's winning Score this year. It's such a rarity to have no strong frontrunner.”

This score buff is predicting Gravity. Steven Price's magnificent work is head and shoulders above the rest.

Marshall – I would like to thank you for all your contributions to this site. Your essays were like an engineering thesis!

I would like to ask you one question. Is there any movie that won the Best Picture Oscar without a Golden Globe Best Director nod? I believe there is a very high correlation, and you might want to look into that more closely!!

Marshall – There are three movies that received Golden Globe nods for BP + Best Director + Screenplay.

12 Years a Slave

Nebraska

American Hustle

The BP Oscar winner will be one of the films above – yes, we have only three candidates!!

Almost every BP Oscar-winning film ticked the boxes in these three categories. Could you please look into this as well?

Hi Sammy, and thanks! As for your question, The Sting, Chariots of Fire, and Driving Miss Daisy all won BP without a Globe nomination for Director. And although there is a high correlation between having this particular nod and winning BP, it's actually not a very useful predictor when analyzed through the prism of logistic regression.

It's kinda hard to explain, but I'll give it a shot – *many* BP nominees tend to have a Globe Directing nod, but since there's only one BP winner in any given year, the positive predictive effect of the Globe (as estimated by the model) gets diluted because there are effectively multiple losers that had the nomination too. This is also the case when attempting to use something like having an Oscar nomination for Best Director/Screenplay as a predictor – anything that is commonly shared among most BP nominees generally isn't found to be a significant predictor. Such is the nature of maximum likelihood estimation, the underlying mathematical process by which the logistic regression model makes its estimates.

“Almost every BP Oscar-winning film ticked the boxes in these three categories. Could you please look into this as well?”

I did consider these Globe categories when building AD's BP model, and comparatively speaking, they aren't the best predictors of BP. Remember though that I'm speaking in a statistical context – just because the analysis I'm using finds these correlations to be not significant doesn't mean they're not significant *period*. They're still useful heuristics.

“To me, this is a very interesting trend as the Academy’s composing branch is notoriously very insular, but it seems AMPAS as a whole likes to award first-timers in this category.”

This is a false conclusion. There's no evidence that they "prefer" first-timers (the films are listed on the ballot, not the composers, and how many voters keep track of first-time nods or not? Likely very few). This is correlation, not behavioral causality.

I think what it does suggest is that there's no reason to believe that name recognition plays a factor in winning – that they don't gravitate toward a well-known name just for its own sake, but factor in other things when deciding for whom to vote.

Great statistical series, btw! Very interesting. B*)

“This is a false conclusion. There's no evidence that they 'prefer' first-timers (the films are listed on the ballot, not the composers, and how many voters keep track of first-time nods or not? Likely very few). This is correlation, not behavioral causality.”

Yes, I don't mean to imply any causation in this trend – I'm well aware the composers don't have their names listed on the ballot. I'm just noting the correlation. Hence my use of the word "seems."

How soon before you take a shot at some predictions for this year?

How many days after the Oscar nominations?

“How soon before you take a shot at some predictions for this year?”

I'll be crunching out some initial predictions immediately after nominations are announced.