# Statsgasm – Week Three

by Marshall Flores

Welcome to Episode 3 of Awards Daily’s Statsgasm. Last week I led a descent into deeper statistical madness by introducing regression analysis; hopefully I wasn’t Darryl Revok and didn’t induce too many *Scanners*-style exploding heads among you all. Today we’re going to make our furthest probe into the statistical singularity of regression. Yes, that means more math, but we will also see for the first time the primary method that forms the basis of AD’s Oscar prediction models.

Note: before you venture past this point, I do invite you to take the time to read and/or review my first two Statsgasm posts. I of course try to make each post self-contained, but advanced stats (and math in general) does snowball from basic concepts and terminology. Revisiting the first two episodes of Statsgasm may be useful in ensuring that you don’t get too lost today, as I believe this will be the longest and most technical episode in the entire series.

The **simple linear regression (SLR)** model I introduced in Episode 2 is a very powerful tool, but it’s appropriate for only a certain type of data, specifically, when the response variable is **continuous** (i.e. the response can assume *any* value on the real number line). Many things can be represented by continuous variables, but not everything. For instance, if something only has two possible outcomes (e.g. whether a student passes or fails a course), it makes much more sense to use a **binary** (0 for failure, 1 for pass) variable to model it.

So what do binary variables have to do with us? Hmm, let’s see… oh yeah, the Oscars are a textbook example of something that can be coded as a binary variable! Each category only has one winner – the rest of the nominees don’t win. So we simply assign a 1 to the winner and a 0 to the rest.
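To make that coding concrete, here is a minimal Python sketch. The lineup below is an abridged illustration (the 2012 Best Picture field, with Argo as the winner), not data from any of AD's models:

```python
# Code each Best Picture nominee as a binary response variable:
# 1 for the winner, 0 for everyone else (abridged 2012 lineup).
nominees = ["Argo", "Lincoln", "Life of Pi", "Les Miserables"]
winner = "Argo"

won_bp = {film: int(film == winner) for film in nominees}
print(won_bp)
# Exactly one nominee per year carries a 1.
```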

However, if we do want to use a binary variable as a response, we cannot use the SLR model (for a number of theoretical reasons I won’t go over). Fortunately, we can use a kissing cousin of linear regression called logistic regression. The underlying mechanics of **logistic regression** are difficult to explain without explicit math, and since I’m interested in writing for you guys without exploding your heads, I’m going to handwave a lot of the formalities and focus on showcasing what the model can do. Still, I do want to emphasize one very important thing that distinguishes logistic regression from SLR and makes it a very appropriate method for us to use regarding the Oscars:

The SLR model estimates an average change in the response variable given a change in our predictor(s). Logistic regression effectively does the same thing, but it now calculates how the binary response variable is influenced by the predictors in terms of probability (more specifically, it calculates probabilities when the response variable equals 1). To put it in the context of Oscar predicting, logistic regression can enable us to determine how certain precursors (winning the DGA, having the most nominations, etc.) affect, on average, the chances of winning the Oscar.
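Under the hood, logistic regression passes a linear combination of the predictors through the logistic function, which squeezes any number into the 0–1 range so it can be read as a probability. A minimal sketch, with made-up coefficients (`b0`, `b1` below are illustrative, not values fitted to any Oscar data):

```python
import math

def win_probability(total_noms, b0=-4.0, b1=0.24):
    """Logistic model: P(win = 1) = 1 / (1 + exp(-(b0 + b1 * x))).

    b0 and b1 are illustrative placeholders, not fitted values.
    """
    linear = b0 + b1 * total_noms
    return 1.0 / (1.0 + math.exp(-linear))

# Unlike a straight line, the output can never leave the (0, 1) range.
print(round(win_probability(12), 3))  # 0.246
```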

With my introduction of logistic regression as a magical mystery box out of the way, let’s see it perform some Abracadabra! Last week we used an SLR model to examine how Best Picture winners with more nominations tend to also take home more Oscars as a result. We’re now going to build a logistic regression model that estimates how the number of nominations a BP nominee receives influences its chances of winning BP. I’ll also throw in the strongest BP predictor historically, the DGA, into the mix as well.

For this example, we’re going to be using data from 1950-2012. First things first – always visualize the data we’re working with. Here is a histogram showing the distribution of total nominations each BP nominee received in that period:

This distribution doesn’t look as skewed as the ones we saw using data from 1980-2012 – in fact, it actually looks pretty normal. Let’s see what our logistic regression model comes up with.

Our model p-value is 0.0000, which indicates the model as a whole is statistically significant. Meanwhile, **pseudo R²** is an analogue to the **R²** value in the SLR model that acts as an indicator of goodness-of-fit (although we do not interpret it in exactly the same way). In general, we would be happy with pseudo R² values ranging between 0.2 and 0.4; the fact that we’re getting 0.5077 using just a two-predictor model to explain 63 years’ worth of BP history indicates the model is pretty darn good.
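For the curious, the pseudo R² reported by most stats packages is McFadden's version, which compares the fitted model's log-likelihood against a null (intercept-only) model. A sketch, using hypothetical log-likelihood values chosen only to land near the article's 0.5077 (they are not the actual fit):

```python
def mcfadden_pseudo_r2(ll_model, ll_null):
    """McFadden's pseudo R^2: 1 - LL(model) / LL(null).

    A model no better than the intercept-only baseline scores 0;
    values of 0.2-0.4 already indicate a good fit.
    """
    return 1.0 - ll_model / ll_null

# Hypothetical log-likelihoods, for illustration only:
print(round(mcfadden_pseudo_r2(-70.0, -142.2), 4))  # 0.5077
```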

The p-values for both total_noms and DGA are also very small, indicating their significance as predictors, so let’s move on to interpreting their coefficients. Our model estimates that each additional Oscar nomination a BP nominee receives *increases* its odds of winning BP (on average) by **27%**. Alternatively, we can interpret this as each additional Oscar nomination making a BP win **1.27 times** more likely in odds terms. Meanwhile, the model estimates that if a BP nominee wins the DGA, its odds of winning BP are **50 times** higher.
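Both the percentage and the "times more likely" readings come from the same operation: exponentiating a raw logistic coefficient turns it into an odds ratio. A sketch (the coefficients below are backed out of the article's reported odds ratios, not raw model output):

```python
import math

# A logistic coefficient b corresponds to an odds ratio of exp(b).
# These are derived from the reported odds ratios (1.27 per
# nomination, 50 for a DGA win), not taken from the fitted model.
b_total_noms = math.log(1.27)  # ~0.239
b_dga = math.log(50)           # ~3.912

print(round(math.exp(b_total_noms), 2))  # 1.27: each extra nom multiplies the odds by 1.27
print(round(math.exp(b_dga), 1))         # 50.0: a DGA win multiplies the odds by 50
```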

Here is a graph that depicts our model’s estimates of BP win probability when considering only total nominations:

where the blue band indicates the 95% confidence interval of the model’s probability estimates. As expected, more nominations equate to a higher chance of winning BP (though interestingly enough, there seems to be a drop in win probability going from 12 to 13 nominations!). Our model predicts (with 95% confidence) that if a BP nominee received 12 Oscar nominations, its chances of winning BP are between **56% and 72%**.

Let’s see how things change when we throw the DGA into the picture:

We now have two bands in our graph: the blue band depicts predicted BP win probability *without* winning the DGA, while the red band indicates win probability *with* a DGA win. As we can clearly see, the DGA is extremely influential in predicting BP, more influential than just total nominations. This time, the model predicts (with 95% confidence) that a film with 12 nominations that *did not* win the DGA has a BP win probability between **0.2% and 25%**, and a BP win probability between **81% and 97%** *with* a DGA win – a **huge** difference.
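We can sketch how a two-predictor model produces both bands from one equation. The slopes below come from the article's odds ratios (1.27 per nomination, 50 for a DGA win), but the intercept `b0` is an assumed value chosen purely for illustration, so the outputs are only meant to land in the same ballpark as the graphed bands:

```python
import math

def p_win(total_noms, dga_win, b0=-5.0):
    """Two-predictor logistic sketch.

    Slopes are derived from the article's odds ratios; the
    intercept b0 is an assumption made for this illustration.
    """
    linear = b0 + math.log(1.27) * total_noms + math.log(50) * dga_win
    return 1.0 / (1.0 + math.exp(-linear))

# A 12-nomination film, without and then with a DGA win:
print(round(p_win(12, 0), 2))  # 0.11
print(round(p_win(12, 1), 2))  # 0.86
```

The single DGA coefficient shifts the whole curve upward, which is exactly why the red and blue bands in the graph have the same shape but very different heights.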

So we now have a good idea of how the number of nominations and the DGA factor into predicting BP in general. But just how well does a model using only those two variables explain past outcomes? We can generate what is known as a **classification table** for this purpose after running a logistic regression.

There’s quite a bit going on here, so let’s zero in on two things. **Sensitivity** indicates the model’s ability to predict true positives within the data, while **Specificity** indicates the ability to predict true negatives. In other words, these are measures of how good our model is in picking out which films actually won BP and which films did not win. As the table indicates, our model correctly identifies **50/63 (79%)** of the BP winners from 1950-2012, and **253/265 (95%)** of the BP non-winners. Not too shabby for a model using only two predictors, although there’s always room for improvement.
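The two headline numbers can be reproduced directly from the counts in the table:

```python
def sensitivity_specificity(tp, fn, tn, fp):
    """Sensitivity = TP / (TP + FN); Specificity = TN / (TN + FP)."""
    return tp / (tp + fn), tn / (tn + fp)

# The article's counts: 50 of 63 winners and 253 of 265 non-winners
# were classified correctly, leaving 13 false negatives and 12
# false positives.
sens, spec = sensitivity_specificity(tp=50, fn=13, tn=253, fp=12)
print(round(sens, 2), round(spec, 2))  # 0.79 0.95
```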

And that, my friends, is the bare bones of logistic regression – the spinal column of AD’s Oscar forecasting models. Logistic regression helps quantify the conventional wisdom we long-time Oscar watchers have used in predicting winners. In this case, we now have some idea just how influential the number of nominations and the results of the DGA are in the BP race, in terms of odds and percentages.

This episode will actually be the last one in this series to get into such technical detail, so you all can rest easy. In January, before Oscar nomination morning, I will introduce one of AD’s models in its entirety (right now, I’m thinking I will unveil AD’s prediction model for Visual Effects). I will also reveal (to some extent) the modifications and adjustments I make when the models generate their predictions. Again, I emphasize that there is a lot of personal “art” that goes into statistical prediction in addition to the science, and the Oscars are certainly no exception to that.

If you are interested in the (many) details I left out regarding logistic regression, I invite you to post in the comments below. And as always, feel free to e-mail me at marshall(dot)flores(at)gmail(dot)com or converse with me on Twitter at @IPreferPi314.

Happy holidays!

My head hurts… Graphs and numbers. The bible didn’t say anything about this!

> Graphs and numbers. The bible didn’t say anything about this!

Are you sure? I’m pretty certain there was an entire Old Testament book dedicated to numbers.

I’m really loving this series! It’s just geeky enough for me without going too far and losing me.

Don’t sugarcoat it Marshall – who’s gonna win this dart game?

> I’m really loving this series! It’s just geeky enough for me without going too far and losing me.

Much appreciated, thanks!

> Don’t sugarcoat it Marshall – who’s gonna win this dart game?

All in due time, Steve.

Seriously, I just finished part one of a research methods and statistics course, and the fact that you incorporated this material into Academy Awards logic makes this the most fascinating example I have seen of it being used. Did you use SPSS to calculate the chart data? I will be learning more about it in part two of the course next semester.

Hi Jonathan, thanks! I actually use STATA for all my stats work, and it’s a fairly well-worn shoe for me now since I have taken two courses using it (introductory econometrics + experimental regression analysis). I know SPSS is a very prominent stats package as well, but I have no experience with it. My stats courses have all either used R or STATA.

This model really just reflects the high agreement between DGA win and Best Pic. Actually I think just going off the DGA win is slightly BETTER than this model, no? DGA win and Best Pic only disagreed 12 times since 1950, better than the 13 times this model misses…

BTW, here are the links to Episodes 1 and 2 of Statsgasm:

http://www.awardsdaily.com/blog/pilot-episode-of-awardsdailys-statsgasm/

http://www.awardsdaily.com/blog/statsgasm-week-two-regression/

Very astute observation, Bob. The driving force in this model really is the DGA. Keep in mind this is basically just a toy model to introduce logistic regression to everyone here. AD’s BP model uses 7 predictors – the DGA is still definitely the strongest predictor, but it’s not the only significant predictor for BP. The refined model does correctly select 30 of the past 30 BP winners.

This is a good time to segue into a couple of limitations of logistic regression. Wonk alert:

Logistic regression treats each observation independently when making estimates. So in the context of this example, the model doesn’t recognize that there are other nominees as well in a given year’s BP lineup, and that there can only be a single winner. (Well, I can’t really say that as theoretically there can be a tie. But it’s never happened before with BP, and I doubt it will ever happen).

As a result, the model could potentially give multiple nominees predicted win probabilities of greater than 50%, and the classification table uses a cutoff of 50% when evaluating the model’s predictive accuracy. So let’s say the model predicted one BP nominee having a 99% chance of winning BP and another nominee having a 51% chance. The classification table would count both as predicted winners, but one would then be a false positive, which would then get factored into the table’s assessment of the model’s accuracy.
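Here is that failure mode in miniature, with made-up probabilities for a single year's hypothetical lineup:

```python
# Illustrative predicted probabilities for one year (made-up numbers):
year_probs = {"Film A": 0.99, "Film B": 0.51, "Film C": 0.10}
actual_winner = "Film A"

cutoff = 0.50
predicted_winners = [f for f, p in year_probs.items() if p > cutoff]
# Both A and B clear the 50% cutoff, so the classification table
# counts two "predicted winners"; one is inevitably a false positive,
# even though only one film can actually win.
false_positives = [f for f in predicted_winners if f != actual_winner]
print(predicted_winners, false_positives)
```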

So again, this goes back to statistical prediction being art as well as science. I make my own tweaks here and there in formulating the models and evaluating the predictions they make. I will go over some of these tweaks in a few episodes from now.

This just proves why Gravity has a very high chance to win, because of the directors’ support behind Cuarón – and personally, I finally want a sci-fi drama to win Best Picture.

The DGA is a very important (most important) barometer, and I believe this season, too, it will point to the BP winner.

Alan – you should root for “Her,” then. “Gravity” isn’t sci-fi. If it’s set in a period, it’s the past, because the shuttle program is now kaput.

I was thinking of what category it was – I kinda knew Gravity was an adventure/drama, but everyone’s saying it’s sci-fi, so that was the only reason I said that. But yeah, I agree the DGA is the deciding factor, and hopefully after that the tide will turn, because Gravity is one of a kind – so here’s hoping for BP.

makes sense, Marshall– thanks! I look forward to the next post

Gravity is a kind of sci-fi, high-tech, lost-in-space version of Redford’s “All Is Lost”, but it’s very audience-accessible and a crowd-pleaser. Unfortunately, I just don’t think it will be an over-60 crowd-pleaser… I suspect those folks have more mature tastes than that. I just find it difficult to believe that the very same voters who liked the thoughtful, historical “The King’s Speech” will like Gravity with its noise and painful dialogue… but I could very well be wrong.