A new weekly column by Marshall Flores
This Oscar season, I am excited to announce the introduction of formal predictive models on Awards Daily. These models will attempt to forecast winners in 21 categories based on past data and trends. As far as I know, this will make Awards Daily the most prominent Oscars site to utilize statistical models for predicting the Academy Awards, providing an informed alternative to Nate Silver and his future work.
I won’t claim that statistics can tell everything we need to know about predicting the Oscars – as far as I’m concerned, predicting using stats is actually as much art as science, especially so with the Oscars, given that we’ll never have access to the best possible data for such an undertaking – AMPAS voter preferences. But I certainly do believe that statistics can tell an important part of the story. Statistics can provide empirical validation of conventional wisdom: specifically, which precursors and indicators are significant in predicting Oscar winners, and which are not. Through my analysis so far, I have found that many of the predicting rules-of-thumb we longtime Oscar watchers have grown up with over the years can actually be supported by statistical evidence.
Over the next few weeks I will review some fundamental statistical concepts and methods, with the purpose of building up enough background knowledge to understand the models Awards Daily will be using and how they arrive at their predictions. I’m well aware that many of you may be math-averse and will balk at seeing a lot of symbols, terms, and graphs on a film site. But I will try my very best to keep what I write accessible to a general audience, and will certainly give relevant examples in each post to illustrate the concepts I introduce.
To begin, let’s look at some data of Oscar Best Picture winners from 1980-2012: how many nominations each BP winner received, and how many Oscars each ultimately won. A table of the past 33 BP winners:
We can use something called a histogram to show how nominations and wins are distributed. A histogram is a common way to show the shape of data: it sums up frequencies, i.e. how often certain numerical outcomes occur. Visualizing data is an important first step before doing any type of formal statistical analysis – not only can visualizations help verify that certain requirements are met before applying particular statistical techniques, but they can also help in forming hypotheses that we can later test.
As you can see, the histogram consists of a series of columns. The height of each column is the count of how often something occurred. For example, over the past 33 years there was only one BP winner (The Departed) that received 5 total nominations, there were two BP winners (Crash, Ordinary People) that received 6, and so forth. We can see that the distribution has two peaks (it’s bimodal), indicating the most frequent outcomes: there were 5 BP winners that received 8 nominations, and another 5 that received 11. More importantly, we can also see that the distribution of nominations is a little left/negatively skewed. This means that most of the data are concentrated on the right side of the graph, and the few outcomes on the left side are “dragging” (skewing) the overall shape of the distribution to the left a bit. Because of this shape, we can make the reasonable observation that BP winners tend to receive more nominations rather than fewer. This is an inference that can (and will) be tested in a future post.
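The frequency-counting idea behind a histogram can be sketched in a few lines. The film-to-nomination pairs below are just the handful of examples mentioned above, not the full 33-film table:

```python
from collections import Counter

# Total nominations for a few BP winners mentioned in the post
# (the full 33-film table isn't reproduced here).
nominations = {
    "The Departed": 5,
    "Crash": 6,
    "Ordinary People": 6,
}

# A histogram is just a tally of how often each outcome occurs.
freq = Counter(nominations.values())
for n_noms, n_films in sorted(freq.items()):
    # Each "column" of a text histogram is one '#' per film.
    print(f"{n_noms} nominations: {'#' * n_films} ({n_films})")
```

Each row of output plays the role of one column in the histogram above; plotting libraries just draw the same tallies as bars.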
Now let’s take a look at how wins are distributed among the past 33 BP winners.
Here, the distribution of wins is quite different from the distribution of nominations. For one thing, there’s only one peak (a unimodal distribution) – 11 BP winners won a total of 4 Oscars over the past 33 years, so it would appear that winning 4 Oscars is the most frequent outcome, occurring about one-third of the time. Since most of the data is now clustered to the left, we say that the distribution is right/positively skewed. As a result, we can infer that AMPAS tends to like spreading the wealth, and that big sweeps of the Return of the King or Slumdog Millionaire sort are rare.
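The left/right skew described above can also be measured with a single number, the moment coefficient of skewness (negative for left skew, positive for right skew). A minimal sketch, using a made-up list of win counts shaped like the wins histogram rather than the actual 33-winner data:

```python
import statistics

def skewness(data):
    """Moment coefficient of skewness: negative = left-skewed, positive = right-skewed."""
    m = statistics.mean(data)
    s = statistics.pstdev(data)
    n = len(data)
    return sum((x - m) ** 3 for x in data) / (n * s ** 3)

# Hypothetical win counts clustered low with a long right tail,
# like the BP wins histogram described above.
wins = [2, 3, 3, 4, 4, 4, 4, 5, 5, 6, 7, 9, 11]
print(skewness(wins))  # positive, i.e. right-skewed
```

Mirroring the data (negating every value) flips the sign, which is a handy sanity check that the direction convention is right.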
That’s all for now. Next time I’ll go deeper into the statistical rabbit hole by introducing regression analysis.
This post had no comments, so I decided to add one. 🙂
Thanks for the catch, Danny! Actually, I’m not sure why STATA omitted that gap when generating that histogram. Fortunately it doesn’t affect the right-skewedness of that graph.
Cool stuff, but I have to quibble with your second graph.
There needs to be gap between 9 and 11, as no BP winner since 1980 has won 10 Oscars, but there are two each for 9 and 11 wins.
Awesome. Looking forward
Thanks for your reply Z. I get what you’re saying regarding using large sample sizes and cherry-picking data. But in actuality, cherry-picking isn’t quite black and white bad science.
On one hand, if by cherry-picking you mean actively excluding/dismissing certain data because it doesn’t agree with your prior views, then yes, that’s bad form. However, in the context of regression analysis, model builders do try to operate under the principle of Occam’s Razor, that simpler explanations (and models) are preferable to complicated ones. Hence, cherry picking does happen in a sense, but it is performed using certain criteria.
Maximizing accuracy while ensuring simplicity is a juggling act that exemplifies how building predictive models is as much art as science. Using too many variables can result in the model being too calibrated to the data it’s using for estimation (“overfitting”), resulting in a loss of predictive generality as well as a host of other statistical problems. Hence, we attempt to build optimal models using just the most statistically important variables, and there are various methods of performing this.
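The overfitting problem mentioned above can be illustrated with a toy example: a model with as many parameters as data points fits its training data perfectly yet predicts a new point worse than a simple two-parameter line. The numbers below are invented for illustration, not Oscar data:

```python
def lagrange_predict(xs, ys, x):
    """Evaluate the degree-(n-1) polynomial passing exactly through (xs, ys) at x."""
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        term = yi
        for j, xj in enumerate(xs):
            if j != i:
                term *= (x - xj) / (xi - xj)
        total += term
    return total

def linear_fit(xs, ys):
    """Ordinary least-squares slope and intercept for a one-variable model."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return slope, my - slope * mx

# Training data: roughly y = x, with a little noise.
xs = [1, 2, 3, 4, 5]
ys = [1.1, 1.9, 3.2, 3.8, 5.1]

# Held-out point neither model ever saw:
x_new, y_new = 6, 6.0

slope, intercept = linear_fit(xs, ys)
simple = slope * x_new + intercept          # 2-parameter model
complex_ = lagrange_predict(xs, ys, x_new)  # 5-parameter model, fits training exactly

# The simpler model generalizes far better on the unseen point.
print(abs(simple - y_new), abs(complex_ - y_new))
```

The 5-parameter polynomial has zero training error but swings wildly outside the training range; that loss of generality is exactly what trimming a model down to its most important variables guards against.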
However, I will note that if a statistical test determines that a variable isn’t important statistically, it *doesn’t* necessarily mean that it’s not important, period. For example, the SAG Ensemble winner has historically not been a very good predictor of BP, and I don’t include it in AD’s BP model. But actors are the largest voting bloc in AMPAS, and Crash’s BP upset was foreshadowed by its SAG win. No one should totally ignore who wins the SAG Ensemble.
In addition, larger sample sizes are generally preferable, but again, there’s sometimes a bit of a juggling act here between being all-inclusive and recognizing the impact of developments. The Oscar game has changed many times throughout its history.
For example, the DGA has only been around since the 1950s, so any model created using regression analysis can’t really use pre-1950’s data if it wishes to incorporate the DGA as a predictor. Also, in many of my models, the BAFTA is a very important predictor; however, the BAFTA only became an Oscar precursor starting in 2000, so those models only end up using 13 years worth of data when generating their predictions.
Also, I want to add that the number of BP nominees seems like it has been a non-factor so far in predicting BP. In the four years since 2009 in which we’ve had at least 9 BP nominees, AMPAS has affirmed the guilds’ consensus choice every single time.
Spoiler alert (not really): the DGA. AD’s BP model estimates that winning the DGA makes the odds of winning BP about 165 times more likely. That being said, there isn’t a perfect correlation between DGA and the BP winner, which is why the DGA is just one of 7 variables the model uses in predicting BP.
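For readers curious how an odds multiplier like the quoted 165 translates into an actual probability, here is a small sketch. The 5% baseline below is an invented figure for illustration, not a number from AD’s model:

```python
def apply_odds_ratio(p, ratio):
    """Multiply the odds p/(1-p) by `ratio` and convert back to a probability."""
    odds = p / (1 - p) * ratio
    return odds / (1 + odds)

# A nominee with a modest 5% baseline chance, after multiplying its
# odds by 165 (the DGA-win odds ratio quoted above):
print(apply_odds_ratio(0.05, 165))  # roughly 0.90
```

Note that an odds ratio of 165 does not mean the probability itself is multiplied by 165; odds and probabilities live on different scales, which is why logistic-regression coefficients are usually reported as odds ratios.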
I’m saying that the BP model AD will be using this year has correctly chosen every single BP winner from 1982 to 2012. So actually, it gets all 31 of the most recent BP winners. And yes, the BP model (like every other model I’ve made for this season) just uses precursor data available before the Oscars ceremony to make its estimates.
Took a few years of trial and error to get it right, but it works well enough for me. Not that it matters for anything — the algorithm could just as easily be wrong as right. I only tried my hand at it to see if it was possible to look at just the Oscar nominations and nothing else and correctly predict the winner. Almost, but not quite.
The formula I use only takes into account the number and type of Oscar nominations a Best Picture nominee receives and nothing more — that leads to the 95% accuracy from the first ceremony to the 85th.
When I factor in other groups — DGA, SAG, PGA (though those latter two groups have less to go on) — and box office and Metacritic’s numbers, I get about 99% accuracy from the first ceremony to the 85th. It’s the first six ceremonies that muck everything up — but unlike some folks, I want my method to be all-inclusive. If I ignore pre-1934 or even pre-1944 nominations, what good is the algorithm? Sure, I can get 100% accuracy, but I’m cherry-picking the data, and only using the information that proves a point and ignoring anything counter is bad science.
In your predictive model what variable has the greatest impact on the BP win?
Are you saying that without knowing any Oscar results, i.e. just noms and previous wins in precursor awards (Globes, Critics, DGA, PGA, WGA) you can predict the BP winner 30 out of 30 times, regardless of number of BP nominees?
wow! this sounds really interesting. Congratulations Marshall and best of luck. Can’t wait to see what you math genius can come up with.
“Also, the way I would classify the various precursors: critics circles = try to influence the race, press awards (IPA, NBR, Globes, BFCA) = try to predict the race, Industry/Guild Awards = show us where the race is going.”
That sounds about right! Certainly matches with the results of my model building.
“Each year is its own “soup” where the conventional plays (mostly successfully) against more unique storytelling. To unlock a voting pattern of the AMPAS demographic in advance is going to be tricky.
Stats are 20/20 hindsight – once the vote is in, it’s easy to see “why” things went the way they did. Just like other elections, though, voter tastes are pretty unchangeable – what matters is whether or not their preferred style of films are strong enough to get them to bother voting for them. ”
Thanks for the comment, Steve. I totally agree. As I said in both the post and an earlier comment, statistical prediction is as much art as it is science, and the technique I’m using (which is the same one that Nate Silver uses, as well as other would-be Oscar stats oracles) is far better at explaining the past than the future. Couple that with the fact that we’ll never be able to poll AMPAS to any significant degree, and we’re left pretty much relying on the precursors as proxies for how they’re ultimately going to vote.
(Yes, I’m aware that other prominent Oscar bloggers/journalists do talk with voters and report their preferences, but you can’t really conclude that what a half-dozen voters (who self-select to be interviewed) say under the cloak of anonymity will be representative of the 6000+ members of AMPAS as a whole).
There are a plethora of both theoretical and real-world minefields to navigate in attempting to build prediction models for the Oscars. AMPAS demographics and tastes certainly change. Small sample sizes can lead to guesses that are *way* off. Random events like a date change (e.g. the DGA announcements) can create all types of chaos.
Still, group behavior of any kind is, to an extent, repetitive – AMPAS is no exception. In some cases it’s very easy to figure out which precursors are the best signposts for how AMPAS will behave. In other cases, especially in the techs, it’s much harder to parse the data and see if there’s anything useful for prediction.
In the end, anyone who knows how to data mine and use stats software can build prediction models for the Oscars. But being able to appreciate and understand how the great Oscar game goes down every season, having savvy, passion, and intuition – that’s a game-changer in its own right.
Hopefully AD’s models represent the best of both worlds, and that my attempt in crystallizing what I’ve learned here in mathematical form (especially from our patron saint) is enough to build the best possible mousetrap for Oscar.
We’ll see how it pans out in March!
You’re a brave man, Marshall, and I’m looking forward to seeing what you’ve come up with.
Each year is its own “soup” where the conventional plays (mostly successfully) against more unique storytelling. To unlock a voting pattern of the AMPAS demographic in advance is going to be tricky.
Stats are 20/20 hindsight – once the vote is in, it’s easy to see “why” things went the way they did. Just like other elections, though, voter tastes are pretty unchangeable – what matters is whether or not their preferred style of films are strong enough to get them to bother voting for them.
There’s always the bloc that votes Tree of Life/Serious Man, the bloc that loves a good cry, the bloc dazzled by big production values, and the bloc that feels by liking something they are making a statement.
I’m guessing that only when conventional likeability is weak do films like The Hurt Locker or No Country win. It’s pretty much a stacked deck, demographically, that relies on getting the vote out.
Unlock this, and you could become the Nate Silver of Passion Prediction – this will be fun to watch!
Hey, this is a good and welcome idea. Looking forward to it. I’m a self-proclaimed skeptic when it comes to using stats as a tool for explaining what goes on in individual voters’ mind in the present, but I’m sure it will be the catalyst for a lot of interesting and potentially eye-opening discussions:)
Oh shit, I’m so jealous 😉 was preparing to do the same with me own much less sophisticated model…
So far, all I’ve found out based on nominees for the 5 major industry awards – AFI, SAG (ensemble), PGA, DGA, BAFTA – is that any film that gets at least 4 or 5 nods out of 5 is a Lock for a BP nom at the Oscars, any film that gets 3 nods has strong chances but could still miss out, and below 3 nods the chances are much weaker, though a zero-nod outsider like last year’s Amour can still sneak in.
To win BP, a film needs to be in the aforementioned Lock category, which usually leaves us with only 5 or fewer real BP contenders. Hence a legitimate question: why does the Academy need more than 5 films vying for BP if the additional nominees have no chance of winning?
I didn’t count WGA, because many Oscar players are not eligible there and thus their results are 50-50.
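Z’s rule of thumb above could be sketched as a simple tiering function. The film names below are placeholders, not real contenders:

```python
# The five precursors tracked above: AFI, SAG (ensemble), PGA, DGA, BAFTA.
def bp_nom_outlook(precursor_nods):
    """Rough tiers from the rule of thumb (nod counts out of 5)."""
    if precursor_nods >= 4:
        return "Lock"
    if precursor_nods == 3:
        return "strong but could miss"
    return "long shot (an Amour-style surprise is still possible)"

for film, nods in {"Film A": 5, "Film B": 3, "Film C": 0}.items():
    print(film, "->", bp_nom_outlook(nods))
```

Rules like this are really just hand-built decision stumps; a regression model does the analogous work of finding the thresholds and weights from the data itself.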
Also, the way I would classify the various precursors: critics circles = try to influence the race, press awards (IPA, NBR, Globes, BFCA) = try to predict the race, Industry/Guild Awards = show us where the race is going.
Anyway, congrats for this new and shiny column, can’t wait for the results!
Appreciate it Antoinette. 🙂 Definitely will do my best to make them succeed where others have been hit-or-miss. Again, I think the X-Factor is them being blessed by all that is AD.
“Drop the ‘The.’ Just Statsgasm. It’s cleaner.”
Seriously, what’s with the chicken? 🙂
I still prefer guessing, but good luck, Marshall. We’re all counting on you. 🙂
Thanks Dane! Honestly, I think I’ve ended up just building a more educated mouse trap, more educated only because of all the wonderful insight I’ve learned via osmosis from the 7 years I’ve been here at OscarWatch/AwardsDaily. It remains to be seen if the work and the insight behind all the fancy math will measure up. But it’s undoubtedly exciting to see how the models will perform not only against the other Oscar pundits, but also Nate Silver and others who are better versed at advanced stats than I am.
“Normally I’m not one to have school spirit, but ASU represent!”
Yeah, I’m admittedly not one for school spirit either. But my current Gravatar pic is the only one I’ll ever use, haha. I’m camera shy, I’m afraid.
Not to blow my own horn…
but at first this series was to be called ‘The Statsgasm’
I said, “Drop the ‘The.’ Just Statsgasm. It’s cleaner.”
so. there you have it.
Normally I’m not one to have school spirit, but ASU represent!
Finally, it’s happening! Marshall is da man too!
Your site just became that much more essential, Sasha.
I love this. Thanks for your enthusiasm, DaneM. I feel exactly the same way.
Oh okay, I see what you’re saying Al.
I do keep track of the results of the “big four” critics awards (NYFCC, NBR, LAFCA, NSFC) and they do feature in a few of the models. But I actually don’t keep track of how many total critics wins/nominations each prospective nominee obtains. Just compiling and verifying 13 years’ worth of data was a time-consuming enough task as it is, especially for someone who’s not all that great at data mining like me. :-/
In general, I’ve found that critics awards aren’t all that significant in predicting Oscar winners, confirming the general CW that “critics don’t vote for Oscar.” There are some categories in which they are significant, but in most of those cases there are stronger guild/BAFTA-related predictors out there.
(But there is one exception: I will give a minor spoiler alert and say that the NBR has been the strongest predictor of a winner in one particular category since 2004. I won’t say that the film is a lock, as that would be foolhardy in what has historically been a very competitive category with great lineups year after year. But I will admit it makes me feel elated, as it is a great, great film and would be a very worthy winner.)
One thing to keep in mind is that these 21 models will only predict winners, not nominees. I may tackle predicting nominees next year, but I’m not all that confident that statistics will be that useful there, especially since there are always things like ceremony/nomination announcement date changes, e.g. last year with the DGA.
Okay, I will wait to see what you write.
But, to further explain my question, since I wasn’t all that explanatory in it. What I was asking is whether or not pre-Oscar awards wins have more value on what the Oscar voters nominate and vote for, than the pre-Oscar nominations.
For instance, American Hustle has won Best Picture already, but only nominated twice, where 12 Years a Slave has been nominated for Best Picture 3 times, but has not won (at least yet anyway).
What is more valuable, American Hustle’s 1 win but only 2 nominations, or 12 Years a Slave’s 0 wins, but 3 nominations?
“I’ve developed an algorithm that successfully predicts the Best Picture winner about 95% of the time.”
The current BP model that will be utilized this year at AD correctly predicts all 30 of the past 30 BP winners. Including Crash over Brokeback, Braveheart over Apollo 13, Shakespeare in Love vs. Saving Private Ryan, etc. 🙂 #humblebrag
That shameless plug aside, any stats wonk worth their salt will admit a limitation of regression analysis is that it is far better in predicting/matching the outcomes in past data than predicting new ones. But out of the 21 models, I will definitely say I am most confident in the BP model.
Anyways, feel free to e-mail me your algorithm, Z at marshall(dot)flores(at)gmail(dot)com
“Wins or Nominations? Or are they equal?”
Well, it depends on what we’re predicting, and what you mean by wins. 🙂 I will say that in some categories, yes, a nominee’s previous wins are a significant factor.
Don’t want to spill too much of the beans, but I will be exploring number of nominations in a few contexts over the next couple of weeks.
What stat is more important, or should I say an accurate indicator:
Wins
or
Nominations
?
Or are they equal?
I’ve developed an algorithm that successfully predicts the Best Picture winner about 95% of the time.
It doesn’t help with any other categories, but I’ve only been wrong on Best Picture once in the past 14 years (I stuck to my methods instead of riding the “Argo” wave).
I love it. Can’t wait for the rest Marshall!
Fantastic, looking forward your wizardry!
I LOVE this idea!! Thank you (in advance) Marshall. 🙂
I too love to use statistics to assess the value of something, and to predict what will happen.
Hi Bryce. I believe Rob Y still does the primary ballot simulation in which AD readers get to vote.
But I am planning to run a different analysis of the voting system using the end-of-the-year Top 10 lists that are starting to come out from critics and bloggers. I’m compiling the data as each new list comes out – right now I have 19 “votes”. Personally, I don’t think there’s such a thing as too many articles attempting to explain the nuts and bolts of the Oscar voting system, haha.
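A Top 10 list tally along these lines could be sketched as follows. The lists below are invented, and since the comment doesn’t specify a scoring scheme, both a raw mention count and a Borda-style weighted count are shown:

```python
from collections import Counter

# Hypothetical critics' Top 10 lists (abbreviated to three films each).
top_lists = [
    ["Film A", "Film B", "Film C"],
    ["Film B", "Film A", "Film D"],
    ["Film A", "Film C", "Film B"],
]

# Simplest tally: one "vote" per appearance on any list.
mentions = Counter(film for lst in top_lists for film in lst)

# A Borda-style variant that rewards higher placement on a list.
points = Counter()
for lst in top_lists:
    for rank, film in enumerate(lst):
        points[film] += len(lst) - rank

print(mentions.most_common())
print(points.most_common())
```

The two tallies can disagree: a film that appears on every list near the bottom beats a film with a few #1 placements under the mention count, but not under the weighted count, which is one reason the choice of scoring scheme matters.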
Swell
Are you going to be doing the Oscar ballot simulation as well?