
Film Industry Research

  • Nick Kish
  • Aug 15, 2020
  • 15 min read

Updated: Aug 20, 2020

Introduction

Welcome to Data is King, a blog series where I’ll attempt to uncover the path to success in the entertainment world using data. As an avid consumer of entertainment myself (aren’t we all?), I wanted to discover new insights about industries I’ve interacted with my whole life. I hope this blog series brings new, insightful information about entertainment as a business and an art form.

This post is about box office success and fan acclaim in the film industry. When I began my research, my goal was to harness data to understand what makes movies good (at least, according to IMDb review scores), what makes a splash at the box office, and what’s profitable. While these questions lie at the root of my findings, I gained a much more nuanced understanding of the industry along the way, and this post will share everything I learned. All the data came from an IMDb dataset on Kaggle. For those unfamiliar, IMDb stands for Internet Movie Database, and it’s a website with data and user-voted scores on practically every movie. Plenty of programming went into transforming the raw data (you can check out my code here), and I made all the visualizations in Tableau. Without further ado, let’s get into it!

The State of the Film Industry

To begin, let’s take a look at the film industry from a macro view. The focus will be on the United States only, primarily because we can accurately adjust for inflation on the US dollar. While I would love to expand the scope to the world stage, I’d need some pretty advanced math (and patience) to turn every country’s inflation rate and currency exchange rate over 100 years into a yearly multiple. On the bright side, by keeping the scope within the US, we can have a more focused dialogue on the country’s trends and attitudes with reliable numbers. I’ll touch on the implications of international revenue and much more in the closing observations.

The film industry at the box office has grown at a moderate pace in the US, to be kind. The vast majority of the growth depicted above occurred between 1980 and 2000, a period in which inflation-adjusted box office revenue increased by around 44%, or roughly 1.85% compounded annually. For reference, nominal GDP in the US (i.e., GDP not adjusted for inflation, so the comparison isn’t quite apples to apples) grew at a 6.59% compounded annual growth rate in that same timeframe. I’m sure we’re all too aware that ticket prices didn’t decrease, so either fewer people went to the movie theater over time, or the regular moviegoers went less often. It’s hard to blame them. I’ll even admit, some days I’d rather just watch Netflix (or rent from Blockbuster, if this were the early 2000s) than drop my life savings on a small popcorn and a drink.
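
For anyone who wants to check the compounding arithmetic, here is a minimal Python sketch. It is not the code behind this post; the function name `cagr` and the starting value of 100 are only illustrative.

```python
def cagr(start_value: float, end_value: float, years: int) -> float:
    """Compound annual growth rate between two values."""
    return (end_value / start_value) ** (1 / years) - 1


# Roughly 44% total growth over the 20 years from 1980 to 2000, as quoted above.
rate = cagr(100.0, 144.0, 20)
print(f"{rate:.2%}")  # a little over 1.8% per year
```

The same function works for the GDP comparison, as long as both series are measured the same way (both adjusted for inflation, or both not).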

Despite the lackluster numbers, the industry still grew. While big blockbusters like Avengers: Endgame come to mind, the real growth driver was the number of movies released per year. In 1980, 119 movies were released whose data qualified for my dataset (no null values for US income). In 2000, that number was 396, and it reached 510 by 2018. Simply put, there are way more movies released today than ever before. Every weekend is booked with just under ten new ones on average (among those that qualify).

However, while 114 more qualifying movies were released in 2018 than in 2000, the total yearly box office revenue plateaued in that timeframe. Since the early 2000s, the industry’s growth in dollars and annual inflation have canceled each other out. Box office revenue hit its all-time peak 18 years ago in 2002, a year that saw Lord of the Rings: The Two Towers, Harry Potter and the Chamber of Secrets, Spider-Man, and Star Wars: Episode II – Attack of the Clones.

More movies are released every year, but the total box office revenue has remained roughly stagnant for two decades. With some easy math, we know that today’s movies make less on average than those in the past. The average box office revenue per movie decreased by over 64% between 1980 and 2018, and it has continued to plummet while the industry’s total income has stagnated. Meager industry growth in the US hasn’t scared off competitors. There were 415 distinct, active production companies in 2018. That’s nearly 150 more than the 266 in 2000 and almost six times as many as the 71 in 1980.
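
The "easy math" is that total revenue equals the number of movies times the average revenue per movie. Here is a rough sketch using only the figures quoted above; the variable names are mine, and the 64% is treated as an exact value even though the post says "over 64%".

```python
movies_1980, movies_2018 = 119, 510  # qualifying releases, from the post
avg_change = -0.64                   # drop in average revenue per movie ("over 64%")

# total = count * average, so the ratios multiply
count_ratio = movies_2018 / movies_1980
total_ratio = count_ratio * (1 + avg_change)

print(f"{count_ratio:.2f}x as many movies")
print(f"total revenue roughly {total_ratio:.2f}x its 1980 level")
```

In other words, more than four times as many movies share a total that has grown by only about half.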

Above is a snapshot of all those production companies and more, with each bubble’s size representing that company’s total revenue in US box office history. The whole pie amounts to $500B adjusted for inflation with the available data, and production companies and movie theaters split it roughly 50/50 domestically. Among production companies, the top ten players dominate. They make up just over half of the whole market (and don’t forget that Twentieth Century Fox, Walt Disney Productions, Walt Disney Pictures, Touchstone Pictures, Pixar Animation Studios, Marvel Studios, and Lucasfilm are all owned by Disney). It’s safe to say the industry looks a bit consolidated, especially at the top of the food chain.

However, there have been plenty of bottom feeders over film history to clean up the scraps. Over 6,218 distinct production companies have lived and died selling tickets in the US. That’s a far cry from the mere 189 production companies that existed before and during 1960. Back then, the pie totaled $31.6B in today’s money. The top ten players made up around 80%, and Walt Disney Productions made up 39% alone.

What’s Hot?

So, we know that it’s harder than ever to succeed at the US box office. Production companies are clawing for smaller and smaller pieces of a stagnant pie, and they need every leg up they can get. What better way to help than to analyze some real, useful data? Let’s dive into the data and findings to discover what’s hot and what’s not at the box office.

We’ll begin with an overview of the US’s biggest earners. Above is all the US box office revenue made over the past 100 years, grouped by primary genre (the first genre listed on IMDb). Disclaimer: most movies have two to three genres listed, and the multiple linear regression models at the end of this section will take them into account. The genres film-noir, sport, history, music, and war were removed for having too little data as the primary genre (under 20 movies).

Action and comedy movies combine to make up 53% of all the money made, and it’s no surprise considering their quantity. There are 2,434 action movies and a whopping 4,530 comedies in my dataset. For reference, there are merely 77 thrillers. I’d say most people don’t experience too many car chases or gunfights on their average 9 to 5, so it makes sense that action is some highly sought-after post-work escapism. But we all know that movies can cover so much more. There seem to be eight core movie genres that lead the pack, ranging from action and comedy to crime and horror.

Over time, action movies have continued to widen their lead to historic levels. They made over $3B more than the second-place animation genre in 2018, and they weren’t even the leader until the mid-90s. Comedies used to reign supreme, but they’ve experienced a dramatic plunge of over 50% off their peak. Traditional slapstick comedies and satires may have fallen by the wayside among audiences. However, many successful films still list comedy as a secondary genre, such as the action-comedy Deadpool or Pixar’s fantastic animation-comedies.

Speaking of animation, it has risen from the depths in the 80s to become a dominant player and the second biggest genre in 2018 (forgive me for bucketing all animation under one genre; I’m working with limited data here). Animated movies dominate on a per-movie basis. Despite being the second-highest earner, animation doesn’t come close to its contemporaries in quantity (see below). For some more valuable insight, check out the average production company revenue and budget for each genre. I used only movies with no null values for either variable, so I had to remove family, sci-fi, western, musical, and romance for having too few qualifying movies.

Animated movies rake in the most revenue, and I can see a couple of advantages in their favor. Everyone in the family can enjoy the Finding Nemos of the world. When going to the theater becomes a family-wide event, the number of tickets per party increases to include everyone from toddlers to grandmas. And I can just imagine little kids dragging their reluctant parents to see the latest Trolls movie. It’s all too familiar. On the flip side, adult categories such as crime and thrillers underperform, likely achieving the opposite of what family movies do so well.

The budget numbers fall right around expected. Action and adventure movies are large scale by choice, and unless you’re hiring Dwayne Johnson, it’s probably cheaper to hire an actor than pay for dozens of animators’ salaries for an animated movie. The Polar Express, an animated movie that steered notoriously close to the uncanny valley, had a budget of $165M in 2004 (or an incredible $230M in today’s money). Thriller and horror movies tail on the low end, and these low costs can lead to quick profitability if they’re high earners.
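
The inflation adjustment behind figures like these boils down to multiplying each amount by a per-year factor. As a small sketch (the helper name `to_todays_dollars` is hypothetical, and the multiplier is simply backed out of the two Polar Express figures above, not taken from an official price index):

```python
def to_todays_dollars(amount: float, multiplier: float) -> float:
    """Scale a historical dollar amount by that year's inflation multiplier."""
    return amount * multiplier


# Multiplier implied by the post's own figures: $165M in 2004 vs. $230M today.
multiplier_2004 = 230 / 165
print(round(multiplier_2004, 2))  # 1.39

# Applying it to the original budget recovers the adjusted figure (in $M).
adjusted = to_todays_dollars(165, multiplier_2004)
```

A full adjustment would keep one such multiplier per release year, which is why sticking to a single currency keeps things manageable.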

Now, looking at the profitability picture for the US box office may be a bit discouraging. Besides horror movies, everyone is a loser. The average profit among all movies in the data isn’t actually a profit, but a loss of $9.8M. But of course, US box office revenue isn’t the full picture. International ticket sales, television rights, and licensing deals with the streaming giants of the world count too. And honestly, with how box office ticket sales are trending in America, it might be the strategy of many movies to look elsewhere, especially for action movies (I’ll touch on this later).

Now we’ve reached the really interesting part data-wise. Below are the multiple linear regression results on the natural logarithm of profit (the dependent variable) using all the genres (the independent variables). These include not only the primary genre but also the secondary and tertiary, making it more comprehensive. For those unfamiliar or who haven’t read my post on the video game industry, multiple linear regression models the relationship between the independent variables and the dependent variable by fitting a linear equation to the data. I log-transformed profit because its distribution resembles a power law (heavily skewed to the right), which brings the data points closer to what the linear model expects. We only care about the results of statistically significant variables, i.e., variables with a p-value less than 0.05, or those significant at the 95% confidence level. The coefficients, or the linear equation slopes, show how much the dependent variable changes when a given genre applies to a movie. It may be challenging to interpret the independent variables’ coefficients, but I ranked all the statistically significant variables by their impact to give some insight. Finally, I had some concerning numbers arise in the Jarque-Bera test, a test that warns if the residuals’ distribution doesn’t follow the normal curve. The residual distribution ended up looking relatively normal, but I wanted to show the plot to be transparent about the results.

Rank  Genre      Coefficient
1     Horror      0.0308
2     Romance    -0.0072
3     Crime      -0.0111
4     Sci-Fi     -0.0210
5     Animation  -0.0214
6     Fantasy    -0.0344
7     War        -0.0360
8     Musical    -0.0339
9     Adventure  -0.0369
10    Action     -0.0418
11    History    -0.0426

The model explains only 8.7% of the variation in the log of profit, but the coefficients are where the real insight lies. They’re almost all negative (remember that the average movie loses money), but horror managed to break through as the lone positive. Both horror and romance movies do well with their modest budgets, and at 5th place, animated movies do a surprisingly good job covering their animators’ salaries. Action and adventure movies get portrayed in a pretty bad light once again. As for the history genre, I guess the average American has had enough exposure to it in the classroom that they’d rather not pay to watch more of it.
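
To give a feel for how a movie’s primary, secondary, and tertiary genres can all feed into one regression, here is a minimal sketch of turning a genre listing into 0/1 indicator columns. It assumes the genres arrive as a single comma-separated string; the function names and the four-genre vocabulary are illustrative, not taken from the actual code.

```python
def primary_genre(genre_field):
    """The first genre listed is treated as the primary one."""
    return genre_field.split(",")[0].strip()


def encode_genres(genre_field, vocabulary):
    """Return one 0/1 indicator per genre in the vocabulary."""
    listed = {g.strip() for g in genre_field.split(",")}
    return [1 if genre in listed else 0 for genre in vocabulary]


vocabulary = ["Action", "Comedy", "Horror", "Romance"]  # illustrative subset
row = encode_genres("Action, Comedy", vocabulary)
print(primary_genre("Action, Comedy"), row)  # Action [1, 1, 0, 0]
```

Each indicator column then becomes one independent variable, so a movie listed under three genres contributes to three coefficients at once.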

What’s Good?

Enough of looking at the revenue side of things for now; let’s take some time to appreciate film as an art form. In this section, we’ll analyze the internet’s taste in movies, highlight some of the most talented people in the industry for fun’s sake, and even try to figure out which role on the film staff has the most influence on a movie’s quality.

Above is a box plot of the IMDb user-voted score distribution by primary genre, sorted by the median. The upper and lower hinges of the orange box represent the 75th and 25th percentiles of the score, respectively, while the upper and lower whiskers represent the 97.7th and 0.3rd percentiles. Unfortunately, there are far more outliers on the lower side of things, likely because there’s much more room to drop. To be fair, it’s probably way easier to make a terrible movie than an excellent one. I could probably go and make a terrible movie right now if I wanted to.
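
For readers curious about how the box’s hinges are computed, here is a small sketch using Python’s standard library. The scores are made up, and the "inclusive" quantile method is just one common convention; plotting tools may interpolate slightly differently.

```python
import statistics

# Made-up IMDb-style scores for a single genre, sorted for readability.
scores = [4.8, 5.5, 5.9, 6.1, 6.3, 6.4, 6.6, 6.8, 7.1, 7.4, 8.0]

# n=4 splits the data into quartiles: 25th percentile, median, 75th percentile.
q1, median, q3 = statistics.quantiles(scores, n=4, method="inclusive")
print(f"lower hinge {q1:.2f}, median {median:.2f}, upper hinge {q3:.2f}")
```

The box spans the lower to the upper hinge, so half of a genre’s movies fall inside it.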

It’s a running joke that most biopics are Oscar bait, so I can see why they’re at the top. Meanwhile, “genre” films such as horror, sci-fi, and fantasy, which have performed notoriously badly at the Academy Awards, don’t have a great case when it comes to the data either. But don’t feel too bad: every genre has its fair share of outliers in either direction. Movies like Alien, The Exorcist, and Get Out sit several standard deviations to the right among horror movies, and drama has the highest score (The Shawshank Redemption, 1994) and the lowest (Proud American, 2008). I also modeled the effect of primary, secondary, and tertiary genres on the average IMDb score using a similar multiple linear regression model, and you can see the results below. Once again, I ranked the statistically significant variables by the value of their coefficients.

Rank  Genre      Coefficient
1     Film-Noir   1.0300
2     Animation   0.3833
3     Drama       0.2963
4     Musical     0.2954
5     Biography   0.2648
6     War         0.2239
7     History     0.1744
8     Sci-Fi      0.1585
9     Adventure   0.0486
10    Action     -0.0898
11    Romance    -0.0928
12    Family     -0.1583
13    Horror     -0.3421

The more “sophisticated” genres, such as film-noir, drama, biography, war, and history, sit at the top, with animated movies and musicals sprinkled in for good measure. Interestingly, horror was the biggest winner when it came to profitability, but the biggest loser in the IMDb score. That’s funny when I think about it, because whenever a friend asks if I want to watch a terrible scary movie, I almost always say yes. I guess the data reveals some truth after all!

As a fun aside, above are the most common words in the movie descriptions in my data (minus common words like “the”). It seems as though “find” is the impetus for many plotlines, which is sensible when you think it through. Understandably, many movies also deal with life, and the inclusion of “family,” “friend,” and “father” shows that relationships are a big part of storytelling.
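
Counting words like this takes only a few lines. Below is a minimal sketch with three invented descriptions and a tiny, hand-picked stopword list (a real analysis would use a much longer one).

```python
import re
from collections import Counter

# A tiny illustrative stopword list, not a complete one.
STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "his", "her", "is", "for", "with"}


def top_words(descriptions, n=3):
    """Count non-stopword words across all descriptions, most common first."""
    counts = Counter()
    for text in descriptions:
        words = re.findall(r"[a-z']+", text.lower())
        counts.update(w for w in words if w not in STOPWORDS)
    return counts.most_common(n)


descriptions = [  # invented examples
    "A young man must find his family.",
    "Two friends find a new life in the city.",
    "A father tries to find his daughter.",
]
print(top_words(descriptions, 1))  # [('find', 3)]
```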

For a little comparison, these are the most common words in movie descriptions for the top 5% revenue earners and the top 5% of IMDb scorers on the top and bottom, respectively. Among high earners, the most dramatic difference is the word “must,” showing a sense of urgency, I suppose. When I saw the words “world,” “team,” and “save,” I instantly thought about the Avengers movies (sorry Justice League!).

One weird finding: for the higher scorers, there’s clearly a larger focus on men than women. There’s no available data for this, but I wouldn’t be surprised if the vast majority of voters on IMDb are male, and there may be a significant gender bias. While obviously not true, it’s almost like every highly-rated movie is about a young man finding something in life. Another weird observation: for the top 5% scorers, the word “two” is huge. I could try to come up with some crazy hypotheses for why, but I’ll spare you.

For another fun aside, these are the top 20 actors, directors, and writers by average score (minimum ten movies as the top-billed actor, minimum five movies as the senior director or writer). Toshiro Mifune, one of the most revered Japanese actors of all time, takes the top spot for average score, starring in several samurai classics by Akira Kurosawa (the 7th ranked director). Christopher Nolan has a massive cult following on IMDb, so it’s no surprise he’s the top director. What’s more impressive is that Quentin Tarantino, Stanley Kubrick, Akira Kurosawa, Hayao Miyazaki, and James Cameron all sit in the top 12 for both directors and writers, with Kubrick and Tarantino taking the top writing honors. They’re all esteemed filmmakers already, but I think this earns them even more respect.
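
The ranking logic (average score per person, with a minimum number of movies to qualify) can be sketched as follows. The names and scores are placeholders, and `rank_by_average` is a hypothetical helper rather than the code used for the lists above.

```python
from collections import defaultdict


def rank_by_average(records, min_movies):
    """Average score per person, keeping only people with enough movies."""
    scores = defaultdict(list)
    for person, score in records:
        scores[person].append(score)
    averages = {
        person: sum(values) / len(values)
        for person, values in scores.items()
        if len(values) >= min_movies
    }
    return sorted(averages.items(), key=lambda item: item[1], reverse=True)


records = [  # (person, movie score), placeholder data
    ("Director A", 8.0), ("Director A", 9.0),
    ("Director B", 9.5),
    ("Director C", 7.0), ("Director C", 7.5),
]
ranking = rank_by_average(records, min_movies=2)
print(ranking)  # [('Director A', 8.5), ('Director C', 7.25)]
```

Director B has the single best movie here but drops out for not meeting the minimum, which is exactly what the threshold is for: it keeps one-hit wonders from topping the list.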

The lists above use the same methodology but are grouped by total box office revenue in the US. Steven Spielberg is the biggest outlier in the data by a mile. Measured against the distribution of money made by directors, Spielberg sits 37 standard deviations above the mean. Tom Hanks sits at the top of revenue for actors, and it helps that he’s been the top-billed actor in 37 movies (though not quite reaching Robert De Niro’s count at 48, Clint Eastwood at 42, or Nicolas Cage at 41). Many of the top writers aren’t even screenwriters by profession. Stephen King has written enough books adapted to film to be 12th on the list, and Bob Kane, the co-creator of the original Batman comics, is 18th. Other well-known novelists such as J.K. Rowling and J.R.R. Tolkien just missed out at 23rd and 27th place, respectively. I’ve also got to say that Sylvester Stallone earns massive respect for being both a top 20 actor and writer (with some of his writing credits being Rocky, Rocky II, Rocky III, Rocky IV, Rocky V, and Rocky Balboa).
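
"Standard deviations from the mean" is a z-score. Here is a toy sketch with invented totals; it uses the population standard deviation, which is an assumption about the convention, not a statement of how the figure above was computed.

```python
import statistics


def z_score(value, population):
    """How many standard deviations a value sits from the population mean."""
    mean = statistics.fmean(population)
    spread = statistics.pstdev(population)
    return (value - mean) / spread


# Invented revenue totals: nine modest earners and one runaway outlier.
totals = [1, 1, 1, 1, 1, 1, 1, 1, 1, 11]
print(z_score(11, totals))  # about 3 standard deviations above the mean
```

With only ten values, the outlier tops out at about 3 standard deviations, so a gap of 37 can only happen in a large, heavily skewed dataset.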

Last but not least, these are the results of a multiple linear regression model on the average IMDb score using the average rating of the top-billed actor, director, and writer as the independent variables (denoted as a_avg_vote, d_avg_vote, and w_avg_vote, respectively). Using each actor, director, and writer’s average IMDb score as a loose proxy of their quality, I wanted to see which was most explanatory in the model. All were statistically significant, and the three variables alone explained 40.4% of the variation in the IMDb score. Just as I suspected, directors have the largest impact on the movie’s quality. For every increase of 1 in the director’s average score, the movie’s IMDb score is expected to increase by around 0.4. The main actor has the least impact, and while a great lead can undoubtedly enhance a film, I think it’s far easier for a director to mitigate subpar acting than for an actor to mitigate subpar direction.
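
To make the mechanics concrete, here is a self-contained sketch of a three-variable least-squares fit in plain Python. The data is synthetic: the scores are built from made-up coefficients (only the 0.4 for directors echoes the result above; the intercept, 0.1, and 0.3 are arbitrary), so the fit simply recovers them. It illustrates the technique and does not reproduce the actual model.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def fit_ols(rows, y):
    """Return [intercept, b1, b2, ...] from the normal equations."""
    X = [[1.0] + list(r) for r in rows]
    k = len(X[0])
    xtx = [[sum(x[i] * x[j] for x in X) for j in range(k)] for i in range(k)]
    xty = [sum(x[i] * yi for x, yi in zip(X, y)) for i in range(k)]
    return solve(xtx, xty)


def r_squared(rows, y, coefs):
    """Share of the variation in y explained by the fitted line."""
    preds = [coefs[0] + sum(c * v for c, v in zip(coefs[1:], r)) for r in rows]
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - p) ** 2 for yi, p in zip(y, preds))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1 - ss_res / ss_tot


# Synthetic rows of (a_avg_vote, d_avg_vote, w_avg_vote).
rows = [(6.0, 7.0, 6.5), (7.0, 6.0, 7.5), (5.5, 8.0, 6.0),
        (8.0, 7.5, 7.0), (6.5, 5.0, 8.0), (7.5, 6.5, 5.5)]
y = [1.0 + 0.1 * a + 0.4 * d + 0.3 * w for a, d, w in rows]

coefs = fit_ols(rows, y)
print([round(c, 3) for c in coefs], round(r_squared(rows, y, coefs), 3))
```

Because the synthetic scores contain no noise, the fit explains essentially all of the variation; real data, like the 40.4% above, always leaves a remainder.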

For some visualization of the model, you can check out the partial regression plot below. In general, a partial regression plot isolates the impact of one independent variable: it plots the part of the dependent variable that the other independent variables can’t explain against the part of the chosen independent variable that the others can’t explain. That way, the effect of the other independent variables on the model is accounted for. If you want to dig into the logic behind it, click the link here.
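
The idea behind a partial regression plot can be sketched with two predictors: remove what one predictor explains from both the outcome and the other predictor, then compare what is left over. The data below is synthetic and the helper names are mine.

```python
def simple_fit(x, y):
    """Slope and intercept of a one-variable least-squares line."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
             / sum((a - mx) ** 2 for a in x))
    return slope, my - slope * mx


def residuals(x, y):
    """What is left of y after removing the part explained by x."""
    slope, intercept = simple_fit(x, y)
    return [b - (intercept + slope * a) for a, b in zip(x, y)]


# Synthetic data: y is built from x1 and x2 with known coefficients.
x1 = [6.0, 7.0, 5.5, 8.0, 6.5, 7.5]
x2 = [7.0, 6.0, 8.0, 7.5, 5.0, 6.5]
y = [1.0 + 0.2 * a + 0.4 * b for a, b in zip(x1, x2)]

# Partial regression for x2: strip out x1 from both y and x2 ...
y_res = residuals(x1, y)
x2_res = residuals(x1, x2)

# ... then the slope through the leftover points is x2's coefficient.
partial_slope, _ = simple_fit(x2_res, y_res)
print(round(partial_slope, 3))  # matches the 0.4 used to build y
```

Plotting `x2_res` against `y_res` gives the partial regression plot for `x2`; its slope matches the coefficient `x2` would get in the full model.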

Closing Observations and Recommendations

The US box office is a red sea of cutthroat competition, and it’s only getting more intense. The industry’s domestic growth has been flat for nearly two decades, yet the number of competitors has continued to multiply. Now more than ever, production companies need to use data to understand industry trends to maximize their revenue and profitability.

Going to the theater is becoming more of a special occasion than a weekly occurrence (for most). Box office ticket revenue has become a smaller and smaller part of the US economy, being routinely outpaced by GDP. With Americans’ attention more divided than ever, it takes something special to get people to buy a ticket. Based on my analysis, my conclusions are quite simple: the most lucrative movies successfully get people out of the house and into the theater. There are multiple ways to achieve this, of course, and different productions have different costs to cover. I found three distinct strategies to generate high revenues and achieve profitability at the box office:

1. To generate high revenues, make movies for the whole family. Animated films make more money for their production companies than any other genre on a per-movie basis (an average of $74M), and family movies actually average around $86M, albeit on a limited sample size as the primary genre. The animated category has grown 111% since 2000 in total revenue, more than any other style in the same timeframe. Unlike any other film type, family and animated movies can get large, diverse groups to the theater (either for family-wide get-togethers or parent-child outings), selling significantly more tickets per party. Some of the best animated movies appeal to everyone from 4 to 84-year-olds, and this wide-ranging target demographic is incredibly valuable. In addition, better than most genres, hit animated films can achieve additional revenue streams through merchandising and licensing for kid-friendly entertainment. While it takes a big budget to make something like Frozen or the family movie Aladdin, the potential payoff is worth it.

2. To achieve quick profitability, make low-budget horror and romance movies for teens and young adults. Horror movies arguably match animated movies’ success in motivating ticket sales, albeit with a much smaller, more targeted market. While 15-30-year-olds make up 40% of the average movie’s audience, they make up 60% in horror, and these particular 15-30-year-olds go to the theater 4% more frequently (findings reported by Variety Magazine). Scary movies offer a unique experience that warrants buying a ticket on a weekend night, especially among teens who want to scare their pants off (I’ve been there). And on that same weekend night, watching the latest romance movie can be a special occasion for all the couples out there. Both these genres hit their target markets well, and they come with manageable budgets of $15M on average for horror, and $14M on average for romance on a smaller sample size. Forget big-time actors, directors, or even special effects; these movies can get away with the bare minimum and still have their target demographic reliably clawing for more. Making these kinds of movies is an excellent strategy for smaller production companies without the capacity to make big animated or action movies.

3. To capture the most market share overseas, make big-budget action and adventure movies. Action movies do a terrible job covering costs with US box office revenue alone, averaging a $25M loss. Despite these alarming numbers, there are still more and more releases every year. Why? They boom internationally. Action movies made up 45.8% of market share among genres in the US in 2018, but 55.1% worldwide. Action movies made a significant 63.3% of their profit overseas in 2018 (the total international ticket sales are an even more significant percentage, but international movie theaters take a larger cut of revenues). Ten years ago, that same metric was only 44.3%. Those numbers represent a 43% increase in international profit as a percentage of total profit, while all other genres combined experienced a 27% increase. Action is the fastest-growing genre in the fastest-growing market. Without adjusting for inflation, the global box office market has grown around 45% in the last decade, while the US box office increased approximately 18%. Compared to dialogue-heavy dramas, action movies have far less to worry about when it comes to language barriers. And quite simply, it’s what the people want. With a significant brand and international marketing strategy behind it, a worldwide hit could be the most lucrative option of all.
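
One note on the 43% figure in the third strategy: it is a relative increase, not a jump of 43 percentage points. A quick sketch of the difference, using the two shares quoted there (the function name is mine):

```python
def relative_change(old: float, new: float) -> float:
    """Change from old to new, expressed as a fraction of old."""
    return (new - old) / old


share_then, share_now = 44.3, 63.3  # % of action movies' profit made overseas

points = share_now - share_then                     # change in percentage points
relative = relative_change(share_then, share_now)   # change relative to the start

print(f"{points:.1f} percentage points, or a {relative:.0%} relative increase")
```

So the overseas share rose by 19 percentage points, which works out to roughly 43% more than where it started.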

Looking at the artsy side of things, if your production company wants to win awards and woo the critics, you should invest in a stable of quality directors. Data supports the notion that directors have the most critical impact on a movie’s quality, and plenty of directors have relationships with talented actors to boot. “Sophisticated” genres like biographies, crime, drama, mystery, and film-noir appeal to the adults who not only intently analyze films, but also write reviews and give out awards. Despite missing out on many of these awards, animated movies resonate incredibly well with fans. I like to think it’s because of their limitless possibilities. After seeing your 10th crime movie or slasher film, it’s refreshing to see something you’ve never seen before. Spider-Man: Into the Spider-Verse has some of the most awe-inspiring visuals I’ve ever seen, and Zootopia manages to touch on racism and classism in unique, engaging ways that are impossible in any other form.


Well, that about wraps things up! I hope you enjoyed it, and if you did, check out my research on the video game industry. Thanks for reading!

 
 
 
