2021 NCAA Tournament ‘Cinderella’ Model: The Formula for an Upset and 2021’s Matches

John McCoy/Getty Images. Pictured: Amadou Sow (12) of the UC Santa Barbara Gauchos men’s basketball team.

Two years ago, when last we crowned an NCAA men’s basketball champion, I spent the entire month of February building a mid-major Cinderella Model. I started from the ground up, throwing aside my preconceptions and biases as best I could, and I approached the task the most painstaking way I knew how: Exploratory Factor Analysis (EFA).

Hold on. Don’t click away yet. Stick with me here. I promise not to bore you with statistical jargon like “EFA.”

Well … ‘promise’ is a strong word.

I … vow? Nope; that’s worse.

I’m going to … try my best. (Nailed it.)

I analyzed every NCAA Tournament team since the 2001-02 season based on every single KenPom metric available. Through statistical treatment, I determined which metrics matter and which ones don’t. And, you might be surprised to learn that one important factor (cough team experience cough) doesn’t matter at all.

I then built a model that predicts the types of mid-majors that win in the first round — and which ones tend to lose. Finally, I used that model to rank this season’s mid-major squads based on each team’s probability of scoring a first-round upset.

For you TL;DR folks out there who want to get to my Cinderella Rankings as fast as possible, feel free to skim to the bottom of this article. Or, use the highlighted text throughout as a summary. The Action Network editorial team did go through the effort of adding all these highlights, so you might as well use ’em, right?

Defining a Cinderella Team

Is being a Cinderella about the colossal upset, or the deep tournament run?

Maybe it’s both.

But those deep Final Four runs aren’t exactly predictable — and I want to provide you with something that has meaningful predictive value as you fill out your brackets this season. So, when I say “Cinderella Teams,” I’m focusing on squads that can pull a first-round upset this year.

By focusing on obscure mid- and low-major schools with a chance to pull a big upset, I’m also implicitly highlighting high-profile, lower seeds with a real chance of losing on Day 1. These are the kinds of teams you want to avoid taking deep into the tournament, lest your bracket be busted in the first weekend of play.

I am not trying to find every single possible upset in the first round. I am not trying to identify every team that could make a Sweet 16 run. Instead, I’m trying to identify the teams that no one is thinking about that have a strong chance of being upset in the first round, thereby busting everyone else’s brackets … except yours (if you take my advice).

Rules & Requirements for “Cinderella” Status

Let’s define what constitutes a Cinderella team as specifically and operationally as possible:

10-seed or higher.
16-seeds are excluded (sorry UMBC, but that’s not happening again). Since 2001-02, 16-seeds are 1-72 in the NCAA tournament. If I included them in my statistical analysis, their poor metrics would throw off our sample.
The team cannot come from a Power-6 conference (ACC, Big 12, Big East, Big Ten, Pac-12 or SEC).
The team cannot be ranked entering the NCAA Tournament. This stipulation gets rid of past Gonzaga and Wichita State teams that were criminally under-seeded despite their season-long excellence. This year, it also excludes teams like Houston, Loyola Chicago, San Diego State and Gonzaga (Big Shocker there. Get it? Shocker? Like … Wichita State … from the previous sentence? Ah, never mind.)
The team cannot be ranked in the AP top-15 in January, February or March of the given season. This stipulation ensures that the team is largely unknown to the public.
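For the code-inclined, the rules above can be sketched as a simple filter. This is a minimal illustration, not the code behind the model: the `TournamentTeam` fields, the function name and the sample teams are all made up for the example.

```python
from dataclasses import dataclass
from typing import Optional

# The six conferences excluded by the rules above.
POWER_SIX = {"ACC", "Big 12", "Big East", "Big Ten", "Pac-12", "SEC"}


@dataclass
class TournamentTeam:
    # All field names are hypothetical, chosen for this sketch.
    name: str
    seed: int
    conference: str
    ranked_entering_tournament: bool
    best_ap_rank_jan_to_mar: Optional[int]  # None if never ranked in that span


def is_cinderella_candidate(team: TournamentTeam) -> bool:
    """Apply the five eligibility rules in order."""
    if team.seed < 10:                     # must be a 10-seed or higher
        return False
    if team.seed == 16:                    # 16-seeds are excluded
        return False
    if team.conference in POWER_SIX:       # no Power-6 schools
        return False
    if team.ranked_entering_tournament:    # no ranked teams
        return False
    rank = team.best_ap_rank_jan_to_mar
    if rank is not None and rank <= 15:    # no AP top-15 team in Jan, Feb or Mar
        return False
    return True


# Made-up teams, purely for illustration.
teams = [
    TournamentTeam("Example Tech", 12, "Big West", False, None),
    TournamentTeam("Sample State", 11, "Big Ten", False, None),
    TournamentTeam("Placeholder U", 16, "MEAC", False, None),
]
candidates = [t.name for t in teams if is_cinderella_candidate(t)]
print(candidates)
```

Only the first made-up team survives: the second plays in a Power-6 conference and the third is a 16-seed.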

After filtering all tournament teams since 2001-02, there are 339 schools that fit the parameters listed above. Of those 339 teams, 70 of them won their first-round game. That equates to a win percentage of 20.6%. Let’s break that down by seed:

A Short Lesson: Don’t Be Like Me (At Least, the 2019 Edition)

I’m now directly quoting the prose from my article two seasons ago, and adding my 2021 commentary via highlighted text:

After filtering all these teams until I was satisfied that I was capturing the right kind of team, I then recorded every team’s pre-tournament KenPom metrics and ranks. I manually recorded all 43 of KenPom’s metrics — and team rankings for each of those metrics — for all 314 teams in our sample. That’s 27,004 data points. By hand. And yes, Excel crashed multiple times on me.

Update: What I did not know then, and what I do know now, is enough rudimentary code to scrape this data. I know it’s a meme, but take it from me: Learning to code … kinda pays off. Just sayin’.

But I did it. I pressed on for you folks, because I care. Maybe I’ve watched too many Jon Bois videos on YouTube. Maybe I really need to get a dog. Either way, you’re welcome.

Update: I no longer binge-watch Jon Bois videos on YouTube, and I now own a dog: A Siberian Husky whose name is Ash Ketchum. You are still welcome.

Why Did I Do This to Myself?

So, why did I individually log 27,004 data points? What was the purpose of that suffering?

Update: Through years of therapy, I have learned that this has something to do with being a first-born child. That’s as far as I’ve gotten.

By way of answering, let’s return to base here for a second and remember our goal. We’re trying to identify the statistical profile of low- and mid-major teams that win their first-round games. So, we need to weed out all the noise that doesn’t differentiate these schools and instead focus on only the core metrics that matter the most in discriminating winners from losers.

I’ll spare you the details on how I did that — it involves something called an ANOVA test and other complicated methods that I won’t bore you with. Let’s speed along to the results.
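If you do want a taste of the details, here is a rough sketch of what a one-way ANOVA does with a single metric: it compares how far apart the winners’ and losers’ averages are with how much each group varies internally. This assumes the simplest two-group setup and is not the actual analysis; the function name and the numbers are invented for illustration.

```python
from statistics import mean


def one_way_anova_f(*groups):
    """Return the one-way ANOVA F-statistic for two or more groups of numbers."""
    all_values = [x for group in groups for x in group]
    grand_mean = mean(all_values)
    k = len(groups)        # number of groups
    n = len(all_values)    # total number of observations

    # Variation explained by group membership (winners vs. losers).
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation left over inside each group.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)

    return (ss_between / (k - 1)) / (ss_within / (n - k))


# Made-up values of one metric, NOT real KenPom data.
winners = [95.1, 96.4, 94.8, 97.0]
losers = [99.2, 101.5, 98.7, 100.9]
f_stat = one_way_anova_f(winners, losers)
print(round(f_stat, 2))
```

A large F-statistic means the metric separates the two groups well. A value near 1 means the gap between winners and losers is about what random noise would produce.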

(You’re almost there. I believe in you!)

Metrics That Matter

After analyzing each and every KenPom metric, my tests revealed just seven that meaningfully discriminate winners from losers. Just seven … out of 86. They are as follows (definitions taken from KenPom.com):

AdjO: Adjusted offensive efficiency — an estimate of the offensive efficiency (points scored per 100 possessions) a team would have against the average D-1 defense.

AdjD: Adjusted defensive efficiency — an estimate of the defensive efficiency (points allowed per 100 possessions) a team would have against the average D-1 offense.

AdjEM: The difference between a team’s offensive and defensive efficiency.

Defensive eFG%: Effective Field Goal Percentage (eFG%) allowed to the opposing offense.

Offensive Turnover %: Offensive turnovers per possession.

Defensive Turnover %: Opponent turnovers forced per possession.

3P% Defense: Three-point percentage allowed to opposing teams.

These seven metrics combine to paint a logical and intuitive portrait of a potential Cinderella team. Generally, teams that upset top seeds in the first round boast well-rounded offensive and defensive efficiency, do not turn the ball over often on offense, force turnovers on defense, and defend well on the perimeter.
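To make the definitions concrete, here is a small sketch of the arithmetic behind three of them. The eFG% formula is the standard one (a made three counts as 1.5 made field goals). The opponent adjustments that turn raw efficiency into AdjO and AdjD are KenPom’s own and are not reproduced here, so `adj_em` simply takes those two numbers as given. All function names are hypothetical.

```python
def adj_em(adj_o: float, adj_d: float) -> float:
    """Efficiency margin: points scored minus points allowed per 100 possessions."""
    return adj_o - adj_d


def efg_pct(fg_made: int, three_made: int, fg_attempted: int) -> float:
    """Effective field goal percentage, giving threes 1.5x credit."""
    return 100 * (fg_made + 0.5 * three_made) / fg_attempted


def turnover_pct(turnovers: int, possessions: int) -> float:
    """Turnovers as a percentage of possessions."""
    return 100 * turnovers / possessions


# Made-up inputs, purely for illustration.
print(adj_em(110.0, 98.5))
print(round(efg_pct(25, 8, 60), 1))
print(turnover_pct(12, 60))
```

For the defensive versions, feed in the opponent’s shots and turnovers instead of the team’s own.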

One Metric That Doesn’t Matter

There are literally 79 statistics that don’t matter (based on our results), but I’m not going to report all 79 of them here.

I will, however, highlight one particular metric in order to dispel a false narrative about Cinderella teams: Team Experience does not matter.

Sports media loves to talk up experienced, senior-laden teams that pull first-round upsets, but the data suggests that team experience has no statistically meaningful effect on a team’s chance to pull off an improbable win. The experience narrative is just that. A narrative. It pulls at the heartstrings, but it isn’t grounded in historical fact. It should be dismissed as a predictive tool.

This Year’s Potential Cinderella Teams

To find 2021’s Cinderellas, I built a model using scary math things like multivariate logistic regressions and coefficients … and I’ve whittled this year’s field down to 17 teams that fit our requirements for a mid-major Cinderella.
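For the curious, this is roughly how a fitted logistic regression turns a team’s metrics into a probability: multiply each metric by its coefficient, add them up with an intercept, and squash the total into the 0-to-1 range. The coefficients, intercept and team numbers below are placeholders invented for the sketch, not the values from this model, and the model’s “p-Coefficient” may be scaled differently.

```python
import math


def upset_probability(features, coefficients, intercept):
    """Logistic regression prediction: sigmoid of a weighted sum of metrics."""
    log_odds = intercept + sum(
        coefficients[name] * features[name] for name in coefficients
    )
    return 1.0 / (1.0 + math.exp(-log_odds))


# Hypothetical coefficients; a real model would learn these from past tournaments.
coefficients = {"adj_em": 0.10, "off_turnover_pct": -0.05, "def_turnover_pct": 0.05}
intercept = -1.5

# Made-up team profile, purely for illustration.
team = {"adj_em": 10.0, "off_turnover_pct": 17.0, "def_turnover_pct": 20.0}
print(round(upset_probability(team, coefficients, intercept), 3))
```

The signs carry the intuition: a positive coefficient means more of that metric raises the upset probability, and a negative one (such as turning the ball over on offense) lowers it.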

Here’s what you need to know about the results in the table below:

The higher the probability coefficient (labeled in the table below as “p-Coefficient”), the better the chance the model gives a team of pulling an upset.
That data point informs the historical W-L column, which shows the tourney results for past teams with equal or better probability coefficients.
The win % column is simply the percentage attached to the historical W-L.
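The historical W-L and win % columns described above can be sketched like this: take every past team whose coefficient was equal to or better than the current team’s, then count how many won. The function name and the tiny history list are made up for the example and are not drawn from the real sample.

```python
def historical_record(p_coefficient, past_teams):
    """Return (wins, losses, win_pct) for past teams at or above a coefficient.

    past_teams is a list of (p_coefficient, won_first_round) pairs.
    """
    comparable = [won for p, won in past_teams if p >= p_coefficient]
    wins = sum(comparable)
    losses = len(comparable) - wins
    win_pct = 100 * wins / len(comparable) if comparable else None
    return wins, losses, win_pct


# Made-up history, purely for illustration.
past_teams = [
    (0.45, True),
    (0.40, False),
    (0.35, True),
    (0.20, False),
    (0.10, False),
]
wins, losses, win_pct = historical_record(0.35, past_teams)
print(wins, losses, round(win_pct, 1))
```

Because the cutoff is “equal or better,” a team with a lower coefficient is compared against a larger, weaker pool of past teams, so its win % tends to fall as you move down the rankings.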

Below is a ranking of each Cinderella team’s chances to win from best to worst — and the higher seeds most at risk of getting upset:

2021 NCAA Tournament Cinderella Rankings

Top Seeds Most In Danger of an Upset
