At a time defined by tears of both sadness and joy, March Madness reigns supreme as arguably the most intriguing month of the year for sports fans. Here at TCB Analytics we are no different: we are huge sports fans, and as data geeks we take it a step further, analyzing the numbers to see if we can decipher the hidden reasons why teams win and why the Jayhawks fall early in the tourney (still bitter about that Northern Iowa game). Each year Kaggle (and its new parent company, Google) holds a competition to see who can predict the best bracket using statistics and algorithms. With this yearly competition comes the release of regular season statistics, along with each NCAA matchup, result, and set of game stats for the last 15 years, which gives us the opportunity to answer the eternal question about your bracket… “What The Hell Went Wrong?”
There are two main groups that people fall into when making their brackets: the “I totally called that upset!” group and the “I knew there was no way that upset could happen” group. Both have their strategies. The “upset” group picks almost every 12-5 upset they see, a 13-4, and, for good measure, a 15-2, just in case it happens, so that they look like a modern-day Nostradamus in their office March Madness Bracket Challenge. The latter group picks few or no upsets, assuming the bracketologists who seed each team know far more than any outsider and should therefore be trusted, no questions asked. TCB is a third group entirely… we ask, based on historical data and what it takes for the average upset to take place, which teams are best suited to pull off a David-over-Goliath bracket buster?
First, let’s take a look at the upsets that have occurred in the opening round (this is where most upsets take place on a macro scale), looking at all 12, 13, 14, and 15 seed upsets over 5, 4, 3, and 2 seeds, respectively. Remember that class of bracket-pickers who always pick a 15 over a 2 seed? Well, on average they will be right about once every three to four years, so you’ll have to deal with their ranting about as often as you have to elect government officials (4/56 matchups over the last 15 years) (Figure 1 – left).
Those who choose a 13 or 14 seed to win in the first round may get about one right every year (8/56 and 11/56 matchups). But those who pick 12 seeds over 5 seeds are correct approximately 40% of the time… that’s a pretty big share (23/56 matchups). So what gives? Are the bracketologists getting it wrong, always putting good teams at the 12 seed, or are they putting bad teams too high at the 5 seed? Or maybe it’s something different: not a matter of good vs. bad, but a matter of style of play? We’ll argue it’s the style of play… and the data supports it.
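For anyone who wants to check those fractions, the arithmetic is short. The sketch below is only an illustration: the function names are made up, and the figure of four matchups per seed pairing per tournament (one in each region) is our assumption, not something taken from the Kaggle files.

```python
MATCHUPS = 56             # opening-round matchups per seed pairing, from the text
GAMES_PER_TOURNAMENT = 4  # assumption: one such matchup in each of four regions


def upset_rate(upsets, matchups=MATCHUPS):
    """Fraction of matchups won by the lower seed."""
    return upsets / matchups


def upsets_per_tournament(upsets, matchups=MATCHUPS):
    """Average number of these upsets in a single tournament."""
    return upset_rate(upsets, matchups) * GAMES_PER_TOURNAMENT


if __name__ == "__main__":
    # Counts quoted in the text for the two extremes.
    for label, count in [("15 over 2", 4), ("12 over 5", 23)]:
        print(f"{label}: {upset_rate(count):.1%} of games, "
              f"{upsets_per_tournament(count):.2f} per tournament")
```

The same two functions work for the 8/56 and 11/56 counts quoted for the 13 and 14 seeds.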
First off, who are the lower seeds? Historically, lower seeds are mid-major conference tourney winners who are thought not to stack up well against the likes of the ACC, Big Ten, Big 12, SEC, etc., whose teams normally get the higher seeds. Higher seeds (1-6) are those major conference teams that win their conference tourneys or are good enough to get an easy at-large bid. Their recruiting classes are usually among the top in the country year in and year out, and they have no trouble finding big men or high fliers, people who play at or above the rim. The lower seeds, on the other hand, don’t get these coveted 5-star recruits out of high school, so they recruit lower-rated big men and smaller outside guards, players who are often overlooked in favor of the 6-foot-5 guard with handles who dominates high school competition by easily getting to the rim. One prime example is the prodigal son, chef, and proverbial rainmaker: Stephen Curry.
Low on the recruiting scale coming out of high school, he ended up in a mid-major conference and continued to be overlooked because of the competition he faced. When he was put up against “Major D1” schools in the tourney, he proceeded to torch them with 40+ point nights in prime time. Most “experts” think mid-major schools can’t compete with major conference teams because they don’t have above-the-rim players or size, and that an above-average win-loss record for a mid-major school is due to the inferior talent of its opponents. This may sometimes be the case, but in actuality these schools have formed their own style of play. Not dependent on getting to the rim, these smaller schools have developed game plans that reside outside the 3-point line: for these teams, the probability of hitting 40% of 20 threes for 24 points is higher than that of hitting 30% of 40 shots from within the paint for the same 24 points. This differing offensive strategy, coupled with more of a “team mentality” on defense, makes the dependence on any one player very small; in most cases any member can step up or be replaced. Their skill set is different, and that’s why these lower seeds often upset higher seeds: they shoot the lights out of the gym, driving crazy the janitors who constantly have to replace them.
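The arithmetic behind that comparison is easy to check. The sketch below is just an illustration of the expected-points calculation using the percentages and shot counts quoted above; the function name is made up, and it assumes every shot in the paint is worth two points.

```python
def expected_points(attempts, make_rate, points_per_make):
    """Expected points from a number of attempts at a given make rate."""
    return attempts * make_rate * points_per_make


# Figures quoted in the text: 20 threes at 40% vs. 40 paint shots at 30%.
threes = expected_points(attempts=20, make_rate=0.40, points_per_make=3)
paint = expected_points(attempts=40, make_rate=0.30, points_per_make=2)

# Points per attempt shows how much each kind of shot is worth on average.
threes_per_shot = threes / 20
paint_per_shot = paint / 40

if __name__ == "__main__":
    print(f"threes: {threes:.0f} points, {threes_per_shot:.2f} per shot")
    print(f"paint:  {paint:.0f} points, {paint_per_shot:.2f} per shot")
```

Both routes come to 24 points, but the threes are worth 1.2 points per attempt against 0.6 in the paint, so the outside-shooting team gets there on half as many shots.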
Let’s look at the shooting percentage for winning teams compared to their regular season statistics. Compared to their regular season, during an upset the lower seeds make about 110% more 3-pointers on average, with the lower seeds in 15-2 upsets making on average 120% more (Figure 2 – right)!
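Here is a minimal sketch of one way a game can be compared against a regular season baseline. It assumes the comparison is made on 3-point percentage (makes divided by attempts); the function names and the numbers in the example are hypothetical and are not taken from the Kaggle data.

```python
def three_point_pct(made, attempted):
    """3-point percentage as a fraction; 0.0 when no attempts were taken."""
    return made / attempted if attempted else 0.0


def relative_to_season(game_pct, season_pct):
    """Game shooting as a percentage of the regular-season rate (100 = same)."""
    if season_pct <= 0:
        raise ValueError("season_pct must be positive")
    return 100.0 * game_pct / season_pct


if __name__ == "__main__":
    # Hypothetical team: 35% from three in the regular season,
    # 9-of-20 from three in one tournament game.
    season = 0.35
    game = three_point_pct(made=9, attempted=20)
    print(f"{relative_to_season(game, season):.0f}% of regular-season rate")
```

On this scale 100 means a team shot exactly its regular season rate, and anything above 100 means it shot better than usual.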
How do these stats compare to this year’s “big” upset of Middle Tennessee State (12 seed) taking down Minnesota (5 seed)? Just as we expected, MTSU shot 146% better from behind the arc compared to its regular season numbers. It looks like this is more than just a trend, so next year, when you have to ante up an Abraham Lincoln for that office March Madness pool, look at the teams that shoot the 3 best during the regular season, against all of their opponents. Disregard whether they have any big “wins” on their resume, check whether they rely on their big men or their chefs from the Promised Land, and you’ll be king of your March Madness office pool in no time.
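One way to act on that advice is to rank teams by regular season 3-point percentage. The sketch below is a rough illustration with placeholder team names and invented season totals; the minimum-attempts cutoff is our own assumption, added so that a team with only a handful of attempts does not top the list.

```python
def rank_by_three_point_pct(season_totals, min_attempts=100):
    """Return (team, pct) pairs sorted from best to worst 3-point percentage.

    season_totals maps team name -> (threes made, threes attempted).
    Teams below min_attempts are skipped (an assumed cutoff, not from the text).
    """
    ranked = [
        (team, made / attempted)
        for team, (made, attempted) in season_totals.items()
        if attempted >= min_attempts
    ]
    return sorted(ranked, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Placeholder teams with invented totals, for illustration only.
    totals = {
        "Team A": (250, 640),
        "Team B": (210, 600),
        "Team C": (30, 60),
        "Team D": (280, 700),
    }
    for team, pct in rank_by_three_point_pct(totals):
        print(f"{team}: {pct:.1%}")
```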