By Maksim Horowitz
The second annual March Plaidness Challenge has come to an end, and it’s time to take a deeper look at the good, the bad, and the ugly from our challenge. First, everyone here at the CMU Tartan Sports Analytics Club would like to congratulate Andrew Willig for winning the challenge with 22 points! The teams he chose combined for 22 wins, quite an impressive feat. Be sure to read below to see his team selections! We would also like to give a shout-out to Suraj Vasishtha for taking second with 20 points, and to our very own Mark Moskwa for taking third overall!
With the formalities out of the way, let’s take a closer look at the numbers and find some insights from the data we collected from our challenge. First, let’s take a look at our top three participants and their picks that led them to victory:
|Andrew Willig - 1st||Suraj Vashishtha - 2nd||Mark Moskwa - 3rd|
|SMU||Wichita State||Ohio State|
We noticed that our first-place finisher picked more teams than our second- or third-place finishers, so we thought it would be interesting to look at the distribution of the number of teams picked and then see whether, and how, it was related to total score. First, we analyzed the distribution of the number of teams picked by individual participants. We used a histogram (Figure 1) to display this data, overlaid with a kernel density estimate, which smooths out the distribution.
Figure 1: Distribution of Teams Picked by Participant
We see that our distribution is strongly right-skewed, meaning the majority of the March Plaidness participants spent their budget on a smaller number of teams. This effect is most likely a consequence of the $100 cap, combined with the steep price of the higher seeds. The idea here is that the more money you spend on higher, more expensive seeds, the less money you have left to spend on a larger number of lower-seeded teams.
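As a rough illustration of how a plot like Figure 1 can be built, the sketch below counts how many participants picked each number of teams and evaluates a Gaussian kernel density estimate over the same values. The `teams_picked` list is invented for illustration and is not our actual challenge data, and the bandwidth of 1.5 is an arbitrary assumption.

```python
import math
from collections import Counter

# Hypothetical data: number of teams picked by each participant.
# These values are made up for illustration, not the real challenge entries.
teams_picked = [4, 5, 5, 6, 6, 6, 7, 7, 8, 9, 11, 14]


def histogram(values):
    """Count how many participants picked each number of teams."""
    return dict(sorted(Counter(values).items()))


def gaussian_kde(values, bandwidth):
    """Return a function that evaluates a Gaussian kernel density estimate."""
    n = len(values)
    norm = 1.0 / (n * bandwidth * math.sqrt(2 * math.pi))

    def density(x):
        # Sum one Gaussian bump centered on each observation.
        return norm * sum(
            math.exp(-0.5 * ((x - v) / bandwidth) ** 2) for v in values
        )

    return density


counts = histogram(teams_picked)
density = gaussian_kde(teams_picked, bandwidth=1.5)  # bandwidth is an assumption

for x in range(2, 17, 2):
    print(f"{x:2d} teams: density {density(x):.3f}")
```

A smaller bandwidth follows the histogram bars more closely, while a larger one gives a smoother curve; the long right tail shows up either way when a few participants pick many teams.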
Next, we looked at the conditional distribution of participant scores, given the number of teams picked. Here, we used a basic linear regression to test whether the number of teams picked was a significant predictor of total score. Our results were quite interesting. Contrary to what one might think, we found a negative relationship between these variables. This means the more teams a person chose in their pool, the lower their expected total score. In context, however, this conclusion makes sense, because the participants picking more teams are picking, by the tournament’s standards, lower quality teams. Lower quality (i.e. lower seeded) teams typically win fewer games, and thus contribute less to a person’s score. While picking an underdog is exciting, there is a fine line you need to walk when picking against the favorites, as evidenced by this year’s results. (Check out Giant Killers to learn more about picking your underdogs for next year.)
Figure 2: Conditional Distribution of Total Wins Given Teams Picked
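For readers who want to try this kind of fit themselves, here is a minimal sketch of a simple least-squares regression of total wins on the number of teams picked. The `entries` pairs are invented for illustration and are not our participants' results, and the sketch only estimates the slope and intercept; a significance test would additionally need the slope's standard error.

```python
# Hypothetical (teams_picked, total_wins) pairs, invented for illustration.
entries = [(4, 18), (5, 16), (6, 17), (7, 14), (9, 13), (12, 10), (15, 9)]


def ols_fit(pairs):
    """Fit y = intercept + slope * x by ordinary least squares."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    # Sum of squares of x and cross-products of x and y around their means.
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


slope, intercept = ols_fit(entries)
print(f"expected wins = {intercept:.2f} + ({slope:.2f}) * teams picked")
```

A negative slope here corresponds to the pattern described above: each additional team picked is associated with a lower expected win total.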
The next logical question to ask is, “How should upsets be picked?” Using just the data collected in this year’s challenge would yield an inaccurate answer, so we will hold off on doing any simulations. However, we can look at wins by “underdog teams,” on a basic level, for 2015. Broken down by seed, we see that only the 10, 11, and 14 seeds captured wins, while the 9, 12, 13, 15, and 16 seeds were all eliminated in the first round. So how did our participants do when guessing upsets? Below, we used a bar chart to visualize the proportion of total teams picked, split by seed. Figure 3 demonstrates that the 9, 14, 15, and 16 seeds were the least commonly chosen seeds among our participants. This lines up relatively well with the actual win totals of each of our seeds. The only seed that should have been picked more often was the 14, as the 14-seeded teams combined for two total wins. We see that our participants misallocated some of their spending on 12 and 13 seeds, which combined for 0 wins but each accounted for roughly 6% of the total selected teams. Obviously, seeding isn’t the only factor in determining which team will win a game; match-ups, coaching, and experience are all key aspects as well.
Figure 3: Proportion of Times an Underdog Team was Picked
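The proportions behind a chart like Figure 3 amount to a simple tally. The sketch below assumes a flat list with the seed of every team selected across all entries, then reports the share of all selections that went to each underdog seed (9 through 16). The `picked_seeds` list is invented for illustration and does not reproduce our data.

```python
from collections import Counter

# Hypothetical seeds of every team selected across all entries (made up).
picked_seeds = [1, 2, 3, 3, 5, 7, 7, 8, 10, 10, 11, 11, 11, 12, 13, 14]


def underdog_proportions(seeds, lowest_underdog_seed=9):
    """Share of all selected teams that went to each underdog seed."""
    counts = Counter(seeds)
    total = len(seeds)
    return {
        seed: counts[seed] / total
        for seed in sorted(counts)
        if seed >= lowest_underdog_seed
    }


for seed, share in underdog_proportions(picked_seeds).items():
    print(f"{seed} seed: {share:.1%} of all selected teams")
```

Dividing by all selected teams, rather than by underdog picks only, matches how the percentages are described in the text above.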
Price Values of Each Team
The next thing we were interested in was finding the value of each team in the bracket. We priced each seed based on its historical probability of winning the tournament (note: we charged $1 for seeds that have never won). In order to value the teams, we looked at them from a price-per-win standpoint. That is, we divided the cost of each team by its total number of wins; the lower the number, the more valuable the selection:
|Team||Price Per Win|
|12. Notre Dame||4.67|
|13. North Carolina||6|
|19. West Virginia||4.5|
|20. Northern Iowa||9|
|25. Michigan State||1.75|
|26. Wichita State||3.5|
|31. NC State||3|
|32. San Diego State||6|
|39. Ohio State||5|
|45. Boise State/Dayton||4|
|55. Georgia State||2|
(Teams with 0 wins were omitted, as they provided no value to participants.)
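The price-per-win calculation is just a division, with winless teams skipped. In the sketch below, Duke's $25 price and 6 wins come from this article; the other two entries, "Team A" and "Team B", are hypothetical placeholders added only to show the ranking step.

```python
def price_per_win(price, wins):
    """Cost of a team divided by its tournament wins; None for winless teams."""
    if wins == 0:
        return None  # winless teams provide no value and are omitted
    return price / wins


# (price in dollars, wins). Duke's figures come from the article;
# "Team A" and "Team B" are hypothetical.
teams = {
    "Duke": (25, 6),
    "Team A": (7, 4),
    "Team B": (1, 0),
}

values = {
    name: price_per_win(price, wins)
    for name, (price, wins) in teams.items()
    if wins > 0
}

# Lower price per win means a more valuable selection.
for name, value in sorted(values.items(), key=lambda item: item[1]):
    print(f"{name}: {value:.2f}")
```

Sorting in ascending order puts the best bargains first, which is how the ranking of most valuable teams is read off.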
From the table above, we find that the most valuable teams in this year’s March Madness tournament were Michigan State, UCLA, Georgia State, UAB, NC State, Wichita State, Xavier, Louisville, Dayton, and Duke. Notably, Duke, the champion of the tournament, was only the 10th most valuable team! The winner of March Madness contributes 6 points to a participant’s total score, but in this year’s case, those points came at a cost of $25, as Duke was a 1 seed. The winning score of our challenge was 22 points, meaning it was possible to win without picking a single number one seed!
Overall, our March Plaidness Challenge was a huge success, and we hope the high-level insights in this article will help inform your picks next year. As we gather more data from next year’s tournament, we hope to go into more detail regarding optimal selections for our challenge. Thanks for participating and good luck next year!
Note that we only had 86 participants, so all our conclusions must be taken with a grain of salt.