Linear Programming for an NFL Daily Fantasy Optimizer
The NFL season is back in full swing. If you’re a sports and data nerd like me, you’re ready to flex your analytical skills and try to beat your friends in your fantasy leagues. Or you can take a crack at Daily Fantasy, which involves picking players to form a team under salary and position constraints. Fortunately, this type of problem is a good fit for linear programming: we want to maximize total projected fantasy points, and both that objective and the constraints are linear in the choice of players.
The first step in any data analysis problem is to get the data. We pull data from a variety of sources including:
- NFL schedule data – to determine the games being played and home/away teams
- Vegas Over/Under and Spread data – to determine high scoring games and favorites/underdogs
- Fantasy Points Against – which teams are giving up the most points to QBs? RBs?
- Projections – these come from NumberFire and are used to predict who will produce the most points. We’ve also created our own projections in the past based on some of our own football knowledge.
Once all of that data is scraped and combined together using several R scripts, we have the data we need. Since we’re hosting the application on shinyapps.io and updating the data on a regular basis, we need to store that data somewhere. We upload the projections to an S3 bucket on Amazon Web Services and the Shiny dashboard reads the data directly from there. Whenever the data is updated, the Shiny app reflects those changes.
Now we leverage the lpSolve R package to do the heavy mathematical lifting for us. As with most tasks in R, it’s only a few lines of code to get our optimized fantasy lineup.
The problem we’re trying to solve is as follows:
Given that we have $60,000 to spend on players, that we need exactly 1 QB and 1 DEF and at most 3 RBs, 2 TEs and 4 WRs (the ranges come from the new Flex position, which can be a TE, RB or WR), and that we can take no more than 4 players from any given team, how do we maximize projected fantasy points?
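Written out as an optimization problem, the question above might look like the following. The notation is ours, introduced only for illustration: x_i is 1 if player i is picked and 0 otherwise, p_i is that player’s projected points, s_i is the salary, P_pos is the set of players at a given position and T_t is the set of players on team t.

```latex
\begin{aligned}
\max_{x}\quad & \sum_{i} p_i x_i \\
\text{s.t.}\quad
& \sum_{i \in P_{\mathrm{QB}}} x_i = 1, \qquad \sum_{i \in P_{\mathrm{DEF}}} x_i = 1, \\
& \sum_{i \in P_{\mathrm{RB}}} x_i \le 3, \qquad \sum_{i \in P_{\mathrm{TE}}} x_i \le 2, \qquad \sum_{i \in P_{\mathrm{WR}}} x_i \le 4, \\
& \sum_{i \in T_t} x_i \le 4 \quad \text{for every team } t, \\
& \sum_{i} x_i = 9, \\
& \sum_{i} s_i x_i \le 60000, \\
& x_i \in \{0, 1\} \quad \text{for every player } i.
\end{aligned}
```

The last line is what makes this a 0/1 integer program rather than a pure linear program, since we cannot pick a fraction of a player.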
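First we have to define our constraint matrix, which consists of one row per constraint and one column per variable. In our case, each variable is a player. The rows are indicator rows for each position and each team (1 if the player belongs to it, 0 otherwise), followed by a row of 1s that counts the roster size and a row holding each player’s salary.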
con <- rbind(t(model.matrix(~ position + 0,preds)), t(model.matrix(~ team + 0, preds)), rep(1,nrow(preds)), preds$salary)
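To see what that matrix looks like, here is a minimal sketch on a made-up four-player pool. The player names, team codes and salaries are hypothetical; only the column names position, team and salary come from the code above. The row and column labels are added purely for readability.

```r
# Hypothetical toy pool; real data would come from the scraped projections.
preds <- data.frame(
  player   = c("P1", "P2", "P3", "P4"),
  position = c("QB", "RB", "WR", "DEF"),
  team     = c("AAA", "AAA", "BBB", "BBB"),
  salary   = c(8000, 7000, 6500, 4500)
)

con <- rbind(
  t(model.matrix(~ position + 0, preds)),  # one 0/1 row per position
  t(model.matrix(~ team + 0, preds)),      # one 0/1 row per team
  rep(1, nrow(preds)),                     # roster-size row
  preds$salary                             # salary row
)

# Label the two unnamed rows and the player columns for readability.
rownames(con)[rownames(con) == ""] <- c("roster size", "salary")
colnames(con) <- preds$player
print(con)
```

Note that model.matrix orders the position rows by factor level, which is alphabetical by default. With labels like these, the DEF row comes before the QB row, so the values in rhs and dir have to follow that same order.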
Next, we have to specify constraints. This is where we specify in the rhs variable: 1 QB, 1 DEF, [3 RB, 2 TE, 4 WR]. Then we specify no more than 4 players from a single team, exactly 9 players total and $60,000 salary.
rhs <- c(1,1,3,2,4,rep(4,length(unique(preds$team))),9,60000)
The next line specifies the direction of those constraints. For the dir variable, it’s = for 1 QB, = for 1 DEF, <= for 3 RBs, 2 TEs and 4 WRs, <= for 4 players per team, = for exactly 9 players and <= for the $60,000 salary.
dir <- c("=","=","<=","<=","<=", rep('<=',length(unique(preds$team))),"=","<=")
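Because rhs and dir are matched to the rows of con purely by position, it can help to line the three up side by side before solving. The sketch below does that on a hypothetical five-player pool (team codes and salaries are made up), and the labels "roster size" and "salary" are our own names for the two unnamed rows.

```r
# Hypothetical pool with one player per position, spread over three teams.
preds <- data.frame(
  position = c("QB", "DEF", "RB", "TE", "WR"),
  team     = c("AAA", "AAA", "BBB", "BBB", "CCC"),
  salary   = c(8000, 4500, 7000, 5500, 6500)
)
n_teams <- length(unique(preds$team))

con <- rbind(
  t(model.matrix(~ position + 0, preds)),
  t(model.matrix(~ team + 0, preds)),
  rep(1, nrow(preds)),
  preds$salary
)
rhs <- c(1, 1, 3, 2, 4, rep(4, n_teams), 9, 60000)
dir <- c("=", "=", "<=", "<=", "<=", rep("<=", n_teams), "=", "<=")

# One row per constraint: which row of con it is, its direction and its limit.
labels <- c(rownames(con)[seq_len(5 + n_teams)], "roster size", "salary")
constraints <- data.frame(constraint = labels, dir = dir, rhs = rhs)

# The solver needs exactly one direction and one limit per constraint row.
stopifnot(nrow(con) == length(rhs), length(rhs) == length(dir))
print(constraints)
```

If the position labels in your data sort differently (for example "D" instead of "DEF"), the first five entries of rhs and dir would need to be reordered to match.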
Lastly, we feed the constraint matrix (con), the numeric values for those constraints (rhs), the direction of the constraints (dir) and the objective function we’re trying to maximize, which is the vector of fantasy projections (obj), to the lp function. We specify “max” since we’re trying to maximize fantasy points. You could imagine scenarios in which this might be “min” if you’re trying to minimize costs or risk.
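# all.bin = TRUE restricts each player variable to 0 or 1, so no fractional players are picked
result <- lp("max", obj, con, dir, rhs, all.bin = TRUE)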
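Putting the pieces together, the following is a minimal end-to-end sketch on a made-up pool of 28 players. The teams, salaries and projections are hypothetical, as is the points column used for obj (the article does not show how obj is built). It passes all.bin = TRUE so that each player variable is 0 or 1, then reads the chosen players back out of result$solution.

```r
library(lpSolve)

# Hypothetical player pool: 4 made-up teams with 7 players each.
teams <- c("AAA", "BBB", "CCC", "DDD")
slots <- c("QB", "DEF", "RB", "RB", "TE", "WR", "WR")
preds <- data.frame(
  team     = rep(teams, each = length(slots)),
  position = rep(slots, times = length(teams)),
  salary   = rep(c(8000, 4500, 7500, 6000, 5500, 7000, 6000), times = 4) +
             rep(c(0, 300, 600, 900), each = 7),
  points   = rep(c(20, 8, 15, 10, 8, 14, 9), times = 4) +
             rep(c(0, 0.5, 1, 1.5), each = 7)
)
n_teams <- length(unique(preds$team))

con <- rbind(
  t(model.matrix(~ position + 0, preds)),  # rows in order DEF, QB, RB, TE, WR
  t(model.matrix(~ team + 0, preds)),
  rep(1, nrow(preds)),
  preds$salary
)
rhs <- c(1, 1, 3, 2, 4, rep(4, n_teams), 9, 60000)
dir <- c("=", "=", "<=", "<=", "<=", rep("<=", n_teams), "=", "<=")
obj <- preds$points  # assumed objective: the projected-points column

result <- lp("max", obj, con, dir, rhs, all.bin = TRUE)
stopifnot(result$status == 0)  # 0 means the solver found an optimal solution

# Players whose decision variable is 1 make up the lineup.
lineup <- preds[result$solution > 0.5, ]
print(lineup)
cat("Projected points:", result$objval, "\n")
cat("Salary used:", sum(lineup$salary), "\n")
```

A non-zero result$status usually means the constraints cannot all be met, for example when the pool is too small to field 9 players with at most 4 per team.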
This results in an optimized lineup of players that we should pick. However, we may want to apply our own filters and optimize again. For example, we can filter on games that should be high scoring given the Vegas lines. Or we can pick only from teams that are favorites to win. We could also remove players who we don’t feel will perform up to their expectations by highlighting the row and clicking the Delete button below. This is where Shiny becomes extremely useful for experimenting with different filters and lineups. Give it a try and feel free to comment and provide us with feedback!
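Outside the dashboard, the filter-and-re-optimize loop can be sketched by wrapping the steps above in a function and calling it on a filtered pool. Everything here is illustrative: optimize_lineup is a hypothetical helper, the player pool is made up, and the over_under and points column names are assumptions rather than the app’s actual schema.

```r
library(lpSolve)

# Hypothetical helper wrapping the steps from this post.
# Assumes position and team are character columns, that all five positions
# are still present after filtering, and that their labels sort as
# DEF, QB, RB, TE, WR.
optimize_lineup <- function(preds, cap = 60000) {
  n_teams <- length(unique(preds$team))
  con <- rbind(
    t(model.matrix(~ position + 0, preds)),
    t(model.matrix(~ team + 0, preds)),
    rep(1, nrow(preds)),
    preds$salary
  )
  dir <- c("=", "=", "<=", "<=", "<=", rep("<=", n_teams), "=", "<=")
  rhs <- c(1, 1, 3, 2, 4, rep(4, n_teams), 9, cap)
  result <- lp("max", preds$points, con, dir, rhs, all.bin = TRUE)
  stopifnot(result$status == 0)  # stop if no feasible lineup exists
  list(lineup = preds[result$solution > 0.5, ], points = result$objval)
}

# Made-up pool: 6 teams playing 3 games, each game with its own over/under.
teams <- c("AAA", "BBB", "CCC", "DDD", "EEE", "FFF")
slots <- c("QB", "DEF", "RB", "RB", "TE", "WR", "WR")
preds <- data.frame(
  team       = rep(teams, each = 7),
  position   = rep(slots, times = 6),
  salary     = rep(c(8000, 4500, 7500, 6000, 5500, 7000, 6000), times = 6) +
               rep(seq(0, 1500, by = 300), each = 7),
  points     = rep(c(20, 8, 15, 10, 8, 14, 9), times = 6) +
               rep(seq(0, 2.5, by = 0.5), each = 7),
  over_under = rep(c(52.5, 52.5, 47, 47, 40.5, 40.5), each = 7)
)

full <- optimize_lineup(preds)

# Keep only players in games projected to be high scoring, then re-optimize.
high_scoring <- preds[preds$over_under >= 45, ]
high <- optimize_lineup(high_scoring)

cat("All games:        ", full$points, "projected points\n")
cat("High-scoring only:", high$points, "projected points\n")
```

Filtering can only shrink the set of possible lineups, so the filtered optimum is never higher than the unfiltered one on the same projections. The point of filtering is to encode a judgment the projections do not capture. Removing individual players works the same way: drop their rows from the data frame and call the helper again.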