What a diff'rence xG makes

One argument in favour of using xG in football analytics and punditry is that it gives a better idea of which teams are good and which teams are not. Supposedly, xG allows us to cut through some of the randomness of goals and get closer to seeing teams’ true strengths. I find this view pretty intuitive; however, intuition alone is not enough to make the argument.

In this post, I present an analysis evaluating this claim. By comparing a team strength model that uses goals with one that uses xG, I have attempted to estimate how much xG improves our understanding of teams’ abilities. The code required to run the analysis is embedded in the post.

The team strength models will be the “Vanilla” and “xG” models described in the previous post.

Preparing the data

The data preparation will follow the same steps as before. I’ve put the data required for this analysis on GitHub; you can download it via the link below.

We can read the data in directly from the shortlink. Nesting keeps the data tidy: each row contains a single Premier League game, along with a dataframe of all the shots in that game (in the shots column).

library(tidyverse)

games <-
  read_csv("https://git.io/fNmRy") %>%
  nest(side, xg, .key = "shots")

head(games)
## # A tibble: 6 x 9
##   match_id date                home  away  hgoals agoals league season
##      <int> <dttm>              <chr> <chr>  <int>  <int> <chr>   <int>
## 1     4749 2014-08-16 12:45:00 Manc… Swan…      1      2 EPL      2014
## 2     4750 2014-08-16 15:00:00 Leic… Ever…      2      2 EPL      2014
## 3     4751 2014-08-16 15:00:00 Quee… Hull       0      1 EPL      2014
## 4     4752 2014-08-16 15:00:00 Stoke Asto…      0      1 EPL      2014
## 5     4753 2014-08-16 15:00:00 West… Sund…      2      2 EPL      2014
## 6     4754 2014-08-16 15:00:00 West… Tott…      0      1 EPL      2014
## # ... with 1 more variable: shots <list>

To get the information from the shots’ xGs into the Dixon-Coles model, I’m using the approach established in the previous post. This means using individual shot xG values to estimate the probability of different scorelines occurring. Each scoreline can then be fed into the model as if it were an individual game, with less likely scorelines being weighted less.
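
As a quick sanity check of the idea, consider a hypothetical team with two shots worth 0.5 and 0.2 xG (made-up values, not taken from the data). The Poisson-binomial distribution gives the chance of each possible goal tally from those shots:

# Hypothetical example: two shots with xG of 0.5 and 0.2
# P(0 goals) = 0.5 * 0.8             = 0.40
# P(1 goal)  = 0.5 * 0.2 + 0.5 * 0.8 = 0.50
# P(2 goals) = 0.5 * 0.2             = 0.10
poisbinom::dpoisbinom(0:2, c(0.5, 0.2))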

add_if_missing <- function(data, col, fill = 0.0) {
  # Add column if not found in a dataframe
  # We need this in cases where a team has 0 shots (!)
  if (!(col %in% colnames(data))) {
    data[, col] <- fill
  }
  data
}

team_goal_probs <- function(xgs, side) {
  # Find P(Goals = G) from a set of xGs via the
  # Poisson-binomial distribution
  # Use tidyeval to prefix column names with
  # the team's side ("h"ome or "a"way)
  tibble(
    !!str_c(side, "goals") := 0:length(xgs),
    !!str_c(side, "prob") := poisbinom::dpoisbinom(0:length(xgs), xgs)
  )
}

simulate_game <- function(shot_xgs) {
  shot_xgs %>%
    split(.$side) %>%
    imap(~ team_goal_probs(.x$xg, .y)) %>%
    reduce(crossing) %>%
    # If there are no shots, give that team a
    # 1.0 chance of scoring 0 goals
    add_if_missing("hgoals", 0) %>%
    add_if_missing("hprob", 1) %>%
    add_if_missing("agoals", 0) %>%
    add_if_missing("aprob", 1) %>%
    mutate(prob = hprob * aprob) %>%
    select(hgoals, agoals, prob)
}

simulated_games <-
  games %>%
  mutate(simulated_probabilities = map(shots, simulate_game)) %>%
  select(match_id, home, away, simulated_probabilities) %>%
  unnest() %>%
  filter(prob > 0.01)  # Keep the number of rows vaguely reasonable

head(simulated_games)
## # A tibble: 6 x 6
##   match_id home              away    hgoals agoals   prob
##      <int> <chr>             <chr>    <int>  <dbl>  <dbl>
## 1     4749 Manchester United Swansea      0      0 0.165
## 2     4749 Manchester United Swansea      1      0 0.335
## 3     4749 Manchester United Swansea      2      0 0.184
## 4     4749 Manchester United Swansea      3      0 0.0516
## 5     4749 Manchester United Swansea      0      1 0.0508
## 6     4749 Manchester United Swansea      1      1 0.103

Comparing the models

With the data required to fit both the Goals and xG models in hand, we can get to work comparing them.

We can compare the Goals and xG models with a backtest. This simply means testing how well each model would have predicted games in the past. In other words: using only the information available at the time, how well does each model predict the games in our historical data?

Overall method details:

  • For each game…
    • … find all Premier League games within the last year
    • … fit each model on those preceding games
    • … make a prediction for that game

No team plays twice on the same day, so we can fit the models for each day rather than for each game. This is quicker to run but functionally equivalent to going game-by-game.

Find previous games

First, let’s find the previous games within a year of each game. We’ll use these games to fit a model as if we were at that point in time.

I’ve chosen a window of 1 year in the past for the models to fit on. This is a somewhat arbitrary choice; it seems likely that the historical window could be tweaked to improve the performance of the models. In other words, the models may perform better when fitted on games from the 270 days before each fixture, rather than 365.

However, that is a slightly different analysis. I also suspect that the optimal time window for the xG-based model and the Goals model will be different.

find_preceding_games <- function(game_date,
                                 all_games = games,
                                 period = lubridate::years(1)) {
  all_games %>%
    filter(date < game_date,
           date > (game_date - period)) %>%
    select(match_id) %>%
    mutate(game_date = game_date)
}

window_length <- lubridate::years(1)

match_lookup <-
  games$date %>%
  lubridate::as_date() %>%
  unique() %>%
  map_dfr(find_preceding_games, period = window_length) %>%
  group_by(game_date) %>%
  summarise(matches = list(match_id)) %>%
  ungroup() %>%
  filter(game_date > (min(game_date) + window_length))

head(match_lookup)
## # A tibble: 6 x 2
##   game_date  matches
##   <date>     <list>
## 1 2015-08-22 <int [390]>
## 2 2015-08-23 <int [396]>
## 3 2015-08-24 <int [393]>
## 4 2015-08-29 <int [390]>
## 5 2015-08-30 <int [398]>
## 6 2015-09-12 <int [390]>
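
If you do want to experiment with the window length, it only requires swapping out the period and rebuilding the lookup. For example (a hypothetical variation, not run for this post):

# Hypothetical: fit each model on the preceding 270 days instead of a full year
window_length_270 <- lubridate::days(270)

match_lookup_270 <-
  games$date %>%
  lubridate::as_date() %>%
  unique() %>%
  map_dfr(find_preceding_games, period = window_length_270) %>%
  group_by(game_date) %>%
  summarise(matches = list(match_id)) %>%
  ungroup() %>%
  filter(game_date > (min(game_date) + window_length_270))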

Fitting each model

For each date, we fit two models:

  • Dixon-Coles
    • Vanilla Dixon-Coles model using only goals to estimate team strength
  • Dixon-Coles xG
    • Dixon-Coles model using xG values (via simulation)
library(regista)

fit_model <- function(match_ids, weights, all_games = games) {
  all_games %>%
    factor_teams(c("home", "away")) %>%
    filter(match_id %in% match_ids) %>%
    dixoncoles(
      hgoal = hgoals,
      agoal = agoals,
      hteam = home,
      ateam = away,
      weights = !!enquo(weights),
      data = .
    )
}

transplant_param <- function(model1, model2) {
  # Copy the estimated rho (dependence) parameter from one
  # fitted Dixon-Coles model into another
  model2$par["rho"] <- model1$par["rho"]
  model2
}

models <-
  match_lookup %>%
  mutate(
    # Use non-syntactic names in anticipation of `gather`
    `Dixon-Coles` = map(matches, fit_model, weights = 1),
    `Dixon-Coles xG` = map(matches, fit_model, weights = prob, all_games = simulated_games)
  ) %>%
  gather(model, fitted, -game_date, -matches)

head(models)
## # A tibble: 6 x 4
##   game_date  matches     model       fitted
##   <date>     <list>      <chr>       <list>
## 1 2015-08-22 <int [390]> Dixon-Coles <S3: dixoncoles>
## 2 2015-08-23 <int [396]> Dixon-Coles <S3: dixoncoles>
## 3 2015-08-24 <int [393]> Dixon-Coles <S3: dixoncoles>
## 4 2015-08-29 <int [390]> Dixon-Coles <S3: dixoncoles>
## 5 2015-08-30 <int [398]> Dixon-Coles <S3: dixoncoles>
## 6 2015-09-12 <int [390]> Dixon-Coles <S3: dixoncoles>

Making predictions

Make predictions for each date with each model. While there are different types of predictions we could make about a football match, I’m sticking to outcome (Home/Draw/Away).

This is by no means the best way to evaluate a soccer model; however, it has a couple of advantages here. One is that it’s relatively easy to understand. Another is that public H/D/A predictions and closing odds are available online, which makes the model predictions easier to benchmark.

model_predictions <-
  models %>%
  mutate(predictions = map2(fitted, game_date, function(f, d) {
    newdata <-
      games %>%
      factor_teams(c("home", "away")) %>%
      filter(lubridate::as_date(date) == d) %>%
      mutate(prob = 1)

    newdata %>%
      mutate(pred = map(
        predict(f, newdata, type = "scorelines"),
        scorelines_to_outcomes
      )) %>%
      select(match_id, pred) %>%
      unnest()
  }))

head(model_predictions)
## # A tibble: 6 x 5
##   game_date  matches     model       fitted           predictions
##   <date>     <list>      <chr>       <list>           <list>
## 1 2015-08-22 <int [390]> Dixon-Coles <S3: dixoncoles> <tibble [18 × 3]>
## 2 2015-08-23 <int [396]> Dixon-Coles <S3: dixoncoles> <tibble [9 × 3]>
## 3 2015-08-24 <int [393]> Dixon-Coles <S3: dixoncoles> <tibble [3 × 3]>
## 4 2015-08-29 <int [390]> Dixon-Coles <S3: dixoncoles> <tibble [24 × 3]>
## 5 2015-08-30 <int [398]> Dixon-Coles <S3: dixoncoles> <tibble [6 × 3]>
## 6 2015-09-12 <int [390]> Dixon-Coles <S3: dixoncoles> <tibble [21 × 3]>

Evaluating the models

We can evaluate the models’ predictions using the log loss metric, which rewards assigning high probability to outcomes that actually happen and punishes confident misses; lower is better.
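
As a rough illustration (with made-up probabilities, not model output), the penalty is small when the observed outcome was given a high chance and grows quickly when a confident prediction turns out wrong:

# Toy illustration with hypothetical probabilities
-log(0.60)  # observed outcome was given a 60% chance -> ~0.51
-log(0.30)  # observed outcome was given a 30% chance -> ~1.20
-log(0.05)  # observed outcome was given a  5% chance -> ~3.00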

dc_log_loss <-
  model_predictions %>%
  select(model, predictions) %>%
  unnest() %>%
  left_join(games, by = "match_id") %>%
  mutate(
    obs_outcome = case_when(
      hgoals > agoals  ~ "home_win",
      agoals > hgoals  ~ "away_win",
      hgoals == agoals ~ "draw"
    ),
    log_loss = ifelse(outcome == obs_outcome, -log(prob), -log(1 - prob))
  ) %>%
  group_by(model) %>%
  summarise(log_loss = mean(log_loss))

head(dc_log_loss)
## # A tibble: 2 x 2
##   model          log_loss
##   <chr>             <dbl>
## 1 Dixon-Coles       0.593
## 2 Dixon-Coles xG    0.573

Of course, these numbers don’t mean much on their own. What does a 0.02 difference in log loss actually mean?

To put these into context, I’ve calculated the log loss for a few benchmark models. I haven’t shown the benchmarks inline, but the code to calculate them is available here; a rough sketch of the simplest one follows the list below.

  • Benchmark
    • Assume all teams are the same strength and predict outcomes in line with historical frequencies (approximately H = 45%, D = 25%, A = 30%)
  • Closing odds
    • Implied probabilities from Pinnacle closing odds (from football-data.co.uk). You’re probably not going to get too close to these with public models/data.
  • Market-ratings
    • Team strength estimates derived from historical closing odds. An explanation of this method and links to code can be found here.
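
As a reference point, a minimal sketch of the “Benchmark” model might look something like the following. This is an assumption about how such a benchmark could be built, not the exact code behind the published numbers, and ideally it would be restricted to the same games the Dixon-Coles models were scored on:

# Constant H/D/A probabilities applied to every game
constant_probs <- tibble(
  outcome = c("home_win", "draw", "away_win"),
  prob    = c(0.45, 0.25, 0.30)
)

benchmark <-
  games %>%
  select(match_id, hgoals, agoals) %>%
  mutate(obs_outcome = case_when(
    hgoals > agoals  ~ "home_win",
    agoals > hgoals  ~ "away_win",
    hgoals == agoals ~ "draw"
  )) %>%
  crossing(constant_probs) %>%
  mutate(log_loss = ifelse(outcome == obs_outcome, -log(prob), -log(1 - prob))) %>%
  summarise(log_loss = mean(log_loss)) %>%
  mutate(model = "Benchmark")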

Comparing the predictions

bind_rows(
  dc_log_loss,
  market,
  benchmark,
  marketratings
) %>%
  ggplot(aes(x = reorder(model, log_loss), y = log_loss)) +
  geom_point(size = 3) +
  coord_flip() +
  labs(title = "Average log loss",
       subtitle = "Premier League 14/15 to 17/18",
       x = NULL,
       y = NULL) +
  theme_minimal()

Comparing the predictions, we see that the increase in predictive accuracy from using xG rather than goals is in the same ballpark as the difference between using goals scored/conceded and using no team strength information at all.

However, this increase in predictive accuracy applies to computers, not humans. If we return to the claim this post set out to evaluate, the implication is clearly that xG provides real value to humans trying to understand the game.

I think this is more of an open question; real people watching a game of football can pay attention to more than just the score. However, most fans, pundits, and analysts can’t watch every game in the season. In those cases, xG provides real and significant value over just looking at results.

While people can’t watch every game, and some still insist that “the table doesn’t lie”, I think there’s room for xG (or something similar) to provide insight. How much value it provides, though, may depend on how good you are at synthesising information beyond results.