Diagonal gridlines in 2 ways
Gridlines are a pretty standard part of any plot. As such, we often don’t think much about gridlines beyond whether to include them, and/or their visual appearance (colour, style, etc.).
However, this is a missed opportunity. In certain cases, using nonstandard gridlines can enhance the legibility of a graphic and its take-home message. In this post, I will look at a couple of different ways that diagonal gridlines can be used to this effect.
1: For ratios (HFA disruption plot)
The first example comes from a previous post, investigating potential disruption to home advantage pre/post-covid:
(See the end of the previous post for the ggplot2 implementation.)
In general, this kind of approach can be useful when the ratio between the x- and y-measures is meaningful.
Another potential place to apply this is the “p90 scatter” charts that are common on social media. These charts often consist of a scatter plot showing a set of football players’ score in 2 correlated metrics (for example, xGp90 and Goals p90), with the outliers annotated (these being noteworthy players).
In many cases, the ratio of y/x is useful information. For a scatter of xGp90 against Goals p90, the ratio would be a measure of over/underperformance against xG. On other charts, it may be a success or conversion rate.
This technique can be applied broadly, as long as the ratio of y/x is meaningful. For instance, see how adding diagonal lines to this plot highlights that Adama Traoré and Allan Saint-Maximin had almost the same rate of progressive distance per carry in 2019/20:
Expand for ggplot2 implementation
library(tidyverse) |
Diagonal gradients
This specific technique has been extended by Ben Mayhew (@experimental361) quite distinctively by augmenting the diagonal gridlines with a colour gradient, like so:
(Taken from https://twitter.com/experimental361/status/1287746116721795074)
Expand for ggplot2 implementation
I haven’t got a direct ggplot implementation of Mayhew’s colour gradient. However, you can achieve a single-colour gradient by drawing successive semi-transparent polygons over the plot:
library(tidyverse) |
2: For comparable units (NPG+A plot)
In rarer cases, the x- and y-measures have comparable units. That is, you could meaningfully add the x and y values together. For example, goals and assists are often counted together to create overall “goal contribution” (likewise shots, key passes, and “shot contribution”).
In cases where the sum of the x and y values is useful, a different kind of diagonal grid can be drawn:
The diagonal gridlines here show the overall expected goal contribution (npxG + xA). Two players on the same line will have the same npxG + xA, despite having different npxG or xA figures.
This stratifies the plot elements more meaningfully than standard gridlines and helps readers draw useful comparisons between players.
If we compare to the same plot without gridlines, it is easy to see Di María and Mbappé as outliers and put them in the same tier.
However, this is not true. Mbappé’s goal contribution is a level above that of Di María, Neymar, Slimani, or Icardi; referring back to the first plot, we can see that the latter group all sit within the same tranche.
There is a minor injustice here that xA is harder to come by than xG. Di María’s achievement is rarer and thus perhaps more impressive than Neymar’s (likewise Payet:Cardona, and so on). However, I think that on-balance, the benefits to plot legibility are worth it.
Expand for ggplot2 implementation
library(tidyverse) |
Minor algebraic aside (you can skip this bit)
The formula of a straight line is y = mx + c
. Given an x-coordinate, you can find the corresponding y-coordinate on that line by multiplying it by m
(the slope) and adding c
(the intercept).
You can therefore view the first set of examples as diagonal gridlines where the slope is of interest. In other words, we set gridlines with varying values of m
, and a fixed intercept (generally, c
will be 0).
In the second category, we are looking at varying the intercept (c
), while keeping the slope constant. In our example, m
is fixed to -1. This is because we are taking the sum of the x- and y- measures, and re-arranging the formula y + x = c
gives us y = (-1)x + c
.
If you were interested in the difference between the y- and x- measures, you would draw diagonal gridlines with varying slope and a gradient of +1.
Conclusion
You can use diagonal gridlines liek these to highlight key features of your data in a couple of different cases:
- When the ratio between x and y is meaningful, you can draw gridlines with varying slope
- When the x- and y-axes have comparable units, you can draw gridlines with a varying intercept