How shot-quality modelling replaced scoreline analysis in modern football
Expected Goals (xG) is the most important single statistic in football analytics - and also the most misunderstood. In its simplest form, xG answers the question: given the exact situation of this shot (distance, angle, body part, whether it came from a cross, whether a defender was closing in), how often does an average player score? Add up the xG for every shot a team takes in a match and you get their expected goal total - a much more honest estimate of how well they played than the actual goal total, which is subject to woodwork, lucky deflections and goalkeeping heroics.
The xG concept emerged in the early 2010s, popularised by analysts like Sam Green at Opta and Michael Caley at Cartilage Free Captain. The basic insight is that shots are not equally valuable: a tap-in from two yards has a ~90% scoring probability; a speculative 30-yard effort has a ~3% scoring probability. Treating every shot as an interchangeable 'shot on target' washes out the real signal.
A modern xG model is a logistic regression (or gradient-boosted tree) trained on millions of historical shots. For each shot it ingests distance to goal, angle to goal, body part used, pass type that preceded it, number of defenders between shooter and goal, and sometimes game state (winning / losing, minute of the match). The output is a probability between 0 and 1 - that's the xG value of the shot.
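As an illustration, here is a minimal logistic shot model in Python. The features and coefficients are invented for the sketch - they are not from any real provider's model, which would be fit on millions of labelled shots - but the shape of the computation is the same: features in, probability out.

```python
import math

def xg(distance_m, angle_deg, header=False, big_chance=False):
    # Illustrative logistic model. Coefficients are made up for this
    # sketch; a production model learns them from historical shot data.
    z = (1.1
         - 0.16 * distance_m      # farther from goal -> lower probability
         + 0.020 * angle_deg      # wider goal-mouth angle -> higher
         - 0.9 * header           # headers convert less often
         + 1.2 * big_chance)      # proxy for no defender closing in
    return 1.0 / (1.0 + math.exp(-z))

close_range = xg(distance_m=3, angle_deg=60)    # tap-in territory
long_shot = xg(distance_m=28, angle_deg=15)     # speculative effort
```

The sigmoid squashes the weighted feature sum into the 0-1 range, which is why every xG value reads as a probability.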
A team with 1.8 xG in a match is said to have generated chances worth 1.8 goals on average. If they won the match 3-0 despite only 1.2 xG, they over-performed their process; if they lost 0-1 with 2.4 xG, they under-performed. Over long samples, actual goals converge toward xG because the deviations are mostly random (keeper saves, post hits, VAR calls).
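A quick Monte Carlo makes the 'on average' framing concrete: take one hypothetical set of chances totalling 1.8 xG, replay the match many times scoring each shot with probability equal to its xG, and the mean goal count settles near the xG total even though individual replays vary widely. The shot values here are made up for the sketch.

```python
import random

random.seed(42)
shot_xgs = [0.76, 0.08, 0.31, 0.05, 0.42, 0.18]  # one match's chances, 1.80 xG

def simulate_goals(xgs):
    # Each chance scores with probability equal to its xG value.
    return sum(random.random() < p for p in xgs)

n = 20_000
avg_goals = sum(simulate_goals(shot_xgs) for _ in range(n)) / n
# avg_goals hovers near the 1.80 xG total, while any single
# replay can land anywhere from 0 to 6 goals
```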
The most common complaint against xG is that it 'ignores context' - that it treats every shot the same regardless of the pressure, the goalkeeper, the scoreline. The modern generation of models actually accounts for most of those factors, but the broader critique misses the point: xG is not trying to describe what happened; it's trying to describe what would happen on average if the same chances were replayed repeatedly.
The other critique is that xG undervalues clinical finishing. Jamie Vardy is the textbook over-performer - a player who scores more than his chances suggest. But even Vardy regresses: over his best five seasons, his actual goals are 15% above his xG, not 50%. On a single-match basis the gap can look dramatic, but it shrinks fast over longer samples, which is exactly what you want from a stable modelling input.
The real danger of xG is over-application. Using xG to predict next weekend's scoreline is fine; using it to predict which player will be top scorer next month requires additional shot-volume projections, penalty assignments, and rotation risk - things that the raw xG number doesn't encode. BetsPlug uses xG only for what it's designed for: estimating the Poisson lambda for each team in an upcoming fixture.
Inside our ensemble, we don't use xG as a standalone signal. We feed each team's rolling xG numbers (attacking output, defensive concessions, home/away split) into a Poisson goal model that produces a probability distribution over every possible scoreline. From there you can derive the 1X2 probabilities, Over/Under totals, BTTS probabilities and Asian handicap lines - all from the same xG-driven Poisson surface.
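A minimal version of that Poisson surface can be sketched as follows. The independent-Poisson assumption is a simplification (production models typically correct the low-score correlation between the two teams, e.g. with a Dixon-Coles adjustment), and the lambdas here are placeholders rather than outputs of any real pipeline.

```python
from math import exp, factorial

def poisson_pmf(k, lam):
    return lam ** k * exp(-lam) / factorial(k)

def scoreline_markets(home_lambda, away_lambda, max_goals=10):
    # Grid of P(home scores h, away scores a) under independent
    # Poisson goals, truncated at max_goals per team.
    home_win = draw = away_win = over25 = btts = 0.0
    for h in range(max_goals + 1):
        for a in range(max_goals + 1):
            p = poisson_pmf(h, home_lambda) * poisson_pmf(a, away_lambda)
            if h > a:
                home_win += p
            elif h == a:
                draw += p
            else:
                away_win += p
            if h + a > 2.5:
                over25 += p
            if h >= 1 and a >= 1:
                btts += p
    return {"1": home_win, "X": draw, "2": away_win,
            "over_2.5": over25, "btts": btts}

markets = scoreline_markets(home_lambda=1.8, away_lambda=1.2)
```

The same grid yields every market mentioned above: 1X2 from comparing h and a, totals from h + a, BTTS from both being at least one, and Asian handicaps from shifting the h-versus-a comparison by the line.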
The tricky part is deciding how many matches of xG history to weight. Too few and you overreact to small samples (a 6-shot burst from Bruno Fernandes against ten men doesn't mean United's attack is suddenly elite). Too many and you miss real form shifts (Arsenal's attacking output changed meaningfully after Ødegaard returned from injury). Our pipeline uses a rolling window that blends the last 8 matches with a long-run season-level prior, weighted by the confidence interval around the current estimate.
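A simplified sketch of that blend, using a fixed shrinkage constant in place of the confidence-interval weighting the pipeline actually uses - the numbers and the constant k are illustrative only:

```python
def blended_xg_rate(recent_xg, season_prior, k=8):
    # Shrink the recent-form average toward a season-level prior.
    # The weight on recent form grows with sample size n; k sets how
    # many matches it takes to trust recent form over the prior.
    n = len(recent_xg)
    if n == 0:
        return season_prior
    recent_mean = sum(recent_xg) / n
    w = n / (n + k)
    return w * recent_mean + (1 - w) * season_prior

# Four recent matches averaging 1.75 xG, against a 1.40 season prior:
rate = blended_xg_rate([2.1, 0.9, 1.6, 2.4], season_prior=1.4)
```

With only four matches observed, the estimate moves toward the hot recent form but stays anchored to the prior - exactly the small-sample behaviour the paragraph above describes.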
The xG pipeline is also where we catch data problems fastest. Every week, we cross-check the xG totals from our primary data vendor against a secondary source. Matches with divergences above 0.4 xG get flagged for manual review before any downstream model consumes them. This sounds boring but it's the kind of plumbing that separates a hobbyist model from a production system.
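The cross-check itself is simple plumbing. A sketch with hypothetical match identifiers and vendor totals:

```python
def flag_divergent_matches(primary, secondary, threshold=0.4):
    # primary / secondary: {match_id: team xG total} from two vendors.
    # Returns matches whose vendors disagree by more than the threshold
    # (or that are missing from the secondary feed) for manual review.
    flagged = []
    for match_id, xg_primary in primary.items():
        xg_secondary = secondary.get(match_id)
        if xg_secondary is None or abs(xg_primary - xg_secondary) > threshold:
            flagged.append(match_id)
    return flagged

flags = flag_divergent_matches(
    {"ARS-CHE": 2.1, "LIV-MUN": 1.3, "NEW-TOT": 0.9},
    {"ARS-CHE": 1.6, "LIV-MUN": 1.25, "NEW-TOT": 0.9},
)
# ARS-CHE diverges by 0.5 xG and gets flagged for manual review
```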
Mistake one: treating xG as a guaranteed result. 'Arsenal had 2.5 xG so they should have won' is a misreading - they had a performance consistent with 2.5 average goals, but the actual distribution is wide. A team with 2.5 xG still scores zero ~8% of the time.
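The ~8% figure falls straight out of the Poisson assumption: with 2.5 expected goals, the probability of scoring exactly zero is e^-2.5.

```python
from math import exp

# Poisson probability of 0 goals given an expectation of 2.5
p_zero = exp(-2.5)
# p_zero ≈ 0.082 - roughly 8% of the time, a 2.5 xG
# performance still produces a blank
```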
Mistake two: comparing xG across very different data providers. StatsBomb xG, Opta xG and Understat xG all use different training sets, different feature engineering and different shot metadata, so a 1.8 xG from one provider doesn't equal a 1.8 xG from another. Always compare like with like.
Mistake three: mistaking xG for a skill rating. A player with 0.5 xG per 90 minutes is not a better finisher than a player with 0.4 xG per 90 - they just get into better positions. Finishing skill shows up in the gap between expected and actual goals, which is noisy and only stabilises over thousands of shots.
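A small simulation shows why the gap is so noisy: give a perfectly average finisher samples of 30, 300 and 3,000 shots, and the spread of his per-shot goals-minus-xG gap shrinks roughly as 1 over the square root of the sample size. The 0.10 xG per shot is an arbitrary choice for the sketch.

```python
import random
import statistics

random.seed(1)

def per_shot_gap(n_shots, true_xg=0.10):
    # A league-average finisher: every shot scores with probability
    # true_xg, so any goals-minus-xG gap is pure luck.
    goals = sum(random.random() < true_xg for _ in range(n_shots))
    return (goals - n_shots * true_xg) / n_shots

spreads = {}
for n in (30, 300, 3000):
    gaps = [per_shot_gap(n) for _ in range(1000)]
    spreads[n] = statistics.pstdev(gaps)
# The spread over 30-shot samples dwarfs the spread over 3,000-shot
# samples: short-run G-xG gaps are mostly noise, not finishing skill
```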
Join BetsPlug to see all upcoming predictions across the top leagues - with confidence scores, live updates and our full public track record.
Just €0.01 activates your 7-day full-access trial.
Common questions on this topic, answered without the marketing fluff.