How shot-quality modelling replaced scoreline analysis in modern football
Expected goals (xG) is the most important statistic in football analytics - and also the most misunderstood. In its simplest form, xG answers the question: given the exact situation of this shot (distance, angle, body part, whether it came from a cross, whether a defender was closing in), how often does an average player score? Summing the xG of every shot a team takes in a match gives its expected goals total - a far more honest estimate of how it played than the actual goal count, which is subject to woodwork, lucky deflections and goalkeeping heroics.
The xG concept emerged in the early 2010s, popularised by analysts like Sam Green at Opta and Michael Caley at Cartilage Free Captain. The basic insight is that shots are not equally valuable: a tap-in from two yards has a ~90% scoring probability; a speculative 30-yard effort has a ~3% scoring probability. Treating every shot as a single 'shots on target' unit washes out the real signal.
A modern xG model is a logistic regression (or gradient-boosted tree) trained on millions of historical shots. For each shot it ingests distance to goal, angle to goal, body part used, pass type that preceded it, number of defenders between shooter and goal, and sometimes game state (winning / losing, minute of the match). The output is a probability between 0 and 1 - that's the xG value of the shot.
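As a hedged illustration of the logistic-regression variant, here is a minimal sketch trained on synthetic shots with made-up coefficients - the feature set and weights are hypothetical, not our production model, and real models use millions of labelled shots rather than a toy generator:

```python
import numpy as np

rng = np.random.default_rng(42)
n = 5000

# Synthetic shot features (hypothetical, not the production feature set):
# distance to goal (m), angle to goal (rad), header flag.
distance = rng.uniform(2.0, 35.0, n)
angle = rng.uniform(0.1, 1.4, n)
header = rng.integers(0, 2, n).astype(float)
X = np.column_stack([np.ones(n), distance, angle, header])  # intercept first

# Synthetic labels: nearer, more central, footed shots score more often.
true_logit = 1.5 - 0.18 * distance + 1.2 * angle - 0.8 * header
goal = (rng.random(n) < 1.0 / (1.0 + np.exp(-true_logit))).astype(float)

# Fit logistic regression by plain gradient descent on the log-loss.
w = np.zeros(X.shape[1])
for _ in range(3000):
    p = 1.0 / (1.0 + np.exp(-X @ w))
    w -= 0.001 * (X.T @ (p - goal)) / n

# xG of a 10 m shot from a fairly central angle, struck with the foot.
shot = np.array([1.0, 10.0, 1.0, 0.0])
xg = 1.0 / (1.0 + np.exp(-shot @ w))
print(f"estimated xG: {xg:.2f}")
```

The output is a probability in (0, 1), exactly the per-shot xG value described above; the learned distance coefficient comes out negative, matching the intuition that longer shots score less often.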
A team with 1.8 xG in a match is said to have generated chances worth 1.8 goals on average. If they won the match 3-0 despite only 1.2 xG, they over-performed their process; if they lost 0-1 with 2.4 xG, they under-performed. Over long samples, actual goals converge toward xG because the deviations are mostly random (keeper saves, post hits, VAR calls).
The most common complaint against xG is that it 'ignores context' - that it treats every shot the same regardless of the pressure, the goalkeeper, the scoreline. The modern generation of models actually accounts for most of those factors, but the broader critique misses the point: xG is not trying to describe what happened; it's trying to describe what would happen on average if the same chances were replayed repeatedly.
The other critique is that xG undervalues clinical finishing. Jamie Vardy is the canonical over-performer - a player who scores more than his chances suggest. But even Vardy regresses: over his best five seasons, his actual goals are 15% above his xG, not 50%. On a single-match basis the gap can look dramatic, but it shrinks fast over longer samples, which is exactly what you want from a stable modelling input.
The real danger of xG is over-application. Using xG to predict next weekend's scoreline is fine; using it to predict which player will be top scorer next month requires additional shot-volume projections, penalty assignments, and rotation risk - things that the raw xG number doesn't encode. BetsPlug uses xG only for what it's designed for: estimating the Poisson lambda for each team in an upcoming fixture.
Inside our ensemble, we don't use xG as a standalone signal. We feed each team's rolling xG numbers (attacking output, defensive concessions, home/away split) into a Poisson goal model that produces a probability distribution over every possible scoreline. From there you can derive the 1X2 probabilities, Over/Under totals, BTTS probabilities and Asian handicap lines - all from the same xG-driven Poisson surface.
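The scoreline surface described above can be sketched in a few lines. This assumes illustrative lambdas of 1.6 (home) and 1.1 (away) and independent Poisson goals - a simplification; production models typically add a correlation adjustment for low-scoring draws:

```python
import math
import numpy as np

def poisson_pmf(lam, k):
    """Probability that a Poisson(lam) variable equals k."""
    return math.exp(-lam) * lam**k / math.factorial(k)

def scoreline_matrix(lam_home, lam_away, max_goals=10):
    """P[i, j] = probability of a home-i, away-j scoreline."""
    home = [poisson_pmf(lam_home, i) for i in range(max_goals + 1)]
    away = [poisson_pmf(lam_away, j) for j in range(max_goals + 1)]
    return np.outer(home, away)

P = scoreline_matrix(1.6, 1.1)  # illustrative lambdas

# Derive the standard markets from the same surface.
home_win = np.tril(P, -1).sum()   # cells where home goals > away goals
draw = np.trace(P)                # diagonal: equal scores
away_win = np.triu(P, 1).sum()    # cells where away goals > home goals

i, j = np.indices(P.shape)
over_2_5 = P[i + j > 2.5].sum()       # total goals 3 or more
btts = P[(i > 0) & (j > 0)].sum()     # both teams score

print(f"1X2: {home_win:.3f} / {draw:.3f} / {away_win:.3f}")
print(f"Over 2.5: {over_2_5:.3f}  BTTS: {btts:.3f}")
```

Every market quote comes from slicing the same matrix, which is why a single pair of xG-driven lambdas is enough to price 1X2, totals and BTTS consistently.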
The tricky part is deciding how many matches of xG history to weight. Too few and you overreact to small samples (a 6-shot burst from Bruno Fernandes against ten men doesn't mean United's attack is suddenly elite). Too many and you miss real form shifts (Arsenal's attacking output changed meaningfully after Ødegaard returned from injury). Our pipeline uses a rolling window that blends the last 8 matches with a long-run season-level prior, weighted by the confidence interval around the current estimate.
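One simple way to implement that blend is shrinkage toward a prior, where the prior is given a fixed pseudo-match weight. This is an illustrative scheme with made-up numbers, not the exact confidence-interval weighting our pipeline uses:

```python
def blended_xg(recent_xg, season_prior, prior_weight=4.0):
    """Shrink a short rolling-window xG average toward a season-level prior.

    recent_xg: per-match xG from the last few matches (e.g. 8).
    season_prior: long-run per-match xG estimate.
    prior_weight: pseudo-match count given to the prior; higher values mean
    a small recent sample moves the estimate less.
    """
    n = len(recent_xg)
    return (sum(recent_xg) + prior_weight * season_prior) / (n + prior_weight)

# Eight recent matches averaging 2.0 xG, against a 1.4 season prior:
estimate = blended_xg([2.1, 1.8, 2.4, 1.7, 2.2, 1.9, 2.3, 1.6], 1.4)
print(round(estimate, 2))  # → 1.8
```

The estimate lands between the hot recent form (2.0) and the season baseline (1.4), and a six-shot outlier match moves it far less than a naive eight-match average would.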
The xG pipeline is also where we catch data problems fastest. Every week, we cross-check the xG totals from our primary data vendor against a secondary source. Matches with divergences above 0.4 xG get flagged for manual review before any downstream model consumes them. This sounds boring but it's the kind of plumbing that separates a hobbyist model from a production system.
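That cross-check reduces to a small comparison pass. A minimal sketch with hypothetical match IDs and per-match totals (the real pipeline compares per-team figures and richer metadata):

```python
def flag_divergent_matches(primary, secondary, threshold=0.4):
    """Return match IDs whose xG totals diverge across two vendors.

    primary / secondary: dicts mapping match_id -> total match xG.
    A match missing from the secondary source is also flagged.
    """
    flagged = []
    for match_id, xg in primary.items():
        other = secondary.get(match_id)
        if other is None or abs(xg - other) > threshold:
            flagged.append(match_id)
    return flagged

primary = {"ARS-CHE": 2.1, "LIV-MUN": 1.5, "TOT-NEW": 0.9}
secondary = {"ARS-CHE": 2.0, "LIV-MUN": 2.1, "TOT-NEW": 0.8}
print(flag_divergent_matches(primary, secondary))  # → ['LIV-MUN']
```

Only the flagged matches go to manual review; everything else flows straight into the downstream models.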
Mistake one: treating xG as a guaranteed result. 'Arsenal had 2.5 xG so they should have won' is a misreading - they had a performance consistent with 2.5 average goals, but the actual distribution is wide. A team with 2.5 xG still scores zero ~8% of the time.
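The ~8% figure falls straight out of the Poisson distribution: the probability of scoring exactly zero goals given an expected-goals rate λ is e^(−λ).

```python
import math

lam = 2.5  # expected goals for the match
p_zero = math.exp(-lam)  # Poisson probability of exactly 0 goals
print(f"P(0 goals | xG = {lam}) = {p_zero:.1%}")  # → 8.2%
```

So roughly one match in twelve with Arsenal-level chance creation still ends goalless for them - which is why xG describes the process, not the result.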
Mistake two: comparing xG across very different data providers. StatsBomb xG, Opta xG and Understat xG all use different training sets, different feature engineering and different shot metadata, so a 1.8 xG from one provider doesn't equal a 1.8 xG from another. Always compare like with like.
Mistake three: mistaking xG for a skill rating. A player with 0.5 xG per 90 minutes is not a better finisher than a player with 0.4 xG per 90 - they just get into better positions. Finishing skill shows up in the gap between expected and actual goals, which is noisy and only stabilises over thousands of shots.
Every locked pick above is a complete AI football prediction with probabilities, confidence and the best bet type for that match. A €0.01 trial unlocks everything for 7 days.
€0.01 activates your 7-day full-access trial. No hidden fees.
Common questions on this topic, answered without the marketing fluff.
Once you understand the math, see it run live on every fixture inside BetsPlug.