3 July 26
Modelling the World Cup
I watched a YouTube video a couple of days ago about building a probabilistic model of who would win the World Cup. The video was made before the tournament started, and the modelers have been updating their predictions as it progresses. Right now, with the Round of 16 about to begin, the model gives Spain a 23.6% chance of winning the tournament and Argentina a 22.9% chance of winning it all.
One of the principles the modelers point out is that there is no canonical way to build these models. Different models will currently give different probabilities, though I doubt anyone is giving Canada a higher chance of winning the tournament than Spain. The World Cup is especially difficult to model because matches between national teams are few and far between, especially when the teams are in different football confederations. Take an example from Group L of our 2026 edition of the World Cup: Ghana and Panama. The two teams had never played each other before. Moreover, there are few matches between Panama's confederation (CONCACAF) and Ghana's (CAF), so the network of matches connecting the teams Panama has played with the teams Ghana has played is pretty sparse. What is the probability that Ghana wins the group match between the two, that Panama wins it, or that it ends in a draw? (Ghana ended up winning the match 1-0 with a goal in the 95th minute.)
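To make that question concrete, here is a minimal sketch of one common textbook starting point, not necessarily what the modelers in the video do: treat each team's goals as independent Poisson counts with some expected rate, then add up the probability of every scoreline to get win, draw and loss probabilities. The function names and the rates 1.2 and 1.1 are hypothetical placeholders, not estimates for Ghana or Panama. Estimating those rates from sparse cross-confederation results is exactly the hard part described above.

```python
import math


def poisson_pmf(k: int, lam: float) -> float:
    """Probability of exactly k goals when goals follow Poisson(lam)."""
    return math.exp(-lam) * lam**k / math.factorial(k)


def match_outcome_probs(lam_a: float, lam_b: float, max_goals: int = 10) -> tuple[float, float, float]:
    """Return (P(A wins), P(draw), P(B wins)).

    Assumes each team's goal count is an independent Poisson draw.
    Scorelines above max_goals are ignored, and the result is renormalised
    so that the three probabilities sum to 1.
    """
    win_a = draw = win_b = 0.0
    for goals_a in range(max_goals + 1):
        p_a = poisson_pmf(goals_a, lam_a)
        for goals_b in range(max_goals + 1):
            p = p_a * poisson_pmf(goals_b, lam_b)  # probability of this exact scoreline
            if goals_a > goals_b:
                win_a += p
            elif goals_a == goals_b:
                draw += p
            else:
                win_b += p
    total = win_a + draw + win_b  # slightly below 1 because of the truncated tail
    return win_a / total, draw / total, win_b / total


# Hypothetical expected-goal rates, made up for illustration only.
ghana_rate, panama_rate = 1.2, 1.1
p_ghana, p_draw, p_panama = match_outcome_probs(ghana_rate, panama_rate)
print(f"Ghana {p_ghana:.1%}, draw {p_draw:.1%}, Panama {p_panama:.1%}")
```

Independence between the two teams' goals is a simplifying assumption. Real models often adjust it, for example to make low-scoring draws more likely, and the answer moves a lot depending on how the two rates are estimated.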
The world of football data analytics is vast but patchy at the same time. It is a global game, but consistent global data does not exist. It is all a bit intimidating.
