How a Data-Science-Driven Approach Has Changed the Game in Football


Sports analytics is the use of data to gain an extra edge over teams that don’t use it. Sports teams have begun to realize that not having, or not using, data-driven insights severely handicaps their chances of success against teams that use analytics to make decisions.

Managers and team owners now accept that gut feel and on-field talent alone are not enough to succeed.

In this article we will take a look at how data scientists have become essential to a sports team’s success.


Expected Goals (Football)


Football teams and their success have always been evaluated on the final scoreline.


“Did your team win, lose or draw the game?”


“Did it score more goals than the opposition?”


These have been the two questions that have dominated the game. 


While the result is always king, football, being a team sport, is much more complex than the scoreline suggests. One mind-blowing save by a goalkeeper or one wonder strike by an opposition player can be the difference between victory and defeat.


Evaluating a team’s or a player’s performance purely on goals can be misleading, as wonder saves and wonder goals are rare occurrences that can cloud people’s judgment.


What is the Expected Goals Metric?


Let us first get the definition out of the way before we go into the interesting bit. 


Expected goals (xG) is a metric that calculates the number of goals a team should have scored or conceded, based on the quality of the shots taken.

This means a wonder goal, though it counts as one goal on the scoresheet, does not score very highly on the expected goals metric. This is because a shot from such long range and at such a difficult angle is usually not expected to result in a goal, but through talent and sheer luck it has become one. The same player may not be able to replicate the goal because it is a low-probability event.


Put crudely, expected goals measures the probability of a shot turning into a goal, hence the name.


How are Expected Goals Calculated?


Expected goals are calculated purely from the quality of the shots taken. Every shot has a few parameters that determine its quality, and expected goals are based on these. Some of these parameters are:


  • Distance to the goal
  • Angle to the goal
  • One-on-one
  • Big chance
  • Body part (e.g., header or foot)
  • Type of assist (e.g., through ball, cross, pull-back etc.)
  • Pattern of play (e.g., open play, fast break, direct free kick, corner kick, throw-in etc.)
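The first two parameters are purely geometric, so they can be computed from where a shot was taken. The sketch below assumes a coordinate system of our own choosing (not any data provider's): x is the distance in metres from the goal line, y is the sideways offset from the centre of the goal, and the goal is the standard 7.32 m wide. "Angle to the goal" is taken here as the angle the goalmouth opens up to the shooter, which is one common definition among several.

```python
import math

GOAL_WIDTH_M = 7.32  # standard goal width in metres


def distance_to_goal(x: float, y: float) -> float:
    """Straight-line distance (metres) from the shot location to the centre of the goal."""
    return math.hypot(x, y)


def angle_to_goal(x: float, y: float) -> float:
    """Angle (radians) between the lines from the shooter to each goalpost.

    x: distance from the goal line in metres (assumed > 0)
    y: sideways offset from the centre of the goal in metres
    """
    half = GOAL_WIDTH_M / 2
    # For the vectors from the shooter to the two posts, the cross product is
    # GOAL_WIDTH_M * x and the dot product is x^2 + y^2 - half^2.
    return math.atan2(GOAL_WIDTH_M * x, x * x + y * y - half * half)


# A central shot from 11 m versus a shot from a tight angle near the byline
for x, y in [(11, 0), (6, 15)]:
    print(f"x={x}, y={y}: {distance_to_goal(x, y):.1f} m, "
          f"{math.degrees(angle_to_goal(x, y)):.1f} degrees")
```

The central shot sees the goalmouth at roughly 37 degrees and the wide one at roughly 10, which is exactly the kind of difference an expected goals model picks up.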

Every shot taken in a game has these parameters recorded, and machine learning models are trained on this historical data to predict whether a shot will turn into a goal. These models are then applied to new shots to calculate their probability of turning into goals.
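Logistic regression is one common, simple choice for such a model, though commercial providers use far richer features and may use other algorithms. Below is a minimal pure-Python sketch that fits one by gradient descent. The toy shots are made up purely for illustration, and the two features (distance in tens of metres, and whether the shot was a header) are a deliberately small subset of the parameters listed above.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def predict_xg(weights, bias, features):
    """Probability that a shot with these features is scored."""
    return sigmoid(sum(w * f for w, f in zip(weights, features)) + bias)


def train_xg_model(shots, goals, lr=0.1, epochs=5000):
    """Fit logistic regression by batch gradient descent on the log-loss.

    shots: list of feature vectors, one per shot
    goals: list of 0/1 labels (1 = the shot was scored)
    """
    n, k = len(shots), len(shots[0])
    weights, bias = [0.0] * k, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * k, 0.0
        for features, scored in zip(shots, goals):
            # Derivative of the log-loss with respect to the linear score
            err = predict_xg(weights, bias, features) - scored
            for j in range(k):
                grad_w[j] += err * features[j]
            grad_b += err
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


# Made-up toy shots: (distance in metres, header? 1/0, scored? 1/0)
toy = [(6, 0, 1), (7, 1, 1), (8, 0, 1), (8, 1, 0), (9, 0, 0), (10, 0, 0),
       (11, 0, 1), (12, 1, 0), (14, 0, 0), (16, 0, 1), (18, 0, 0), (20, 0, 0),
       (25, 0, 0), (30, 0, 0)]
X = [[d / 10, h] for d, h, _ in toy]  # distance scaled to tens of metres
y = [g for _, _, g in toy]

weights, bias = train_xg_model(X, y)
print(f"xG of a 7 m shot with the foot:  {predict_xg(weights, bias, [0.7, 0]):.2f}")
print(f"xG of a 25 m shot with the foot: {predict_xg(weights, bias, [2.5, 0]):.2f}")
```

Once trained, the same `predict_xg` call is what gets applied to every new shot; summing its outputs over a match gives that match's xG.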


Let's take a simple example of a shot taken from outside the box and do a back-of-the-envelope calculation. Shots taken from outside the box usually have a lower chance of conversion than shots taken from inside it.


For example, say shots taken from outside the box have a 5% conversion rate, meaning that on average 5 out of every 100 such shots become goals. Now say Cristiano Ronaldo scores 2 goals from 2 shots from outside the box in a game. Although the scoreline credits Manchester United with 2 goals, their expected goals tally would be only 0.1 (0.05 × 2). This is a fairer reflection of what we can expect from the shots taken. In football terms, Cristiano Ronaldo and Manchester United have over-performed their xG, because similar shots will not necessarily lead to goals in the future. In analytics terms, the shot quality was not indicative of a goal, so the performance is not sustainable and will eventually regress to the mean.
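In code, a team's or player's xG is simply the sum of the scoring probabilities of its shots, and the gap between goals and xG measures over- or under-performance. A minimal sketch of the Ronaldo example, assuming the same flat 5% probability for every shot from outside the box:

```python
def xg_tally(shot_probabilities):
    """Expected goals = sum of the scoring probabilities of every shot taken."""
    return sum(shot_probabilities)


def goals_minus_xg(goals, shot_probabilities):
    """Positive: scored more than the chances were worth; negative: fewer."""
    return goals - xg_tally(shot_probabilities)


ronaldo_shots = [0.05, 0.05]  # two shots from outside the box, 5% each
print(round(xg_tally(ronaldo_shots), 2))           # 0.1 expected goals
print(round(goals_minus_xg(2, ronaldo_shots), 2))  # 1.9 goals above expectation
```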


The graphic below is from StatsBomb, one of the leading analytics providers in the football world.

[Figure: StatsBomb graphic of shot conversion by distance and angle to goal]


This shows that as the distance to the goal increases and the angle to the goal becomes more acute, the chance of the shot being converted decreases. While this might be intuitive, the fact that it is backed by data strengthens confidence in the approach.

This is further supported by the graphic below:


[Figure: goal probability against distance and angle to goal]


As the distance to the goal increases, the probability of a goal decreases exponentially. Similarly, it is easier to score when the angle to the goal is less acute. These two graphics give a good overview of how expected goals are calculated, although other factors also go into an expected goals model, such as the number of players between the shooter and the goal, and whether the shot was taken with the player’s strong or weak foot.
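A simple way to produce curves like the ones in these graphics is to group historical shots by distance and compute the share scored in each group. We do not know how StatsBomb built its graphic; the sketch below is a generic illustration, and the shots in it are made up.

```python
from collections import defaultdict

BAND_M = 5  # width of each distance band in metres (an arbitrary choice)


def conversion_by_distance(shots, band_m=BAND_M):
    """Return {band start in metres: share of shots scored} for each distance band.

    shots: iterable of (distance_m, scored) pairs, where scored is True or False
    """
    attempts, goals = defaultdict(int), defaultdict(int)
    for distance_m, scored in shots:
        band = int(distance_m // band_m) * band_m  # e.g. 12 m falls in the 10-15 m band
        attempts[band] += 1
        goals[band] += int(scored)
    return {band: goals[band] / attempts[band] for band in sorted(attempts)}


# Made-up shots for illustration only: (distance in metres, scored?)
toy_shots = [(2, True), (3, False), (4, True), (6, True), (8, False), (9, False),
             (11, True), (12, False), (13, False), (14, False),
             (17, False), (19, False), (23, False), (26, False)]
for band, rate in conversion_by_distance(toy_shots).items():
    print(f"{band}-{band + BAND_M} m: {rate:.0%}")
```

Real curves are built from many thousands of shots; with a handful of shots per band, as here, the rates are very noisy, which is one reason model-based xG is preferred over raw bucket averages.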

How are Expected Goals Utilized?

Performance Monitoring: 


Let's take a real-life example of two players: Gabriel Jesus, playing for Manchester City in England, and Hakan Calhanoglu, playing for AC Milan in Italy.

Each has taken 100 shots for his club. Gabriel Jesus has scored 14 goals while Hakan Calhanoglu has scored 8. At first glance Jesus looks like the much better player, having scored almost double the goals from the same number of shots.

But an xG model paints a different picture. From the same number of shots, Jesus’s xG is 17.7 while Calhanoglu’s is 7. This means the quality of the chances each of them received has been vastly different.



                                   Gabriel Jesus    Hakan Calhanoglu
Goal Conversion Rate               14%              8%
Expected Goals                     17.7             7
Expected Goals Conversion Rate     79%              114%
(goals ÷ expected goals)




From a pure goals perspective, Jesus appears to have performed better than Calhanoglu. From an xG perspective, however, Calhanoglu has performed better, converting more of his chances relative to their quality. Had Calhanoglu received chances of the same quality, he might have scored more goals than Jesus.
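The comparison can be reproduced with a few lines of arithmetic on the figures quoted above (100 shots each; 14 goals from 17.7 xG for Jesus, 8 goals from 7 xG for Calhanoglu). The metric names below are our own labels: xG per shot measures the average quality of the chances received, and goals per expected goal measures finishing relative to that quality.

```python
def shooting_profile(goals, shots, xg):
    """Return (goal conversion rate, average xG per shot, goals per expected goal)."""
    return goals / shots, xg / shots, goals / xg


jesus = shooting_profile(goals=14, shots=100, xg=17.7)
calhanoglu = shooting_profile(goals=8, shots=100, xg=7.0)

for name, (conversion, xg_per_shot, goals_per_xg) in [("Jesus", jesus), ("Calhanoglu", calhanoglu)]:
    print(f"{name}: conversion {conversion:.0%}, xG per shot {xg_per_shot:.3f}, "
          f"goals per xG {goals_per_xg:.2f}")
```

Jesus converts 14% of his shots but scores about 0.79 goals per expected goal, while Calhanoglu converts only 8% yet scores about 1.14 per expected goal: his chances were worth far less (0.070 xG per shot against 0.177), but he finished them better.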


This gives an idea of how the performance of a player or team can be monitored using xG.

xG gives an indication of over- or under-performance. If a team takes 100 shots and is expected to score 20 goals but has scored 30, that is a reason for concern for the manager, because the next 100 shots might not yield 30 goals, and the manager has to get the team generating better-quality chances. Similarly, if a team scores 10 goals from 100 shots when its expected goals were 20, the manager can infer that not everything is wrong with the way the team is playing, and it may just be a matter of time before the goals start coming in.
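The manager's reasoning can be written as a small rule. The 10% tolerance band below, inside which the gap between goals and xG is treated as noise, is a hypothetical choice for illustration, not an industry standard.

```python
def xg_verdict(goals, xg, tolerance=0.1):
    """Compare actual goals with expected goals over a run of shots.

    tolerance: hypothetical relative band (10% of xG by default) within which
    the gap between goals and xG is treated as noise.
    """
    gap = goals - xg
    if gap > tolerance * xg:
        return "over-performing: push for better-quality chances"
    if gap < -tolerance * xg:
        return "under-performing: the goals may simply be late in coming"
    return "in line with xG"


print(xg_verdict(goals=30, xg=20))  # first scenario: 30 goals from 20 xG
print(xg_verdict(goals=10, xg=20))  # second scenario: 10 goals from 20 xG
```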


Here are a few examples of top coaches talking about expected goals, albeit implicitly.


Pep Guardiola, manager of Manchester City (current champions of England), after his team beat the opposition 8-0:

“When you shoot on target five times for five goals - the quality of players we had, made the difference. In the first minutes we gave them two chances to score two goals and they didn’t”


Similarly, here is Thomas Tuchel, the current manager of Chelsea, speaking about his team’s luck:


“We were very strong for 70 minutes and very lucky in the last 20 minutes to escape with a win”

His team’s xG graph


[Figure: xG graph of Chelsea’s match]


Player Scouting and How Expected Goals Help Make Better Decisions:


While teams like Real Madrid, Barcelona, Manchester United and a handful of other clubs can afford to buy ready-made, proven talent, the majority of clubs try to sign talent in the making and cash in when the vultures come prowling. xG is one of the metrics that help them do this.


There are now two kinds of scouts in football.


The first are traditional scouts, who have always been around; the second are the new, in-vogue data scouts. Traditional scouts go and watch players in action and report back to managers. While this has obvious advantages, like confirming a player’s talent and assessing how he or she reacts to setbacks on the pitch, it also has disadvantages. A scout can only watch so many players and matches, so talented players inevitably get missed.


Data scouts scour data on hundreds of players across different leagues and different tiers of leagues. They do not need to be physically present to assess a player: a player becomes a few metrics on their laptop, which makes it easier to analyze players at scale. One of the chief filters analysts use here is xG.


xG is built from the characteristics of each shot, such as its quality and the distance and angle to the goal, rather than from the league it was taken in. As a result it behaves like a normalized metric, and people are able to compare players across leagues and tiers much more easily than before.


Consider this report on a few players based on their xG (source):


We have identified four interesting players that have a significant sample size and are performing well in our metrics. All four of these are currently with clubs who occupy a tier of football in which selling talented players to bigger and more affluent clubs is the norm. We have the Canadian forward Jonathan David who is currently playing with Gent in the Belgian Pro League. He is averaging 1.01 goal contributions per 90 from expected goal contributions per 90 of 0.66 while playing 2,232 minutes. The fact that David is outperforming his metrics by 0.35 per 90 suggests excellent finishing combined with some luck.

Next, we have highlighted the Dutch forward Myron Boadu of AZ Alkmaar. Boadu is currently averaging 0.77 goal contributions per 90 from an expected goal contribution score of 0.68 per 90 while playing 2,108 minutes. Boadu is playing in an AZ side that boasts an extremely efficient attack.

Thirdly, we have the Nigerian international Victor Osimhen who currently plays with Lille in the French top-flight. Osimhen is an interesting character as he was initially on the books at Wolfsburg in Germany and following a successful loan spell in Belgium with Charleroi he made the move to Lille. The German side will be kicking themselves having seen the value of the Nigerian forward increase exponentially since moving to France. He is averaging 0.63 goal contributions per 90 from an expected goal contribution of 0.66 in 2,439 minutes.

Lastly, we have Vangelis Pavlidis of Willem II in Holland. The Greek forward was formerly on the books of Borussia Dortmund before moving to Holland following a loan spell. He is averaging 0.52 goal contribution per 90 and 0.62 expected goal contribution per 90 from 2,274 minutes.

The report above clearly sorts the players by how they compare with their xG (over-performing it, on par with it, or under-performing it) and gives an objective overview of their performance even though they play in very different leagues.
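The report's comparison boils down to subtracting expected from actual goal contributions per 90 minutes. A sketch using the per-90 figures quoted above; the `per_90` helper shows how a season total is normalized by minutes played, and the ±0.05 "at par" band is a hypothetical threshold of our own, not the report's.

```python
def per_90(total, minutes):
    """Normalize a season total to a rate per 90 minutes played."""
    return total * 90 / minutes


def classify(actual_p90, expected_p90, band=0.05):
    """Label a player relative to xG; the 'at par' band is a hypothetical choice."""
    diff = actual_p90 - expected_p90
    if diff > band:
        return "over-performing"
    if diff < -band:
        return "under-performing"
    return "at par"


# Per-90 figures from the report: (goal contributions, expected goal contributions)
players = {
    "Jonathan David": (1.01, 0.66),
    "Myron Boadu": (0.77, 0.68),
    "Victor Osimhen": (0.63, 0.66),
    "Vangelis Pavlidis": (0.52, 0.62),
}
for name, (actual, expected) in players.items():
    print(f"{name}: {actual - expected:+.2f} per 90, {classify(actual, expected)}")
```

With this band, David and Boadu come out as over-performing, Osimhen at par and Pavlidis under-performing; a wider or narrower band would shift the borderline cases.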


Fantasy Premier League and Expected Goals:


Fantasy sports are fast picking up steam all across the world. The money on offer is increasing by the day, and as always there is the joy of one-upping your friends and indulging in some great banter. With the stakes getting higher, the need to spot players who will earn you points is growing too, and not using data here definitely handicaps your chances of success.

Expected goals and expected assists are among the most used metrics for spotting talent.


This is broadly how fantasy football works. There are the usual big names like Cristiano Ronaldo, Lionel Messi and Neymar, to name a few, whom most fantasy teams are likely to have. In many cases it is the second-tier players who become the differentials, and second-tier players make up the majority of the pool. Expected goals is the favored metric for fantasy team owners here.


This is how the scenario plays out. 


Expected goals are usually a good indicator of future performance. If a player has a high xG, he or she is more likely to score goals and earn points in the near future; similarly, a low xG could well mean the player is not performing well and it might be time to drop him or her.


Here is an example of FPL advice from one leading fantasy football outlet (Fantasy Football Scout):


The Chelsea wingbacks (or “midfielders” as Thomas Tuchel would prefer we call them) are returning such an incredible volume of points lately that I’ve plumped for both Reece James and Ben Chilwell

There’s not a huge amount between them this season for some of my favorite stats for defenders: James has 20 touches in the penalty area compared to Chilwell’s 14 and has created 10 chances compared to Chilwell’s six. However, James’s expected goal involvement (xGI) figure is superior (2.01 to 0.93), which propels him to the top of my defender list, which is not too surprising after a 21-point return in Gameweek 10.


This makes what would otherwise have been a very tough choice a relatively straightforward one. James has the higher xGI, which means he is more likely to be involved in goals and is hence the better FPL asset.
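Picking between candidates this way amounts to sorting them by xGI. A minimal sketch using the numbers from the excerpt above; using penalty-area touches as a tie-breaker is our own hypothetical addition, not the outlet's method.

```python
# Figures quoted in the Fantasy Football Scout excerpt above
defenders = [
    {"name": "Reece James", "xgi": 2.01, "box_touches": 20, "chances_created": 10},
    {"name": "Ben Chilwell", "xgi": 0.93, "box_touches": 14, "chances_created": 6},
]

# Sort by xGI, highest first; penalty-area touches break ties (a hypothetical choice)
ranked = sorted(defenders, key=lambda p: (p["xgi"], p["box_touches"]), reverse=True)

for position, player in enumerate(ranked, start=1):
    print(f"{position}. {player['name']} (xGI {player['xgi']:.2f})")
```

The same sort scales to a whole pool of candidates, which is where it saves the most time.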

With so many players to pick from, xG helps us rate players objectively and pick a fantasy team purely based on stats. 

Skill-Lync's Post Graduation Program in Data Science can help you ace advanced data science concepts through hands-on experience and industry-relevant skills.


About the Author:

Venkatesh is a Data Scientist at Freshworks, Inc. He believes everything in the world can be explained by analyzing trends and following the numbers. He loves using data to understand whether a marketing strategy is working, whether an A/B test is giving significant results or whether a user has a good propensity to buy a product line, and he thinks sports are the best thing to have happened to humanity.




© 2022 Skill-Lync Inc. All Rights Reserved.