And so my journey into data analytics, specifically in the realm of soccer, begins. As many of you know, getting into data analytics can be tricky and intimidating, which is why I am so grateful for this post by Paul Riley (@footballfactman on Twitter). To start off my experience with data, I decided to create this model with data from the 2009/10 to 2016/17 Premier League seasons, which represents 161 columns of data.
To present my data, I have created a column chart of Actual Goal Differential vs. Expected Goal Differential so far this season:
From this, we can see that Liverpool should be looking to improve and really find form over the next few months of nonstop soccer. Inside the top four, it is interesting to see that both Chelsea and Man United are outperforming what the model says they should be doing at this stage. Crystal Palace seem to be the unluckiest side according to the model, so it’ll be interesting to see whether, with Zaha back and Benteke returning, they can get out of the relegation zone, although they’ve dug themselves quite a hole.
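For anyone who wants to reproduce this kind of comparison, here is a minimal sketch of the arithmetic behind the chart: actual goal differential minus expected goal differential per team. The team names, the numbers, and the field names (`gf`, `ga`, `xg_for`, `xg_against`) are all made up for illustration; they are not output from my model.

```python
def gd_vs_expected(teams):
    """Return (team, actual GD - expected GD) pairs, biggest overperformers first."""
    rows = []
    for name, s in teams.items():
        actual_gd = s["gf"] - s["ga"]                # goals for minus goals against
        expected_gd = s["xg_for"] - s["xg_against"]  # xG for minus xG against
        rows.append((name, round(actual_gd - expected_gd, 2)))
    # Positive values: results are better than the model expects (overperforming).
    # Negative values: results are worse than the model expects ("unlucky").
    return sorted(rows, key=lambda r: r[1], reverse=True)


# Hypothetical numbers, for illustration only.
teams = {
    "Team A": {"gf": 20, "ga": 10, "xg_for": 16.5, "xg_against": 12.0},
    "Team B": {"gf": 8, "ga": 18, "xg_for": 13.0, "xg_against": 15.5},
}

for name, diff in gd_vs_expected(teams):
    print(f"{name}: {diff:+.2f}")
```

A team with a large positive value is in the same position as Chelsea and Man United in the chart, while a large negative value corresponds to a side like Crystal Palace.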
To make learning this topic smoother, I will mostly be focusing on the team that I support, Liverpool FC. When I took an expected assists (xA) from key passes model and combined it with the expected goals (xG) model, I came up with the following:
In the first graph, you see the xG (blue) plus the xA from key passes (orange). Some things to highlight: Bobby Firmino is hugely important to the attack (he contributes in many ways); Coutinho, despite being absent for many games, is world class; and Salah, judging by his xG, will look to play more centrally and as a striker.
However, I think the second graph is extremely important because it looks at those same stats on a per-90 basis, which benefits players who haven’t had as much playing time. From this we can see how good Coutinho has been when he is on the field, as well as Sturridge, who might be making a case for more time going forward. One thing to note, though (pertaining to Sturridge and Solanke, who has 1 xG per 90), is that with the substitute minutes they have gotten, the game state is completely different, and therefore their per-90 numbers might be misleading. Despite that, it is still interesting to see, and you would expect Solanke to bag a goal if he continues to get minutes! Lastly, a look at the defensive side of things shows that Trent’s xA per 90 is very impressive.
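The per-90 conversion itself is simple: multiply the season total by 90 and divide by minutes played. Below is a rough sketch of that calculation, with a flag for players on few minutes, since that is exactly where the numbers can mislead. The player records are hypothetical, and the 270-minute cutoff is an arbitrary assumption of mine, not an established standard.

```python
def per_90(total, minutes):
    """Scale a season total (e.g. xG or xA) to a per-90-minutes rate."""
    if minutes <= 0:
        raise ValueError("minutes must be positive")
    return total * 90 / minutes


def xg_xa_per_90(players, min_minutes=270):
    """Compute xG and xA per 90 and flag small samples.

    min_minutes is an arbitrary illustrative threshold (three full matches).
    """
    rows = []
    for p in players:
        rows.append({
            "name": p["name"],
            "xg_per_90": per_90(p["xg"], p["minutes"]),
            "xa_per_90": per_90(p["xa"], p["minutes"]),
            "small_sample": p["minutes"] < min_minutes,
        })
    return rows


# Hypothetical players, for illustration only.
players = [
    {"name": "Regular starter", "minutes": 900, "xg": 4.0, "xa": 3.0},
    {"name": "Late substitute", "minutes": 90, "xg": 1.0, "xa": 0.2},
]

for row in xg_xa_per_90(players):
    note = " (small sample)" if row["small_sample"] else ""
    print(f'{row["name"]}: xG/90 {row["xg_per_90"]:.2f}, '
          f'xA/90 {row["xa_per_90"]:.2f}{note}')
```

The substitute in this example ends up with a higher xG per 90 than the starter despite a much smaller total, which is the same effect that flatters players who mostly come off the bench.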
Thank you for reading; as always, feedback and criticism are welcome, as I am looking to learn more and constantly improve! Next steps: I plan on building a more complex model as soon as I graduate from university in December, when I will be able to spend more time researching data. From looking around, I plan to use a logistic regression model, but I am still unsure of some of the variables that I will be looking at. Furthermore, I will be learning Tableau to provide better visualizations.
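To give a rough idea of what a logistic regression xG model could look like, here is a small pure-Python sketch fitted with plain gradient descent. Everything in it is an assumption for illustration: the single feature (shot distance in tens of metres), the eight made-up shots, and the learning rate and epoch count. The variables for the real model are still undecided, and in practice a library would handle the fitting.

```python
import math


def sigmoid(z):
    """Logistic function: maps any real number to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(X, y, lr=0.1, epochs=2000):
    """Fit weights w and bias b by batch gradient descent on the log loss."""
    n, n_features = len(X), len(X[0])
    w, b = [0.0] * n_features, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for xi, yi in zip(X, y):
            p = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi)))
            err = p - yi  # gradient of the log loss w.r.t. the linear score
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict_xg(w, b, x):
    """Predicted probability that a shot with features x is scored."""
    return sigmoid(b + sum(wj * xj for wj, xj in zip(w, x)))


# Hypothetical shots: one feature (distance in tens of metres), 1 = goal.
X = [[0.5], [0.8], [1.0], [1.2], [1.8], [2.0], [2.5], [3.0]]
y = [1, 1, 0, 1, 0, 0, 0, 0]

w, b = fit_logistic(X, y)
print(f"close-range shot: {predict_xg(w, b, [0.5]):.2f}")
print(f"long-range shot:  {predict_xg(w, b, [3.0]):.2f}")
```

Summing these per-shot probabilities over a match or a season gives an expected goals total, and richer variables can be added as extra columns in `X`.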