Clustering FIFA Players
Are all football players the same?
A football team fields 11 players in various roles, the 4 main ones being:
- Goalkeepers
- Defenders
- Midfielders
- Strikers
With this impressive data set, which includes stats for every single FIFA 18 player scraped from Sofifa, we are going to see whether AuDaS, an automated data science platform developed by Mind Foundry, can automatically detect these classes and whether we can extract any insights.
Preparing the data
The data covers 17,000 players with 70 attributes for each of them. AuDaS has automatically detected that the “Name” column acts as an identifier and can be dropped or ignored for the modelling. It has also detected that the “Value” and “Wage” columns are formatted as text and will need to be converted to numerical values.
We are going to apply a regular expression transformation to these columns to capture the values between the currency and magnitude symbols (€, M, k).
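Outside AuDaS, the same kind of conversion could be sketched in Python. This is only an illustration, not the transformation AuDaS applies: the string format (for example "€95.5M" or "€565K") is an assumption about how the scraped columns look, and `parse_money` is a hypothetical helper name.

```python
import re

# Assumed format (not confirmed by the article): a euro sign, a number,
# and an optional magnitude suffix, e.g. "€95.5M", "€565K", "€0".
MONEY_PATTERN = re.compile(r"^€(\d+(?:\.\d+)?)([MmKk]?)$")
MULTIPLIERS = {"": 1, "K": 1_000, "M": 1_000_000}


def parse_money(text):
    """Convert a string such as '€95.5M' to a whole number of euros.

    Returns None when the string does not match the assumed format.
    """
    match = MONEY_PATTERN.match(text.strip())
    if match is None:
        return None
    number, suffix = match.groups()
    # round() avoids floating-point noise when scaling the decimal part.
    return round(float(number) * MULTIPLIERS[suffix.upper()])


if __name__ == "__main__":
    for raw in ["€95.5M", "€565K", "€0", "n/a"]:
        print(raw, "->", parse_money(raw))
```

Returning None for unmatched strings keeps malformed entries visible, so they can be inspected rather than silently turned into zeros.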
Exploratory Analysis
The histogram view provides some interesting insights into the players. As we can see, the value and wages appear to follow a heavy-tailed, Pareto-like distribution.
Another view suggests that some clear clusters stand out across the player attributes.
Hopefully AuDaS will be able to provide some insights into these clusters.
Automated Clustering
We are first going to search for 11 clusters. AuDaS provides full transparency in the steps and models it has chosen. It then automatically generates the Silhouette plots and a t-SNE visualisation of the clusters.
AuDaS has identified some clear clusters but quite a few still overlap. The poor performance could be explained by the fact that there aren’t really 11 distinct positions on a football pitch, as the positions depend on the chosen formation (offensive, defensive, etc.).
Clustering over 4 classes provides some more distinct results and the overall Silhouette coefficient is almost twice as high as the one for 11 clusters.
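For readers unfamiliar with the metric, the Silhouette coefficient of a point compares its mean distance to the other points in its own cluster (a) with its mean distance to the points of the nearest other cluster (b), giving (b - a) / max(a, b). Values close to 1 indicate well-separated clusters. The following is a minimal pure-Python sketch of that standard definition using Euclidean distance. It is not AuDaS code, and the toy points and labels are made up purely for illustration.

```python
import math


def silhouette_score(points, labels):
    """Mean Silhouette coefficient over all points (Euclidean distance)."""
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)
    if len(clusters) < 2:
        raise ValueError("the Silhouette coefficient needs at least two clusters")

    scores = []
    for point, label in zip(points, labels):
        own = clusters[label]
        if len(own) == 1:
            # By convention, a point alone in its cluster scores 0.
            scores.append(0.0)
            continue
        # a: mean distance to the other members of the same cluster
        # (the point's distance to itself is 0, so divide by len - 1).
        a = sum(math.dist(point, other) for other in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster.
        b = min(
            sum(math.dist(point, other) for other in members) / len(members)
            for other_label, members in clusters.items()
            if other_label != label
        )
        denominator = max(a, b)
        scores.append((b - a) / denominator if denominator > 0 else 0.0)
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Toy data: two tight groups that are far apart from each other.
    points = [(0, 0), (0, 1), (10, 0), (10, 1)]
    print(silhouette_score(points, [0, 0, 1, 1]))  # labels match the groups
    print(silhouette_score(points, [0, 1, 0, 1]))  # labels cut across the groups
```

Comparing the mean score across different cluster counts on the same data, as done above for 11 and 4 clusters, is a common way to judge which count separates the data more cleanly.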
Interpreting the clusters
A quick inspection of the players assigned to each cluster allows us to determine with fairly strong confidence that Cluster 1 contains goalkeepers (Neuer, De Gea, Courtois, …). This is corroborated by the cluster’s high Silhouette coefficient (0.706) and the distribution of its players’ key skills.
Cluster 0 seems to contain strikers, which is corroborated by the distributions and player names (Ronaldo, Messi, …), but the t-SNE visualisation suggests that it might also contain some attacking midfielders.
Cluster 2 also seems to contain strikers, although mediocre ones.
Cluster 3 seems to contain defenders (aggressive, strong interceptors and tacklers).
Conclusion
In minutes we were able to identify clusters amongst the football players. Some were clear (goalkeepers and defenders) whereas clusters 0 and 2 seemed a bit blurred. For those two, the distinction seemed to be based less on the roles and more on the quality of the players.
A full video of this process can be viewed here. If you are interested in other case studies with AuDaS, you can find a few more below.
If you are interested in what automated Machine Learning can do for you, fill in this Typeform.
Team and Resources
Mind Foundry is an Oxford University spin-out founded by Professors Stephen Roberts and Michael Osborne, who have 35 person-years in data analytics. The Mind Foundry team is composed of over 30 world-class Machine Learning researchers and elite software engineers, many of them former post-docs from the University of Oxford. Moreover, Mind Foundry has privileged access to over 30 Oxford University Machine Learning PhDs through its spin-out status. Mind Foundry is a portfolio company of the University of Oxford and its investors include Oxford Sciences Innovation, the Oxford Technology and Innovations Fund, the University of Oxford Innovation Fund and Parkwalk Advisors.