Introduction
Science and technology are now data-driven, as most business activities are (or should be), and sport is no exception. Huge amounts of data can be collected from a team sports match, such as a basketball, volleyball or football match, for example by monitoring the team or individual players to record their movements around the field.
Collecting match data
Sensors can be used to collect information about other features of a football match – the physical fitness of players can be assessed by collecting medical data about them in real time, although this is only feasible during training sessions, since wearable devices are required to achieve this.
Using AI4 Soccer by Inmatica
It is no surprise that such data is available and can be collected. The real question is: how can we process that data to extract strategic information for decision making, for example by the team manager? Later in this article, we will explore how to use Inmatica’s AI4 Soccer to analyze official match videos.
Improving data analysis through AI and Machine Learning
The answer to the previous question is obvious, in today’s world: AI – Artificial Intelligence!
There are many kinds of AI, some of which have existed for several decades, the most common being Machine Learning (ML), which is a collection of numerical optimization methods that process data to produce various predictions.
ML methods are numerical in the sense that the input data should be numbers, perhaps normalized in some way. Data are crunched by numerical algorithms which are themselves generalizations or particular cases of a broad class of algorithms – optimization algorithms – which are designed to find the best or optimal solution to a numerical problem.
The most important class of such algorithms is artificial neural networks (ANNs). Such a network is essentially a (very large) collection of numbers called weights. An ANN may comprise millions of weights; each is usually stored as a 4-byte (single-precision) or 8-byte (double-precision) floating-point number, so a single network can occupy several megabytes or more.
A neural network consumes numerical inputs, provides numerical outputs, and adjusts its weights according to algorithms that depend on the input data and the expected output. The algorithm tries to minimize the errors made in assigning a certain output to input data (which is why it stems from optimization problems). This minimization process is called learning, for the very good reason that even we humans, when learning, try to minimize the number of errors we make!
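The idea of learning as error minimization can be illustrated with a minimal sketch. It assumes the simplest possible "network", a single weight w in the model y = w * x, trained by gradient descent on the mean squared error. The function name, learning rate and number of epochs are illustrative choices, not something prescribed by any particular system.

```python
def train_weight(pairs, learning_rate=0.01, epochs=200):
    """Fit y ~ w * x by gradient descent on the mean squared error."""
    w = 0.0  # initial weight (arbitrary starting point)
    n = len(pairs)
    for _ in range(epochs):
        # Gradient of mean((w * x - y) ** 2) with respect to w
        grad = sum(2 * (w * x - y) * x for x, y in pairs) / n
        # Move the weight a small step against the gradient
        w -= learning_rate * grad
    return w


# Toy training set of (input, expected output) pairs following y = 2 * x
training_set = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]
w = train_weight(training_set)
print(f"learned weight: {w:.4f}")
```

A real neural network repeats the same kind of update for millions of weights at once, with the gradients computed layer by layer.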
Moreover, neural networks can process huge amounts of data: to be properly trained to minimize errors, they need to crunch millions of inputs. In fact, they need to process as much training data as possible to produce their extraordinary results, so striking that they are worth both the huge storage and the massive computations they require.
Analyzing behaviors and measuring performance
Machine Learning aims to build an optimization model from raw training data. This model may serve two purposes:
- Prediction – to train a network to guess the output value corresponding to a specific input. The algorithm uses a training set, i.e., a set of pairs (input and corresponding output), and learns to guess any other such pair. Classical mathematical methods, such as linear regression, commonly used in Statistics, also do this – they are the simplest examples of ML predictive algorithms.
- Classification – to distinguish whether an item in a dataset belongs to a certain category or not. Suppose, for example, we want to tag tweets according to categories, such as politics, religion, or movies. We could build a network that is trained to rank a single tweet according to those categories: examples of this in action include sentiment analysis systems, where tags are moods.
ML algorithms infer hidden patterns that are in the data themselves, and apply those patterns to new data: classification and prediction may be described in general terms, so, when we speak of analyzing data via Machine Learning, we usually refer to those tasks.
Neural networks can undertake both tasks, but there are other ML algorithms that are also used for these purposes, which are usually simpler from the computational point of view. These include regression methods, support vector machines (SVM), random forests, clustering algorithms, and so on. All of these algorithms belong to the ML sphere since they can accomplish many tasks without necessarily designing a neural network specifically for that purpose. However, if huge amounts of data with hidden information are available, then a special class of neural networks, deep neural networks, performs particularly well for both classification and prediction. Since most analysis in team sports focuses on videos of matches, it is clear that these techniques also come into play in this arena.
Deep learning and Big Data
The adjective “deep”, when used to refer to deep neural networks (and deep learning, which is the purpose of such networks in learning tasks) means that such networks have a huge inner structure. As discussed above, a neural network is a data structure made of weights, i.e., numbers, which are usually arranged in layers. Each layer is connected to the previous and following layers, with the first layer as the input layer, directly fed with the input data, and the last as the output layer.
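The layered structure described above can be sketched in a few lines. This is only an illustration under stated assumptions: each layer is represented as a pair (weights, biases), the bias terms and the tanh activation are assumptions added for the example, and the tiny network below has two layers rather than the dozens found in deep networks.

```python
import math


def forward(layers, inputs):
    """Propagate inputs through a list of layers.

    Each layer is a pair (weights, biases); weights[j] holds the weights
    connecting every value of the previous layer to neuron j.
    """
    activations = inputs
    for weights, biases in layers:
        activations = [
            math.tanh(sum(w * a for w, a in zip(row, activations)) + b)
            for row, b in zip(weights, biases)
        ]
    return activations


# Hypothetical network: 2 inputs -> 3 hidden neurons -> 1 output
network = [
    ([[0.5, -0.2], [0.1, 0.4], [-0.3, 0.8]], [0.0, 0.1, -0.1]),
    ([[0.7, -0.5, 0.2]], [0.05]),
]
output = forward(network, [1.0, 2.0])
print(output)
```

The first layer receives the input data directly and the last layer produces the output; training would adjust the numbers stored in `network`.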
Deep neural networks comprise dozens, or sometimes hundreds, of layers and millions of weights. The algorithms required to train such networks are more subtle and sophisticated than the norm, and in some cases, are specialized for particular purposes; CNN (Convolutional Neural Networks), invented for image recognition, and Transformers, invented for natural language processing, fall into this category.
Making strategic choices
To train a network one needs to provide a training set that is of at least the same order of magnitude as the network size, thus millions of items. Such huge datasets are often the output of a pre-processing phase – the most painful stage in the ML deployment chain, which takes raw data and processes them to provide numerical input that a neural network can digest.
Big data infrastructures, necessary when huge datasets of heterogeneous and unstructured data are at hand, are required to manage training and testing sets for such algorithms, at least in production environments. The ML engineering process of designing, building, and training such networks is very complicated and requires special tools, for storage, computation, and for software lifecycle management.
Applying changes in the field
How to test ML systems is a non-trivial issue. However, certain statistical measures can be applied to the output of ML algorithms, essentially by counting the proportions of correct and incorrect predictions on a labelled (supervised) dataset. Examples include accuracy (the fraction of all predictions that are correct) and precision (the fraction of items predicted as positive that are actually positive).
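As a minimal sketch, both measures can be computed by simple counting. The function names and the toy labels below are illustrative assumptions; the labels are binary, with 1 standing for the positive class.

```python
def accuracy(y_true, y_pred):
    """Fraction of all predictions that match the true label."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def precision(y_true, y_pred, positive=1):
    """Fraction of items predicted as positive that really are positive."""
    truths_of_predicted = [t for t, p in zip(y_true, y_pred) if p == positive]
    if not truths_of_predicted:
        return 0.0  # no positive predictions at all
    hits = sum(1 for t in truths_of_predicted if t == positive)
    return hits / len(truths_of_predicted)


# Toy labelled dataset (hypothetical values)
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 1, 0, 1, 0]
print(accuracy(y_true, y_pred), precision(y_true, y_pred))
```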
How to analyze a football match
Several ML applications can be applied to team sports and football in particular. Understanding the physical and psychological condition of players, supporting decision-making during the match, and analysing the strategy of the opposing team in depth are just a few of the possible applications.
ML algorithms need quantitative variables that can be measured throughout a football match, such as the “geometric shape” of the team during the match: one can compute the length and width of the team, where such lengths are the maximum distances between players along horizontal or vertical axes of the field. Players are viewed as vertices of a polygon and the barycenter of this polygon is computed to identify the geometric center of the team at a particular instant in the match.
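A minimal sketch of these geometric quantities follows. It assumes player positions are already available as (x, y) coordinates in metres, with x along the length of the field, and it approximates the barycenter by the mean of the player positions (the centroid of the polygon's vertices). The function name and the sample coordinates are hypothetical.

```python
def team_shape(positions):
    """Return (length, width, barycenter) for a list of (x, y) positions."""
    xs = [x for x, _ in positions]
    ys = [y for _, y in positions]
    length = max(xs) - min(xs)  # maximum spread along the field's long axis
    width = max(ys) - min(ys)   # maximum spread across the field
    barycenter = (sum(xs) / len(xs), sum(ys) / len(ys))
    return length, width, barycenter


# Four players at one instant of the match (illustrative values)
players = [(10.0, 20.0), (30.0, 10.0), (30.0, 50.0), (50.0, 40.0)]
length, width, barycenter = team_shape(players)
print(length, width, barycenter)
```

Computing these values for every frame turns each of them into a time series that can be analysed over the whole match.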
That information may be inferred from panoramic videos of the field, taken from a fixed position during the match to capture the whole field and the positions of each player (and the ball). Some users employ several side-by-side cameras to obtain a single panoramic video without the wide-angle effect. In some cases, footage from cameras that track a single player throughout the match is used to analyse the player’s activity, movements and pitch control, that is, the probability that the player could control the ball if it were at a given location x.
Methods to model pitch control provide several pieces of information, such as player positions, ball motion, possibly dangerous situations and off-ball score opportunity computation: the latter allows players to receive credit for being well-positioned to score, even if a teammate cannot get them the ball to take the goal. Such methods are probabilistic and process data using regression models.
Let’s consider a specific case study: AI4 Soccer by Inmatica.
This system collects the metrics mentioned above, using what is essentially a pipeline of video processing tools.
The system makes use of official match videos to perform the following tasks.
- Players and ball detection: a neural-network-based algorithm for object recognition within images, used to detect the players and the ball on the football field.
- Players video tracking: a tracking system to track every player individually over time.
- Players classification: a system to distinguish between players from the focus team and those from the opposing team.
- Perspective transformation: a geometric method to improve precision in the computation of player positions by means of coordinates in the field (using Cartesian coordinates).
The outcomes of these processes are collected in specific databases and conveyed, using ETL systems, to a dashboard that provides dynamic analysis (a Business Intelligence system).
ML techniques are already involved in the first step, player and ball detection, through a standard and robust object detection algorithm: YOLO (You Only Look Once). The approach treats detection as a regression problem, with the output representing the probability of belonging to a class of objects. Detection is performed via bounding boxes in which objects are framed. The architecture of the deep neural network behind YOLO is a CNN (Convolutional Neural Network) with 160 layers, trained to predict the probability that an object belongs to a certain class, and the probability that an object is both within the image and occupies a certain position. These tasks are performed on several objects in a single image or, more precisely, a single video frame.
As usual, the system takes advantage of available pre-trained models that have been trained on specific datasets – in this case, ISSIA and SPD soccer datasets. This has made it simpler to fine-tune the pre-training in order to train the network to detect either a player or the ball. The system has been trained to detect the football team of a player by recognising the player’s uniform. The application of this network allows the detection of all players in a team, as in the following snapshot:
The tracking system also takes advantage of a standard tracker, Deep SORT. Like most trackers, Deep SORT relies upon a classical mathematical technique called the Kalman filter (also familiar to engineers) combined with a deep learning architecture. Such trackers work on the output of the image detector. As mentioned previously, the image detector just puts boxes around objects in a video frame, while the object tracker identifies the same object across several frames.
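To give a feel for the Kalman filter, here is a deliberately simplified sketch. It is not Deep SORT's implementation: it assumes a one-dimensional constant-velocity model in which only the position is measured, and the noise parameters q and r, like the function name, are illustrative assumptions. At each frame the filter predicts where the object should be and then corrects the prediction with the new detection.

```python
def kalman_track(measurements, dt=1.0, q=0.01, r=1.0):
    """Estimate (position, velocity) from noisy 1D position measurements."""
    x, v = measurements[0], 0.0            # initial state
    p00, p01, p10, p11 = 1.0, 0.0, 0.0, 1.0  # initial state covariance
    estimates = []
    for z in measurements[1:]:
        # Predict: move the state forward with the constant-velocity model
        x = x + v * dt
        n00 = p00 + dt * (p01 + p10) + dt * dt * p11 + q
        n01 = p01 + dt * p11
        n10 = p10 + dt * p11
        n11 = p11 + q
        p00, p01, p10, p11 = n00, n01, n10, n11
        # Update: blend the prediction with the measured position z
        s = p00 + r                  # innovation variance
        k0, k1 = p00 / s, p10 / s    # Kalman gain for position and velocity
        residual = z - x
        x += k0 * residual
        v += k1 * residual
        p00, p01, p10, p11 = (
            (1 - k0) * p00,
            (1 - k0) * p01,
            p10 - k1 * p00,
            p11 - k1 * p01,
        )
        estimates.append((x, v))
    return estimates


# Object moving at 2 units per frame (noise-free measurements for clarity)
track = kalman_track([2.0 * t for t in range(50)])
print(track[-1])
```

In a tracker, the predicted position is what allows a detection in the next frame to be matched to the same object.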
In this way, a single player can be “followed” throughout the match. Moreover, for each player, the set of metrics mentioned above, such as physical data (speed, directions, etc.) and pitch control can be computed. Once the coordinates of each player in each frame are acquired, the computation of these metrics is simple. The problem is therefore to extract player coordinates according to a Cartesian coordinate system.
Since images of the field in official videos are taken not from above but from the audience level, perspective has to be considered. Projective geometry offers the solution to such problems: this classical branch of mathematics deals with properties that remain invariant under projection.
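One common way to handle perspective, sketched below under stated assumptions, is to estimate a homography (a 3x3 projective transformation) from four reference points whose positions are known both in the image and on the pitch. The pixel coordinates, the 105 x 68 metre pitch size and all function names are hypothetical; this illustrates the general technique and is not taken from AI4 Soccer.

```python
def solve_linear(a, b):
    """Solve a * x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def find_homography(src, dst):
    """Homography (9 values, row-major, last one fixed to 1) from 4 point pairs."""
    a, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        a.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        b.append(u)
        a.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        b.append(v)
    return solve_linear(a, b) + [1.0]


def to_pitch(h, x, y):
    """Map an image point (pixels) to pitch coordinates (metres)."""
    w = h[6] * x + h[7] * y + h[8]
    return ((h[0] * x + h[1] * y + h[2]) / w,
            (h[3] * x + h[4] * y + h[5]) / w)


# Pitch corners as seen in the image (pixels) and on the pitch (metres)
image_corners = [(200, 100), (1080, 100), (1280, 620), (0, 620)]
pitch_corners = [(0, 0), (105, 0), (105, 68), (0, 68)]
h = find_homography(image_corners, pitch_corners)
print(to_pitch(h, 640, 300))
```

In practice the reference points would come from detected field lines rather than being typed in by hand.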
By using these techniques and transformations, fairly accurate coordinates are assigned to each player in each frame: across frames, this defines a “curve” which is the path followed by the player across the field over time. To transform the Cartesian representation into two dimensions, the system performs some computer vision manipulation that addresses the line detection problem in particular – this needs to be solved to perform the perspective transformation:
This is actually a post-processing step in which ML is not used. Its role is to take the output of detection and tracking and compute these coordinates from it. The final result is a 2D Cartesian representation of the players in the field at that instant (frame):
The frame sequence allows the computation of the “curve” associated with each player and, consequently, all the physical parameters, pertaining not just to a single player, but to the whole team. These parameters might include team length, team width, team barycenter, the pitch control covered by one or more players, and average distance between players, as well as calculations such as kilometers run by each player, average and instantaneous velocity and acceleration.
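As a minimal sketch, the distance covered and the speeds can be derived directly from a player's path. It assumes one (x, y) position in metres per frame and a known frame rate; the function name, the 25 frames-per-second default and the sample path are illustrative.

```python
import math


def distance_and_speeds(path, fps=25.0):
    """Return (total distance, per-frame speeds, average speed) for a path."""
    dt = 1.0 / fps  # time elapsed between two consecutive frames
    steps = [math.dist(p, q) for p, q in zip(path, path[1:])]
    total = sum(steps)
    speeds = [s / dt for s in steps]  # instantaneous speed in m/s
    average = total / (dt * len(steps)) if steps else 0.0
    return total, speeds, average


# Player moving 0.2 m per frame, i.e. 5 m/s at 25 frames per second
path = [(0.0, 0.0), (0.12, 0.16), (0.24, 0.32)]
total, speeds, average = distance_and_speeds(path)
print(total, speeds, average)
```

Acceleration can be obtained in the same way, by differencing consecutive speeds and dividing by the frame interval.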
Once these computations have been performed for the whole video, all of the metrics (each of which is a time series since it has a value for each frame) are collected in a database which can be queried by a Business Intelligence system to look for relevant information to be used by the team manager or medical staff, for example. Time series are also analysed via neural networks, providing ML tools to perform scenario analyses, and so on. Specifically, a timeline widget is provided via the dashboard, allowing users to dynamically correlate a number of data and parameters by specifying different “time windows”. This allows the analyst to perform a more granular investigation and to extract relevant information that provides a better understanding of match dynamics, and can be used to improve decision making.
Benefits of applying AI to sports like football
ML systems, such as the one described above, can keep track of all players and ball positions at the same time, facilitating real time computations that are useful to decision making. This is possible not only after the match has ended, but also during the match if the system is on-line and sufficiently efficient to provide real time information.
Some example applications of these techniques include:
- scouting players and evaluating player activity, e.g., through models that compute ball possession or other relevant metrics
- predicting ball possession values and supporting decisions about when to increase pressing
- combining data extracted from the match with players’ fitness data to supply medical staff with information
If team managers adopt these outcomes and keep using them, they can inform strategic and tactical decision-making (counter-press, counter-attack, etc.).
ML methods in football and team sports, in general, provide new insights and a lot of hidden information. These methods can be exploited to provide new technical and strategic assets to managers and football teams.
In the end, people who enjoy watching football matches will surely appreciate a better quality of match, and will have even more fun when cheering their favorite football teams.