Martinez-Arastey, Guillermo, Datson, Naomi ORCID: https://orcid.org/0000-0002-5507-9540, Smith, Neal and Robins, Matthew
(2025)
Foundations of Expected Points in Rugby Union: A Methodological Approach.
Journal of Sports Analytics.
ISSN 2215-020X
(In Press)
Accepted Version
File not available for download. Available under License Creative Commons Attribution.
Abstract
This study explores the feasibility of an Expected Points metric for rugby union, aiming to shift performance analysis from descriptive indicators to a predictive metric of possession quality. Notational analysis was conducted on 132 Premiership Rugby matches, producing a dataset of 35,199 unique phases of play containing variables such as team in possession, pitch location, play type, score difference, time remaining and scoring outcome. Four machine learning algorithms were explored to predict scoring outcomes: multinomial logistic regression, random forest, support vector machine and k-nearest neighbors. After extensive feature engineering and hyperparameter optimisation, the best-performing model achieved 39.7% accuracy, below a literature-derived baseline for practical usability (44.3%), making it unsuitable for applied contexts. A key challenge was predicting minority scoring outcomes due to severe class imbalance. SMOTE was explored to address this imbalance; it lowered accuracy to 35.7% but improved the F1-score to 34.4%. This study highlights the limitations of modelling scoring outcomes in open-play team sports, challenging the predominant positivist paradigm in sports performance analysis. The methodology provides critical foundational groundwork and a benchmark for future research to build upon. The study recommends exploring advanced samplers for minority classes, expanded feature sets and alternative modelling techniques, such as recurrent neural networks.
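The trade-off reported above, where resampling lowers accuracy while raising the F1-score, is typical of imbalanced multiclass problems. The following is a minimal sketch of why the two metrics can move in opposite directions. It is not the study's code or data: the class labels, the 70/20/10 class split and both sets of predictions are invented for illustration, and macro-averaging is an assumption, since the abstract does not state which F1 average was used.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores (classes taken from y_true)."""
    scores = []
    for label in sorted(set(y_true)):
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        denom = 2 * tp + fp + fn
        # A class that is never predicted and never correct scores 0.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical label distribution (NOT the study's data): 70/20/10 split.
y_true = ["no_score"] * 70 + ["try"] * 20 + ["penalty"] * 10

# Baseline that always predicts the majority class.
majority = ["no_score"] * 100

# Hypothetical model that gives up some majority-class hits
# in exchange for recovering minority classes.
balanced = (
    ["no_score"] * 45 + ["try"] * 15 + ["penalty"] * 10  # true no_score
    + ["try"] * 12 + ["no_score"] * 8                    # true try
    + ["penalty"] * 6 + ["no_score"] * 4                 # true penalty
)

for name, preds in [("majority", majority), ("balanced", balanced)]:
    print(name, round(accuracy(y_true, preds), 3), round(macro_f1(y_true, preds), 3))
```

In this toy setting the majority-class baseline has the higher accuracy but the lower macro F1, because two of the three classes are never predicted. This is the same direction of change the abstract reports after applying SMOTE.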