Extracting data from FBref for International Matches
Jason Zivkovic
2024-09-02
Source:vignettes/fbref-data-internationals.Rmd
fbref-data-internationals.Rmd
Overview
This package is designed to allow users to extract various world football results and player statistics from the following popular football (soccer) data sites:
Installation
As at 2024-06-29, we are no longer including instructions to install from CRAN. The version pushed to CRAN is very much out of date, and with very regular updates to this library, we advise installing from GitHub only.
You can install the released version of worldfootballR
from GitHub
with:
# install.packages("devtools")
devtools::install_github("JaseZiv/worldfootballR")
Usage
Package vignettes have been built to help you get started with the package.
- For functions to extract data from FBref, see here
- For functions to extract data from Transfermarkt, see here
- For functions to extract data from Understat, see here
- For functions to load pre-scraped data, see here
This vignette will cover the functions to extract data for international matches from FBref.com.
The functions in this document are all documented in the FBref data vignette, however the method for using these functions for international matches differs from those played in domestic leagues.
NOTE:
As of version 0.5.2
, all FBref functions now come with a
user-defined pause between page loads to address their new rate
limiting. See this
document for more information.
Important
To get the competition URLs needed for a lot of the functions in this
document, refer to the column comp_url
for the relevant
league / competition you’re interested in, in this stored data file.
Helper Function
Get match urls
To get the match URLs needed to pass in to some of the match-level
functions below, fb_match_urls()
can be used.
wc_2018_urls <- fb_match_urls(country = "", gender = "M", season_end_year = 2018, tier = "", non_dom_league_url = "https://fbref.com/en/comps/1/history/World-Cup-Seasons")
friendly_int_2021_urls <- fb_match_urls(country = "", gender = "M", season_end_year = 2021, tier = "", non_dom_league_url = "https://fbref.com/en/comps/218/history/Friendlies-M-Seasons")
euro_2021_urls <- fb_match_urls(country = "", gender = "M", season_end_year = 2021, tier = "", non_dom_league_url = "https://fbref.com/en/comps/676/history/European-Championship-Seasons")
copa_2019_urls <- fb_match_urls(country = "", gender = "M", season_end_year = 2019, tier = "", non_dom_league_url = "https://fbref.com/en/comps/685/history/Copa-America-Seasons")
Match-Level Data
The following sections outlines the functions available to extract data at the per-match level
Get match results
To get the match results (and additional metadata), the following function can be used.
To use this functionality, simply leave country = ''
and
pass the non-domestic league URL
# euro 2016 results
euro_2016_results <- fb_match_results(country = "", gender = "M", season_end_year = 2016, tier = "", non_dom_league_url = "https://fbref.com/en/comps/676/history/European-Championship-Seasons")
# 2019 Copa America results:
copa_2019_results <- fb_match_results(country = "", gender = "M", season_end_year = 2019, non_dom_league_url = "https://fbref.com/en/comps/685/history/Copa-America-Seasons")
# for international friendlies:
international_results <- fb_match_results(country = "", gender = "M", season_end_year = 2021, tier = "", non_dom_league_url = "https://fbref.com/en/comps/218/history/Friendlies-M-Seasons")
Get match report
This function will return similar results to that of
fb_match_results()
, however fb_match_report()
will provide some additional information. It will also only provide it
for a single match, not the whole season:
# function to extract match report data for 2018 world cup
wc_2018_report <- fb_match_report(match_url = wc_2018_urls)
# function to extract match report data for 2021 international friendlies
friendlies_report <- fb_match_report(match_url = friendly_int_2021_urls)
Get match summaries
This function will return the main events that occur during a match, including goals, substitutions and red/yellow cards:
# first get the URLs for the 2016 Euros
euro_2016_match_urls <- fb_match_urls(country = "", gender = "M", season_end_year = 2016, tier = "", non_dom_league_url = "https://fbref.com/en/comps/676/history/European-Championship-Seasons")
# then pass these to the function to get match summaries:
euro_2016_events <- fb_match_summary(euro_2016_match_urls)
Get match lineups
This function will return a dataframe of all players listed for that match, including whether they started on the pitch, or on the bench.
From version 0.2.7, this function now also returns some summary performance data for each player that played, including their position, minutes played, goals, cards, etc.
# function to extract match lineups
copa_2019_lineups <- fb_match_lineups(match_url = copa_2019_urls)
Get shooting and shot creation events
The below function allows users to extract shooting and shot creation event data for a match or selected matches. The data returned includes who took the shot, when, with which body part and from how far away. Additionally, the player creating the chance and also the creation before this are included in the data.
shots_wc_2018 <- fb_match_shooting(wc_2018_urls)
Get advanced match statistics
The fb_advanced_match_stats()
function allows the user
to return a data frame of different stat types for matches played.
Note, some stats may not be available for all comps.
The following stat types can be selected:
- summary
- passing
- passing_types
- defense
- possession
- misc
- keeper
The function can be used for either all players individually:
advanced_match_stats_player <- fb_advanced_match_stats(match_url = wc_2018_urls, stat_type = "possession", team_or_player = "player")
Or used for the team totals for each match:
advanced_match_stats_team <- fb_advanced_match_stats(match_url = wc_2018_urls, stat_type = "passing_types", team_or_player = "team")