Extracting data from FBref
Jason Zivkovic
2024-11-15
Source:vignettes/extract-fbref-data.Rmd
Overview
This package is designed to allow users to extract world football results and player statistics from the following popular football (soccer) data sites: FBref, Transfermarkt, and Understat.
Installation
As of 2024-06-29, we no longer include instructions to install from CRAN. The version on CRAN is well out of date, and because this library is updated regularly, we advise installing from GitHub only.
You can install the released version of worldfootballR from GitHub with:
# install.packages("devtools")
devtools::install_github("JaseZiv/worldfootballR")
Usage
Package vignettes have been built to help you get started with the package.
- For functions to extract data from Transfermarkt, see here
- For functions to extract data from Understat, see here
- For functions to extract data for international matches from FBref, see here
- For functions to load pre-scraped data, see here
This vignette will cover the functions to extract data from FBref.com.
NOTE: As of version 0.5.2, all FBref functions come with a user-defined pause between page loads to address FBref's new rate limiting. See this document for more information.
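The idea of pausing between page loads can be sketched in a few lines of base R. This is only an illustration of the general technique, not the package's internal implementation: fetch_page() and fetch_with_pause() are hypothetical names, and fetch_page() is a stand-in that does not touch the network.

```r
# Hypothetical stand-in for a function that loads one page.
# It only builds a string so the sketch runs without network access.
fetch_page <- function(url) {
  paste("fetched", url)
}

# Load each URL in turn, sleeping between consecutive requests.
# `pause_secs` is an assumed argument name used for illustration only.
fetch_with_pause <- function(urls, pause_secs = 3) {
  results <- vector("list", length(urls))
  for (i in seq_along(urls)) {
    if (i > 1) Sys.sleep(pause_secs)  # no pause needed before the first request
    results[[i]] <- fetch_page(urls[i])
  }
  results
}

pages <- fetch_with_pause(c("url-1", "url-2"), pause_secs = 0.1)
```

A longer pause makes a full-season scrape slower but reduces the chance of being rate limited.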
Join FBref and Transfermarkt data
To join player data between FBref and Transfermarkt, player_dictionary_mapping() has been created. There are over 6,100 players who have been listed for teams in the Big 5 Euro leagues on FBref since the start of the 2017-18 season, with all of these mapped together. This is expected to be updated and grow over time. The raw data is stored here.
mapped_players <- player_dictionary_mapping()
dplyr::glimpse(mapped_players)
#> Rows: 14,872
#> Columns: 4
#> $ PlayerFBref <chr> "A.J. DeLaGarza", "AJ Marcucci", "Aapo Halme", "Aaron Bast…
#> $ UrlFBref <chr> "https://fbref.com/en/players/171b3c37/AJ-DeLaGarza", "htt…
#> $ UrlTmarkt <chr> "https://www.transfermarkt.com/a-j-delagarza/profil/spiele…
#> $ TmPos <chr> "Right-Back", "Goalkeeper", "Centre-Back", "Left Winger", …
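As a minimal sketch of how the mapping might be used, the example below joins a toy FBref table to a toy Transfermarkt table through the mapping's URL columns. Only the mapping column names (PlayerFBref, UrlFBref, UrlTmarkt, TmPos) come from the output above; the player names, URLs, the fbref_stats and tm_values tables, and their Gls and market_value columns are made up for illustration.

```r
# Toy mapping table with the same column names as player_dictionary_mapping().
# All values are placeholders, not real players or URLs.
mapped_players <- data.frame(
  PlayerFBref = c("Player A", "Player B"),
  UrlFBref    = c("fbref-url-a", "fbref-url-b"),
  UrlTmarkt   = c("tmarkt-url-a", "tmarkt-url-b"),
  TmPos       = c("Right-Back", "Goalkeeper"),
  stringsAsFactors = FALSE
)

# Hypothetical FBref-side data, keyed by the FBref player URL
fbref_stats <- data.frame(
  UrlFBref = c("fbref-url-a", "fbref-url-b"),
  Gls      = c(3, 0),
  stringsAsFactors = FALSE
)

# Hypothetical Transfermarkt-side data, keyed by the Transfermarkt player URL
tm_values <- data.frame(
  UrlTmarkt    = c("tmarkt-url-a", "tmarkt-url-b"),
  market_value = c(1000000, 500000),
  stringsAsFactors = FALSE
)

# Step 1: attach the mapping to the FBref data
joined <- merge(fbref_stats, mapped_players, by = "UrlFBref")
# Step 2: use the mapped Transfermarkt URL to attach the Transfermarkt data
joined <- merge(joined, tm_values, by = "UrlTmarkt")
```

merge() performs an inner join by default, so players missing from the mapping would be dropped; pass all.x = TRUE to keep them.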
FBref Helper Functions
The following section outlines the functions available for finding the URLs that can be passed to the FBref suite of functions covered in this vignette.
League URLs
To extract the URL of any country’s league(s) (provided FBref has data for the league), use the fb_league_urls() function.
This function also accepts a tier argument: for first-tier leagues, select ‘1st’, for second-tier select ‘2nd’, and so on.
A full list of the countries available can be found in the worldfootballR_data repository here.
fb_league_urls(country = "ENG", gender = "M", season_end_year = 2021, tier = '2nd')
Team URLs
To get a list of URLs for each team in a particular season, the fb_teams_urls() function can be used:
fb_teams_urls("https://fbref.com/en/comps/9/Premier-League-Stats")
Player URLs
To get a list of player URLs for a particular team, the fb_player_urls() function can be used. The output can be passed through to the player season stat functions fb_player_season_stats() and fb_player_scouting_report().
fb_player_urls("https://fbref.com/en/squads/fd962109/Fulham-Stats")
Get match urls
To get the match URLs needed to pass in to some of the match-level functions below, fb_match_urls() can be used:
epl_2021_urls <- fb_match_urls(country = "ENG", gender = "M", season_end_year = 2021, tier="1st")
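Because FBref match URLs contain the team names, a vector of match URLs can be narrowed down before it is passed to a match-level function, which keeps the number of page loads small. The sketch below uses a toy two-element vector built from match URLs that appear elsewhere in this vignette; the variable names are illustrative.

```r
# Toy vector standing in for the output of fb_match_urls()
match_urls <- c(
  "https://fbref.com/en/matches/47880eb7/Liverpool-Manchester-City-November-10-2019-Premier-League",
  "https://fbref.com/en/matches/a3eb7a37/Sheffield-United-Wolverhampton-Wanderers-September-14-2020-Premier-League"
)

# Keep only the URLs that mention a given team.
# fixed = TRUE treats the pattern as a literal string rather than a regex.
liverpool_urls <- match_urls[grepl("Liverpool", match_urls, fixed = TRUE)]
```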
League Season-Level Data
This section will cover the functions to aid in the extraction of season team statistics.
Get Season Team Stats
The fb_season_team_stats() function allows the user to return a data frame of different stat types for all teams in domestic leagues here.
Note, some stats may not be available for all leagues. The big five European leagues should have all of these stats, but for those leagues, it’s more efficient to use fb_big5_advanced_season_stats().
The following stat types can be selected:
- league_table
- league_table_home_away
- standard
- keeper
- keeper_adv
- shooting
- passing
- passing_types
- goal_shot_creation
- defense
- possession
- playing_time
- misc
#----- function to extract season teams stats -----#
prem_2020_shooting <- fb_season_team_stats(country = "ENG", gender = "M", season_end_year = "2020", tier = "1st", stat_type = "shooting")
dplyr::glimpse(prem_2020_shooting)
#----- to get shooting stats for the English Championship: -----#
# championship_2020_shooting <- fb_season_team_stats(country = "ENG", gender = "M", season_end_year = "2020", tier = "2nd", stat_type = "shooting")
#----- Can also run this for multiple leagues at a time: -----#
# multiple_2020_shooting <- fb_season_team_stats(country = c("USA", "NED"),
#                                                gender = "M", season_end_year = 2020,
#                                                tier = "1st", stat_type = "shooting")
The Big 5 Euro Leagues
The fb_big5_advanced_season_stats() function allows users to extract data for any of the below listed stat types for all teams of the big five European leagues (EPL, La Liga, Ligue 1, Serie A, Bundesliga).
The stat types available for this function are below:
- standard
- shooting
- passing
- passing_types
- gca
- defense
- possession
- playing_time
- misc
- keepers
- keepers_adv
The function also accepts a season or seasons, and whether you want data for players or teams.
Note that when selecting team_or_player="team", results will be returned for both the team’s "for" and "against" stats. To filter on this, use the Team_or_Opponent column in the resulting data frame, selecting ‘team’ if you want the team’s "for" stats, or ‘opponent’ if you want the team’s "against" stats.
#----- Get data for big five leagues for TEAMS -----#
big5_team_shooting <- fb_big5_advanced_season_stats(season_end_year= c(2019:2021), stat_type= "shooting", team_or_player= "team")
dplyr::glimpse(big5_team_shooting)
#----- Get data for big five leagues for PLAYERS -----#
big5_player_shooting <- fb_big5_advanced_season_stats(season_end_year= c(2019:2021), stat_type= "shooting", team_or_player= "player")
dplyr::glimpse(big5_player_shooting)
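The filtering on Team_or_Opponent described above could look like the following. This is a sketch on a toy data frame: only the Team_or_Opponent column name and its ‘team’ and ‘opponent’ values come from the text, while the Squad and Sh columns and all values are made up for illustration.

```r
# Toy stand-in for the team-level output: one "for" row and one "against" row per squad
big5_team_shooting <- data.frame(
  Squad            = c("Team A", "Team A", "Team B", "Team B"),
  Team_or_Opponent = c("team", "opponent", "team", "opponent"),
  Sh               = c(10, 7, 12, 9),
  stringsAsFactors = FALSE
)

# Stats for each team (what the team itself produced)
team_for <- big5_team_shooting[big5_team_shooting$Team_or_Opponent == "team", ]

# Stats against each team (what its opponents produced)
team_against <- big5_team_shooting[big5_team_shooting$Team_or_Opponent == "opponent", ]
```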
Get Season Player Stats
The fb_league_stats() function allows the user to return a data frame of different stat types for all players in leagues here.
Note, some stats may not be available for all leagues.
The following stat types can be selected:
- standard
- shooting
- passing
- passing_types
- gca
- defense
- possession
- playing_time
- misc
- keepers
- keepers_adv
prem_2020_player_shooting <- fb_league_stats(
country = "ENG",
gender = "M",
season_end_year = 2020,
tier = "1st",
non_dom_league_url = NA,
stat_type = "shooting",
team_or_player = "player"
)
dplyr::glimpse(prem_2020_player_shooting)
fb_league_stats() retrieves data in one page load, e.g. from here. This is much more efficient than iterating over a vector of player URLs to get data for all players in a league (i.e. with fb_player_season_stats()).
Note that the same function can be used to retrieve team data for a season in a manner similar to fb_season_team_stats().
prem_2020_team_shooting <- fb_league_stats(
country = "ENG",
gender = "M",
season_end_year = 2020,
tier = "1st",
non_dom_league_url = NA,
stat_type = "shooting",
team_or_player = "team"
)
dplyr::glimpse(prem_2020_team_shooting)
Multiple values for country, gender, season_end_year, and tier can be provided at the same time, although only one value should be specified for stat_type and team_or_player.
# fb_league_stats(
#   country = c("ENG", "ESP"),
#   gender = c("M", "F"),
#   season_end_year = 2020:2021,
#   tier = c("1st", "2nd"),
#   non_dom_league_url = NA,
#   stat_type = "shooting",
#   team_or_player = "player"
# )
Match-Level Data
The following sections outline the functions available to extract data at the per-match level.
Get match results
To get the match results (and additional metadata) for all leagues and comps listed here, the following function can be used:
# function to extract Serie A match results data
serieA_2020 <- fb_match_results(country = "ITA", gender = "M", season_end_year = 2020, tier = "1st")
dplyr::glimpse(serieA_2020)
The function can also be used to return match results for a non-domestic league season. To use this functionality, simply leave country = '' and pass the non-domestic league URL, which can be found at https://fbref.com/en/comps/
# for international friendlies:
fb_match_results(country = "", gender = "M", season_end_year = 2018, tier = "", non_dom_league_url = "https://fbref.com/en/comps/218/history/Friendlies-M-Seasons")
More than one league season
The fb_match_results() function can also be used to get data for multiple seasons, leagues, genders, etc.:
big_5_2020_results <- fb_match_results(country = c("ENG", "ESP", "ITA", "GER", "FRA"),
gender = "M", season_end_year = 2020, tier = "1st")
Get match report
fb_match_report() returns results similar to those of fb_match_results(), but with some additional information. It also only provides them for a single match, not the whole season:
# function to extract match report data
liv_mci_2020 <- fb_match_report(match_url = "https://fbref.com/en/matches/47880eb7/Liverpool-Manchester-City-November-10-2019-Premier-League")
dplyr::glimpse(liv_mci_2020)
Get match team stats
This function will return team stats for a single match (possession, passing accuracy, etc.):
# function to extract match team stats
liv_mci_2020_team_stats <- fb_team_match_stats(match_url = "https://fbref.com/en/matches/47880eb7/Liverpool-Manchester-City-November-10-2019-Premier-League")
dplyr::glimpse(liv_mci_2020_team_stats)
Get match summaries
This function will return the main events that occur during a match, including goals, substitutions and red/yellow cards:
# function to extract match summary data
liv_mci_2020_summary <- fb_match_summary(match_url = "https://fbref.com/en/matches/47880eb7/Liverpool-Manchester-City-November-10-2019-Premier-League")
dplyr::glimpse(liv_mci_2020_summary)
Get match lineups
This function will return a dataframe of all players listed for that match, including whether they started on the pitch, or on the bench.
From version 0.2.7, this function now also returns some summary performance data for each player that played, including their position, minutes played, goals, cards, etc.
# function to extract match lineups
liv_mci_2020_lineups <- fb_match_lineups(match_url = "https://fbref.com/en/matches/47880eb7/Liverpool-Manchester-City-November-10-2019-Premier-League")
dplyr::glimpse(liv_mci_2020_lineups)
Get shooting and shot creation events
The below function allows users to extract shooting and shot creation event data for a match or selected matches. The data returned includes who took the shot, when, with which body part and from how far away. Additionally, the player creating the chance and also the creation before this are included in the data.
#----- Get shots data for a single match played: -----#
shot_one_match <- fb_match_shooting(match_url = "https://fbref.com/en/matches/a3eb7a37/Sheffield-United-Wolverhampton-Wanderers-September-14-2020-Premier-League")
#----- Can also extract for multiple matches at a time: -----#
# test_urls_multiple <- c("https://fbref.com/en/matches/c0996cac/Bordeaux-Nantes-August-21-2020-Ligue-1",
# "https://fbref.com/en/matches/9cbccb37/Dijon-Angers-August-22-2020-Ligue-1",
# "https://fbref.com/en/matches/f96cd5a0/Lorient-Strasbourg-August-23-2020-Ligue-1")
# shot_multiple_matches <- fb_match_shooting(test_urls_multiple)
Get advanced match statistics
The fb_advanced_match_stats() function allows the user to return a data frame of different stat types for matches played.
Note, some stats may not be available for all leagues. The big five European leagues should have all of these stats.
The following stat types can be selected:
- summary
- passing
- passing_types
- defense
- possession
- misc
- keeper
The function can be used for either all players individually:
test_urls_multiple <- c("https://fbref.com/en/matches/c0996cac/Bordeaux-Nantes-August-21-2020-Ligue-1",
"https://fbref.com/en/matches/9cbccb37/Dijon-Angers-August-22-2020-Ligue-1")
advanced_match_stats <- fb_advanced_match_stats(match_url = test_urls_multiple, stat_type = "possession", team_or_player = "player")
dplyr::glimpse(advanced_match_stats)
Or used for the team totals for each match:
test_urls_multiple <- c("https://fbref.com/en/matches/c0996cac/Bordeaux-Nantes-August-21-2020-Ligue-1",
"https://fbref.com/en/matches/9cbccb37/Dijon-Angers-August-22-2020-Ligue-1")
advanced_match_stats_team <- fb_advanced_match_stats(match_url = test_urls_multiple, stat_type = "passing_types", team_or_player = "team")
dplyr::glimpse(advanced_match_stats_team)
Team-Level Data
This section will cover off the functions to get team-level data from FBref.
Match Results by team
To get the results of all matches a team (or teams) has played in a season, the following function can be used. The resulting data frame will include all game results, including any cup games played, and the function accepts either one or many team URLs.
#----- for single teams: -----#
man_city_2021_url <- "https://fbref.com/en/squads/b8fd03ef/Manchester-City-Stats"
man_city_2021_results <- fb_team_match_results(man_city_2021_url)
dplyr::glimpse(man_city_2021_results)
#----- get all team URLs for a league: -----#
# epl_2021_team_urls <- fb_teams_urls("https://fbref.com/en/comps/9/Premier-League-Stats")
# epl_2021_team_results <- fb_team_match_results(team_url = epl_2021_team_urls)
Stat Logs for Season
To get full match logs for all available stat types for a team's (or several teams') season, the fb_team_match_log_stats() function can be used.
Available stat types are below. Note, not all of the listed stat types are available for all teams, and for teams where they are, there still may be gaps in the resulting data frame as not all competitions participated in will have the stat available:
- shooting
- keeper
- passing
- passing_types
- gca
- defense
- misc
# can do it for one team:
man_city_url <- "https://fbref.com/en/squads/b8fd03ef/Manchester-City-Stats"
man_city_logs <- fb_team_match_log_stats(team_urls = man_city_url, stat_type = "passing")
dplyr::glimpse(man_city_logs)
# or multiple teams:
urls <- c("https://fbref.com/en/squads/822bd0ba/Liverpool-Stats",
"https://fbref.com/en/squads/b8fd03ef/Manchester-City-Stats")
shooting_logs <- fb_team_match_log_stats(team_urls = urls, stat_type = "shooting")
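Since some competitions will not have a given stat, it can help to check where the gaps are before analysing a match log. The sketch below counts missing values per competition on a toy data frame; the Comp and Sh column names and all values are assumptions for illustration and may not match the columns actually returned.

```r
# Toy match log: the cup matches have no shooting data recorded
team_logs <- data.frame(
  Comp = c("League", "League", "Cup", "Cup"),
  Sh   = c(14, 9, NA, NA),
  stringsAsFactors = FALSE
)

# Number of matches with a missing `Sh` value, split by competition
missing_by_comp <- tapply(is.na(team_logs$Sh), team_logs$Comp, sum)

# Keep only the matches where the stat is available
complete_logs <- team_logs[!is.na(team_logs$Sh), ]
```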
Season goal logs
Use the fb_team_goal_logs() function to get goal logs “for” (the default), “against”, or “both”, for a team.
man_city_url <- "https://fbref.com/en/squads/b8fd03ef/Manchester-City-Stats"
goal_log <- fb_team_goal_logs(team_urls = man_city_url, for_or_against = "both")
dplyr::glimpse(goal_log)
Player-Level Data
This section will cover the functions available to aid in the extraction of player season data.
In many cases the examples below have the actual URL (player or team) passed to them; however, the suite of FBref helper functions outlined in this helpers vignette could also be used.
Get Player Scouting Report
The fb_player_scouting_report() function takes in three inputs:
- player_url - the URL of the player’s main page
- pos_versus - which can return the player’s comparison against players in their “primary” or “secondary” position
- league_comp_name - the name of the competition/league season you want the player comparisons for. If no value is supplied, then this will just return for all scouting periods. This information needs to be copied exactly as the tabs appear on the player scouting page on FBref.
This results in the full scouting report for the player selected.
#----- Get scouting report for the player's primary position (first position listed in FBref): -----#
messi_primary <- fb_player_scouting_report(player_url = "https://fbref.com/en/players/d70ce98e/Lionel-Messi", pos_versus = "primary")
dplyr::glimpse(messi_primary)
# TO GET THE LAST 365 DAYS REPORT:
messi_last_365 <- fb_player_scouting_report(player_url = "https://fbref.com/en/players/d70ce98e/Lionel-Messi",
pos_versus = "primary",
league_comp_name = "Last 365 Days Men's Big 5 Leagues, UCL, UEL")
# TO GET SCOUTING REPORT FOR MULTIPLE COMPS/LEAGUES:
messi_multiple <- fb_player_scouting_report(player_url = "https://fbref.com/en/players/d70ce98e/Lionel-Messi",
pos_versus = "primary",
league_comp_name = c("Last 365 Days Men's Big 5 Leagues, UCL, UEL", "2022 World Cup"))
#----- Get scouting report for the player's secondary position (second position listed in FBref): -----#
messi_secondary <- fb_player_scouting_report(player_url = "https://fbref.com/en/players/d70ce98e/Lionel-Messi", pos_versus = "secondary")
dplyr::glimpse(messi_secondary)
Get Player Season Stats
The fb_player_season_stats() function allows for the extraction of historical season totals for selected player URLs and stat_type.
The stat_types available for use in this function are below:
- standard
- shooting
- passing
- passing_types
- gca
- defense
- possession
- playing_time
- misc
- keeper
- keeper_adv
#----- can use for a single player: -----#
mo_shooting <- fb_player_season_stats("https://fbref.com/en/players/e342ad68/Mohamed-Salah", stat_type = 'shooting')
dplyr::glimpse(mo_shooting)
#----- Or for multiple players at a time: -----#
# multiple_playing_time <- fb_player_season_stats(player_url = c("https://fbref.com/en/players/d70ce98e/Lionel-Messi",
# "https://fbref.com/en/players/dea698d9/Cristiano-Ronaldo"),
# stat_type = "playing_time")
The Big 5 Euro League Players
The fb_big5_advanced_season_stats() function allows users to extract data for any of the below listed stat types for all players of the big five European leagues (EPL, La Liga, Ligue 1, Serie A, Bundesliga).
The stat types available for this function are below:
- standard
- shooting
- passing
- passing_types
- gca
- defense
- possession
- playing_time
- misc
- keepers
- keepers_adv
The function also accepts a season or seasons, and whether you want data for players or teams.
big5_player_possession <- fb_big5_advanced_season_stats(season_end_year= 2021, stat_type= "possession", team_or_player= "player")
dplyr::glimpse(big5_player_possession)
Player Season Statistics for a team's season
The fb_team_player_stats() function allows users to extract data for any of the below listed stat types for all players of the selected team season(s).
The stat types available for this function are below:
- standard
- shooting
- passing
- passing_types
- gca
- defense
- possession
- playing_time
- misc
- keepers
- keepers_adv
#----- to get stats for just a single team: -----#
fleetwood_standard_stats <- fb_team_player_stats(team_urls= "https://fbref.com/en/squads/d6a369a2/Fleetwood-Town-Stats", stat_type= 'standard')
dplyr::glimpse(fleetwood_standard_stats)
#----- Can even get stats for a series of teams: -----#
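# NOTE: the original call was cut off; the season and tier below are illustrative values only
# league_url <- fb_league_urls(country = "ENG", gender = "M",
#                              season_end_year = 2021, tier = "1st")
# teams <- fb_teams_urls(league_url)
#
# multiple_playing_time <- fb_team_player_stats(team_urls= teams, stat_type= "playing_time")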
Goal and assist logs
Use the fb_player_goal_logs() function to get logs of “goals” (the default), “assists”, or “both”.
jwp_url <- "https://fbref.com/en/players/3515d404/James-Ward-Prowse"
goal_log <- fb_player_goal_logs(jwp_url, goals_or_assists="both")
dplyr::glimpse(goal_log)
Player Match Logs
The fb_player_match_logs() function allows the user to return a data frame of the match logs of different stat types for a player’s matches played in a season.
The following stat types can be selected, depending on the player’s position (i.e. a striker probably won’t have “keepers” stats):
- summary
- keepers
- passing
- passing_types
- gca
- defense
- possession
- misc
ederson_url <- "https://fbref.com/en/players/3bb7b8b4/Ederson"
ederson_summary <- fb_player_match_logs(ederson_url, season_end_year = 2021, stat_type = 'summary')
dplyr::glimpse(ederson_summary)
Player Wages
To extract wages for all players of a team from FBref (via Capology), use the fb_squad_wages() function below. Note, this can be used with multiple team_urls.
man_city_url <- "https://fbref.com/en/squads/b8fd03ef/Manchester-City-Stats"
man_city_wages <- fb_squad_wages(team_urls = man_city_url)