Overview

This package is designed to allow users to extract world football results and player statistics from popular football (soccer) data sites, including FBref, Transfermarkt and Understat.

Installation

As of 2024-06-29, we no longer include instructions for installing from CRAN. The version on CRAN is well out of date and this library is updated very regularly, so we advise installing from GitHub only.

You can install the current version of worldfootballR from GitHub with:

# install.packages("devtools")
devtools::install_github("JaseZiv/worldfootballR")

Usage

Package vignettes have been built to help you get started with the package.

  • For functions to extract data from FBref, see here
  • For functions to extract data from Transfermarkt, see here
  • For functions to extract data for international matches from FBref, see here
  • For functions to load pre-scraped data, see here

This vignette covers the functions to extract data from understat.com.


Understat Helper Functions

Team Names

To get a list of all available team names for a selected league, use the understat_avalaible_teams() function.

You can pass the result of understat_avalaible_teams() directly to the understat_team_meta() function:

team_names <- understat_team_meta(team_name = understat_avalaible_teams(league = 'EPL'))

Team URLs

To get a list of all season team URLs for selected teams, use the understat_team_meta() function. Team names need to match Understat.com's spelling, so it is advisable to check how the team is named there before passing it to the function:

team_urls <- understat_team_meta(team_name = c("Liverpool", "Manchester City"))
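
The team URLs used later in this vignette follow the pattern https://understat.com/team/&lt;team name with underscores&gt;/&lt;season start year&gt;. As a minimal sketch, assuming that pattern holds for the teams you need, such a URL could also be assembled by hand. The helper build_team_url() below is hypothetical and is not part of worldfootballR:

```r
# Hypothetical helper (not part of worldfootballR): builds a team-season URL
# from Understat's spelling of the team name and the season start year.
# Assumes spaces in the team name become underscores, as in the URLs used
# elsewhere in this vignette.
build_team_url <- function(team_name, season_start_year) {
  paste0(
    "https://understat.com/team/",
    gsub(" ", "_", team_name, fixed = TRUE),
    "/", season_start_year
  )
}

build_team_url(c("Liverpool", "Manchester City"), 2020)
```

Using understat_team_meta() remains the safer route, since it returns the URLs Understat actually lists.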

League Season-Level Data

This section will cover the functions to aid in the extraction of season league statistics from Understat.

The following leagues are currently supported by Understat (these values can be passed in to the league arguments of most understat_ functions):

  • “EPL”
  • “La liga”
  • “Bundesliga”
  • “Serie A”
  • “Ligue 1”
  • “RFPL”

Match Results

Match results from Understat include not only scores and expected goals, but also forecast probabilities of a win, draw or loss.

To extract the data, use the understat_league_match_results() function:

# to get the EPL results:
epl_results <- understat_league_match_results(league = "EPL", season_start_year = 2020)
dplyr::glimpse(epl_results)
#> Rows: 380
#> Columns: 18
#> $ league        <chr> "EPL", "EPL", "EPL", "EPL", "EPL", "EPL", "EPL", "EPL", …
#> $ season        <chr> "2020/2021", "2020/2021", "2020/2021", "2020/2021", "202…
#> $ match_id      <chr> "14086", "14087", "14090", "14091", "14092", "14093", "1…
#> $ isResult      <lgl> TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TR…
#> $ home_id       <chr> "228", "78", "87", "81", "76", "82", "238", "220", "72",…
#> $ home_team     <chr> "Fulham", "Crystal Palace", "Liverpool", "West Ham", "We…
#> $ home_abbr     <chr> "FLH", "CRY", "LIV", "WHU", "WBA", "TOT", "SHE", "BRI", …
#> $ away_id       <chr> "83", "74", "245", "86", "75", "72", "229", "80", "76", …
#> $ away_team     <chr> "Arsenal", "Southampton", "Leeds", "Newcastle United", "…
#> $ away_abbr     <chr> "ARS", "SOU", "LED", "NEW", "LEI", "EVE", "WOL", "CHE", …
#> $ home_goals    <dbl> 0, 1, 4, 0, 0, 0, 0, 1, 5, 4, 1, 2, 2, 0, 0, 4, 1, 1, 2,…
#> $ away_goals    <dbl> 3, 0, 3, 2, 3, 1, 2, 3, 2, 3, 3, 1, 5, 3, 2, 2, 0, 3, 3,…
#> $ home_xG       <dbl> 0.126327, 1.395690, 3.154120, 0.861445, 0.352997, 0.8229…
#> $ away_xG       <dbl> 2.162870, 1.262670, 0.269813, 1.659110, 2.955810, 1.2679…
#> $ datetime      <chr> "2020-09-12 11:30:00", "2020-09-12 14:00:00", "2020-09-1…
#> $ forecast_win  <dbl> 0.0037, 0.3916, 0.9658, 0.1506, 0.0070, 0.2200, 0.1683, …
#> $ forecast_draw <dbl> 0.0476, 0.3022, 0.0296, 0.2480, 0.0358, 0.2977, 0.2906, …
#> $ forecast_loss <dbl> 0.9487, 0.3062, 0.0046, 0.6014, 0.9572, 0.4823, 0.5411, …
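
As a small illustration of how the forecast columns might be used, the sketch below rebuilds the first three rows of the output above by hand (so it runs without scraping) and derives expected points for the home side. It assumes forecast_win refers to the home team winning, which is consistent with the rows shown but is not stated by the function itself:

```r
# First three matches, with values copied from the glimpse() output above.
epl_sample <- data.frame(
  home_team     = c("Fulham", "Crystal Palace", "Liverpool"),
  away_team     = c("Arsenal", "Southampton", "Leeds"),
  home_goals    = c(0, 1, 4),
  away_goals    = c(3, 0, 3),
  forecast_win  = c(0.0037, 0.3916, 0.9658),
  forecast_draw = c(0.0476, 0.3022, 0.0296),
  forecast_loss = c(0.9487, 0.3062, 0.0046),
  stringsAsFactors = FALSE
)

# The three probabilities should sum to (approximately) 1 for each match.
prob_total <- with(epl_sample, forecast_win + forecast_draw + forecast_loss)

# Expected points for the home side: 3 for a win, 1 for a draw.
# Assumption: forecast_win is the home team's win probability.
epl_sample$home_exp_pts <- with(epl_sample, 3 * forecast_win + forecast_draw)

# Actual result from the home side's perspective.
epl_sample$home_result <- with(
  epl_sample,
  ifelse(home_goals > away_goals, "W",
         ifelse(home_goals < away_goals, "L", "D"))
)

epl_sample[, c("home_team", "home_exp_pts", "home_result")]
```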

Season Shooting Locations

To get shooting locations for a whole season in supported leagues, use the understat_league_season_shots() function:

ligue1_shot_location <- understat_league_season_shots(league = "Ligue 1", season_start_year = 2020)
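
If several seasons are needed, one option is to loop over the start years and bind the results. The sketch below is one way this could look; collect_seasons() is a hypothetical helper, not part of worldfootballR, and it assumes every season returns a data frame with the same columns. The scraping function is passed in as an argument so the helper can be tried with any function that accepts league and season_start_year:

```r
# Hypothetical helper: calls `fetch_fun` once per season and row-binds the
# results. Seasons that raise an error are skipped with a warning.
collect_seasons <- function(league, season_start_years, fetch_fun) {
  per_season <- lapply(season_start_years, function(yr) {
    tryCatch(
      fetch_fun(league = league, season_start_year = yr),
      error = function(e) {
        warning("Skipping ", league, " ", yr, ": ", conditionMessage(e))
        NULL
      }
    )
  })
  # Drop failed seasons, then stack the remaining data frames.
  do.call(rbind, Filter(Negate(is.null), per_season))
}

# Intended use (requires worldfootballR and an internet connection):
# ligue1_shots <- collect_seasons("Ligue 1", 2019:2020,
#                                 worldfootballR::understat_league_season_shots)
```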

Match-Level Data

The following sections outline the functions available to extract data at the per-match level.

Match Shooting Locations

To get shooting locations for an individual match, use the understat_match_shots() function:

wba_liv_shots <- understat_match_shots(match_url = "https://understat.com/match/14789")
dplyr::glimpse(wba_liv_shots)
#> Rows: 36
#> Columns: 20
#> $ id              <chr> "422440", "422441", "422442", "422450", "422456", "422…
#> $ minute          <dbl> 9, 11, 14, 35, 46, 47, 50, 61, 70, 77, 2, 3, 5, 23, 26…
#> $ result          <chr> "MissedShots", "MissedShots", "Goal", "BlockedShot", "…
#> $ X               <dbl> 0.869, 0.965, 0.881, 0.883, 0.957, 0.712, 0.767, 0.942…
#> $ Y               <dbl> 0.441, 0.460, 0.356, 0.336, 0.590, 0.403, 0.590, 0.626…
#> $ xG              <dbl> 0.0313527, 0.1447450, 0.2382660, 0.2825390, 0.0260821,…
#> $ player          <chr> "Semi Ajayi", "Okay Yokuslu", "Hal Robson-Kanu", "Hal …
#> $ home_away       <chr> "h", "h", "h", "h", "h", "h", "h", "h", "h", "h", "a",…
#> $ player_id       <chr> "4490", "6932", "1738", "1738", "964", "7153", "7153",…
#> $ situation       <chr> "SetPiece", "SetPiece", "OpenPlay", "OpenPlay", "FromC…
#> $ season          <chr> "2020", "2020", "2020", "2020", "2020", "2020", "2020"…
#> $ shotType        <chr> "Head", "Head", "LeftFoot", "LeftFoot", "Head", "LeftF…
#> $ match_id        <chr> "14789", "14789", "14789", "14789", "14789", "14789", …
#> $ home_team       <chr> "West Bromwich Albion", "West Bromwich Albion", "West …
#> $ away_team       <chr> "Liverpool", "Liverpool", "Liverpool", "Liverpool", "L…
#> $ home_goals      <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, …
#> $ away_goals      <dbl> 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, …
#> $ date            <chr> "2021-05-16 15:30:00", "2021-05-16 15:30:00", "2021-05…
#> $ player_assisted <chr> "Matheus Pereira", "Darnell Furlong", "Matheus Pereira…
#> $ lastAction      <chr> "Cross", "Chipped", "Pass", "HeadPass", "Aerial", "Sta…
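
The X and Y columns in the output above lie between 0 and 1. As a sketch, assuming they are fractions of the pitch length and width, they could be rescaled to metres before plotting. The 105 x 68 metre pitch is an assumption made for illustration, not something Understat reports:

```r
# First three shots, with values copied from the glimpse() output above.
shots_sample <- data.frame(
  player = c("Semi Ajayi", "Okay Yokuslu", "Hal Robson-Kanu"),
  result = c("MissedShots", "MissedShots", "Goal"),
  X      = c(0.869, 0.965, 0.881),
  Y      = c(0.441, 0.460, 0.356),
  xG     = c(0.0313527, 0.1447450, 0.2382660),
  stringsAsFactors = FALSE
)

# Assumed pitch dimensions in metres (illustrative only).
pitch_length <- 105
pitch_width  <- 68

# Rescale the 0-1 coordinates to metres.
shots_sample$x_m <- shots_sample$X * pitch_length
shots_sample$y_m <- shots_sample$Y * pitch_width

shots_sample[, c("player", "result", "x_m", "y_m")]
```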

Match Stats

To get the data from the stats table for an individual match, use the understat_match_stats() function:

wba_liv_stats <- understat_match_stats(match_url = "https://understat.com/match/14789")
dplyr::glimpse(wba_liv_stats)
#> Rows: 1
#> Columns: 20
#> $ match_id            <int> 14789
#> $ home_team           <chr> "West Bromwich Albion"
#> $ home_chances        <dbl> 0.18
#> $ home_goals          <int> 1
#> $ home_xG             <dbl> 1.14
#> $ home_shots          <int> 10
#> $ home_shot_on_target <int> 3
#> $ home_deep           <int> 3
#> $ home_PPDA           <dbl> 21.86
#> $ home_xPTS           <dbl> 0.76
#> $ draw_chances        <dbl> 0.22
#> $ away_team           <chr> "Liverpool"
#> $ away_chances        <dbl> 0.6
#> $ away_goals          <int> 2
#> $ away_xG             <dbl> 2.08
#> $ away_shots          <int> 26
#> $ away_shot_on_target <int> 6
#> $ away_deep           <int> 20
#> $ away_PPDA           <dbl> 4.05
#> $ away_xPTS           <dbl> 2.01
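
The result is a single wide row with home_ and away_ prefixed columns. For comparisons or plotting it can be convenient to have one row per team instead. The base R sketch below shows one possible way to do that, using a hand-built subset of the columns above so that it runs without scraping:

```r
# A subset of the one-row output above, with values copied from the glimpse().
match_stats <- data.frame(
  match_id   = 14789L,
  home_team  = "West Bromwich Albion",
  home_goals = 1L,
  home_xG    = 1.14,
  home_shots = 10L,
  away_team  = "Liverpool",
  away_goals = 2L,
  away_xG    = 2.08,
  away_shots = 26L,
  stringsAsFactors = FALSE
)

# For each side, keep its prefixed columns and strip the prefix.
side_rows <- lapply(c("home", "away"), function(side) {
  prefix <- paste0("^", side, "_")
  cols <- grep(prefix, names(match_stats), value = TRUE)
  out <- match_stats[, cols, drop = FALSE]
  names(out) <- sub(prefix, "", cols)
  data.frame(match_id = match_stats$match_id, side = side, out,
             stringsAsFactors = FALSE)
})

# One row per team.
long_stats <- do.call(rbind, side_rows)
long_stats
```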

Match Players

To get the data for each player in an individual match, use the understat_match_players() function:

wba_liv_players <- understat_match_players(match_url = "https://understat.com/match/14789")
dplyr::glimpse(wba_liv_players)
#> Rows: 27
#> Columns: 23
#> $ match_id      <int> 14789, 14789, 14789, 14789, 14789, 14789, 14789, 14789, …
#> $ id            <int> 471471, 471472, 471474, 471473, 471475, 471476, 471477, …
#> $ team_id       <int> 76, 76, 76, 76, 76, 76, 76, 76, 76, 76, 76, 76, 76, 76, …
#> $ home_away     <chr> "h", "h", "h", "h", "h", "h", "h", "h", "h", "h", "h", "…
#> $ player_id     <int> 978, 4391, 964, 4490, 8905, 1737, 6932, 9040, 6651, 7153…
#> $ swap_id       <int> 471471, 471472, 471474, 471473, 471475, 471476, 471477, …
#> $ player        <chr> "Sam Johnstone", "Darnell Furlong", "Kyle Bartley", "Sem…
#> $ position      <chr> "GK", "DR", "DC", "DC", "DL", "MR", "MC", "MC", "ML", "A…
#> $ positionOrder <int> 1, 2, 3, 3, 4, 8, 9, 9, 10, 12, 15, 17, 17, 17, 1, 2, 3,…
#> $ time_played   <int> 90, 90, 90, 90, 90, 90, 80, 90, 78, 90, 87, 12, 10, 3, 9…
#> $ goals         <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0,…
#> $ own_goals     <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
#> $ shots         <int> 0, 1, 1, 2, 0, 0, 1, 0, 0, 2, 3, 0, 0, 0, 1, 4, 2, 1, 0,…
#> $ xG            <dbl> 0.0000000, 0.0132741, 0.0260821, 0.0580258, 0.0000000, 0…
#> $ yellow_card   <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
#> $ red_card      <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
#> $ roster_in     <int> 0, 0, 0, 0, 0, 0, 471484, 0, 471483, 0, 471482, 0, 0, 0,…
#> $ roster_out    <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 471479, 471477, 471481,…
#> $ key_passes    <int> 0, 2, 0, 1, 0, 0, 0, 1, 0, 4, 1, 0, 0, 0, 0, 5, 1, 0, 2,…
#> $ assists       <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0,…
#> $ xA            <dbl> 0.0000000, 0.4272840, 0.0000000, 0.2957020, 0.0000000, 0…
#> $ xGChain       <dbl> 0.2825390, 0.8165060, 0.0000000, 0.5339680, 0.2957020, 0…
#> $ xGBuildup     <dbl> 0.2825390, 0.5339680, 0.0000000, 0.2382660, 0.2957020, 0…
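
As a short illustration of working with the player rows, the sketch below ranks players by combined xG and xA. It uses only the first three players from the output above, entered by hand, and the combined column name is an arbitrary choice:

```r
# First three players, with values copied from the glimpse() output above.
players_sample <- data.frame(
  player      = c("Sam Johnstone", "Darnell Furlong", "Kyle Bartley"),
  time_played = c(90L, 90L, 90L),
  xG          = c(0.0000000, 0.0132741, 0.0260821),
  xA          = c(0.0000000, 0.4272840, 0.0000000),
  stringsAsFactors = FALSE
)

# Combined expected goal involvement per player.
players_sample$xG_plus_xA <- players_sample$xG + players_sample$xA

# Highest combined value first.
ranked <- players_sample[order(-players_sample$xG_plus_xA), ]
ranked[, c("player", "xG_plus_xA")]
```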

Team Data

This section will cover the functions to get team-level data from Understat.

Team Shooting Locations

To get all shots taken and conceded by a team during a season, use the understat_team_season_shots() function:

# for one team:
man_city_shots <- understat_team_season_shots(team_url = "https://understat.com/team/Manchester_City/2020")
dplyr::glimpse(man_city_shots)
#> Rows: 886
#> Columns: 20
#> $ id              <chr> "378528", "378533", "378537", "378538", "378539", "378…
#> $ minute          <dbl> 15, 40, 53, 55, 58, 59, 64, 73, 77, 86, 7, 10, 19, 29,…
#> $ result          <chr> "BlockedShot", "MissedShots", "MissedShots", "BlockedS…
#> $ X               <dbl> 0.789, 0.892, 0.860, 0.811, 0.822, 0.886, 0.869, 0.803…
#> $ Y               <dbl> 0.564, 0.409, 0.501, 0.496, 0.398, 0.473, 0.259, 0.467…
#> $ xG              <dbl> 0.03422860, 0.03680430, 0.10313500, 0.05339760, 0.0860…
#> $ player          <chr> "Pedro Neto", "Raúl Jiménez", "Daniel Podence", "Rúben…
#> $ home_away       <chr> "h", "h", "h", "h", "h", "h", "h", "h", "h", "h", "a",…
#> $ player_id       <chr> "6382", "4105", "8291", "6853", "8291", "4105", "6853"…
#> $ situation       <chr> "OpenPlay", "FromCorner", "OpenPlay", "OpenPlay", "Ope…
#> $ season          <chr> "2020", "2020", "2020", "2020", "2020", "2020", "2020"…
#> $ shotType        <chr> "LeftFoot", "Head", "LeftFoot", "LeftFoot", "RightFoot…
#> $ match_id        <chr> "14105", "14105", "14105", "14105", "14105", "14105", …
#> $ home_team       <chr> "Wolverhampton Wanderers", "Wolverhampton Wanderers", …
#> $ away_team       <chr> "Manchester City", "Manchester City", "Manchester City…
#> $ home_goals      <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, …
#> $ away_goals      <dbl> 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, …
#> $ date            <chr> "2020-09-21 19:15:00", "2020-09-21 19:15:00", "2020-09…
#> $ player_assisted <chr> "Daniel Podence", "Adama Traoré", "Adama Traoré", "Ped…
#> $ lastAction      <chr> "Pass", "Cross", "Pass", "Pass", "Chipped", "Cross", "…
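
Because the result mixes shots taken and shots conceded, a common next step is to label each row. The sketch below shows one possible approach: work out which side the team of interest was on in each match, then compare that with home_away. It uses the first three rows of the output above, entered by hand, all of which are Wolverhampton Wanderers shots and therefore conceded by Manchester City:

```r
# First three shots, with values copied from the glimpse() output above.
city_sample <- data.frame(
  id        = c("378528", "378533", "378537"),
  xG        = c(0.03422860, 0.03680430, 0.10313500),
  home_away = c("h", "h", "h"),
  home_team = rep("Wolverhampton Wanderers", 3),
  away_team = rep("Manchester City", 3),
  stringsAsFactors = FALSE
)

team <- "Manchester City"

# Which side was the team of interest on in each match?
team_side <- ifelse(city_sample$home_team == team, "h", "a")

# A shot is "for" when it was taken by that side, otherwise "against".
city_sample$shot_side <- ifelse(city_sample$home_away == team_side, "for", "against")

# Total xG by shot side.
xg_by_side <- tapply(city_sample$xG, city_sample$shot_side, sum)
xg_by_side
```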

Team Stat Breakdowns

To get a more granular breakdown of team shooting data for whole seasons, the understat_team_stats_breakdown() function can be used. This function returns a breakdown of team shooting data based on the following groupings:

  • Situation
  • Formation
  • Game state
  • Timing
  • Shot zones
  • Attack speed
  • Result

#----- Can get data for single teams at a time: -----#
team_breakdown <- understat_team_stats_breakdown(team_urls = "https://understat.com/team/Liverpool/2020")
dplyr::glimpse(team_breakdown)
#> Rows: 34
#> Columns: 11
#> $ team_name         <chr> "Liverpool", "Liverpool", "Liverpool", "Liverpool", …
#> $ season_start_year <dbl> 2020, 2020, 2020, 2020, 2020, 2020, 2020, 2020, 2020…
#> $ stat_group_name   <chr> "situation", "situation", "situation", "situation", …
#> $ stat_name         <chr> "OpenPlay", "FromCorner", "SetPiece", "DirectFreekic…
#> $ shots             <int> 466, 94, 23, 22, 6, 532, 33, 31, 13, 2, 302, 135, 10…
#> $ goals             <int> 49, 11, 2, 0, 6, 59, 6, 1, 2, 0, 32, 15, 8, 11, 2, 6…
#> $ xG                <dbl> 59.4529171, 9.1182853, 1.8527929, 1.3437825, 4.56701…
#> $ against.shots     <int> 252, 40, 21, 12, 8, 296, 20, 9, 7, 1, 161, 80, 45, 3…
#> $ against.goals     <int> 28, 6, 3, 1, 4, 38, 2, 1, 0, 1, 17, 12, 5, 3, 5, 8, …
#> $ against.xG        <dbl> 33.1091621, 4.2281575, 3.9210222, 0.6303305, 6.08935…
#> $ time              <int> NA, NA, NA, NA, NA, 3147, 216, 134, 81, 12, 1914, 73…

#----- Or for multiple teams: -----#
# team_urls <- c("https://understat.com/team/Liverpool/2020",
#                "https://understat.com/team/Manchester_City/2020")
# team_breakdown <- understat_team_stats_breakdown(team_urls = team_urls)
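
To give a sense of what can be derived from the breakdown, the sketch below computes goals minus xG and xG per shot for the first three "situation" rows of the Liverpool output above, entered by hand. The derived column names are arbitrary choices for illustration:

```r
# First three "situation" rows, with values copied from the glimpse() output above.
situation_sample <- data.frame(
  stat_name = c("OpenPlay", "FromCorner", "SetPiece"),
  shots     = c(466L, 94L, 23L),
  goals     = c(49L, 11L, 2L),
  xG        = c(59.4529171, 9.1182853, 1.8527929),
  stringsAsFactors = FALSE
)

# Positive values mean more goals were scored than the xG suggested.
situation_sample$goals_minus_xG <- situation_sample$goals - situation_sample$xG

# Average chance quality per shot.
situation_sample$xG_per_shot <- situation_sample$xG / situation_sample$shots

# Situation with the largest over-performance relative to xG.
best_situation <- situation_sample$stat_name[which.max(situation_sample$goals_minus_xG)]
situation_sample
```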

Player Data

This section will cover the functions available to aid in the extraction of player data.

Player Shooting Locations

To get shooting locations for all games a player has participated in (for as long as Understat has data), use the understat_player_shots() function:

raheem_sterling_shots <- understat_player_shots(player_url = "https://understat.com/player/618")
dplyr::glimpse(raheem_sterling_shots)
#> Rows: 686
#> Columns: 20
#> $ id              <chr> "14490", "14491", "14496", "14497", "14779", "15104", …
#> $ minute          <dbl> 20, 22, 47, 53, 8, 7, 69, 74, 65, 81, 19, 25, 47, 50, …
#> $ result          <chr> "SavedShot", "Goal", "SavedShot", "MissedShots", "Miss…
#> $ X               <dbl> 0.853, 0.856, 0.816, 0.745, 0.857, 0.959, 0.940, 0.968…
#> $ Y               <dbl> 0.695, 0.496, 0.377, 0.443, 0.470, 0.615, 0.524, 0.646…
#> $ xG              <dbl> 0.0407033, 0.3114090, 0.0576012, 0.0254811, 0.0726696,…
#> $ player          <chr> "Raheem Sterling", "Raheem Sterling", "Raheem Sterling…
#> $ home_away       <chr> "h", "h", "h", "h", "a", "a", "a", "a", "h", "h", "a",…
#> $ player_id       <chr> "618", "618", "618", "618", "618", "618", "618", "618"…
#> $ situation       <chr> "OpenPlay", "OpenPlay", "OpenPlay", "OpenPlay", "OpenP…
#> $ season          <chr> "2014", "2014", "2014", "2014", "2014", "2014", "2014"…
#> $ shotType        <chr> "LeftFoot", "RightFoot", "RightFoot", "RightFoot", "Ri…
#> $ match_id        <chr> "4756", "4756", "4756", "4756", "4768", "4777", "4777"…
#> $ home_team       <chr> "Liverpool", "Liverpool", "Liverpool", "Liverpool", "M…
#> $ away_team       <chr> "Southampton", "Southampton", "Southampton", "Southamp…
#> $ home_goals      <dbl> 2, 2, 2, 2, 3, 0, 0, 0, 0, 0, 3, 3, 3, 3, 1, 1, 1, 1, …
#> $ away_goals      <dbl> 1, 1, 1, 1, 1, 3, 3, 3, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, …
#> $ date            <chr> "2014-08-17 13:30:00", "2014-08-17 13:30:00", "2014-08…
#> $ player_assisted <chr> "Philippe Coutinho", "Jordan Henderson", "Jordan Hende…
#> $ lastAction      <chr> "Pass", "Throughball", "Pass", "Pass", "Chipped", "Pas…
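
Since each row is a single shot, per-match totals need an aggregation step. The base R sketch below shows one way to get shots, goals and xG per match, using the first four rows of the output above (all from match 4756), entered by hand:

```r
# First four shots, with values copied from the glimpse() output above.
sterling_sample <- data.frame(
  match_id = rep("4756", 4),
  result   = c("SavedShot", "Goal", "SavedShot", "MissedShots"),
  xG       = c(0.0407033, 0.3114090, 0.0576012, 0.0254811),
  stringsAsFactors = FALSE
)

# One shot per row; flag the rows that ended in a goal.
sterling_sample$shots <- 1
sterling_sample$goals <- as.integer(sterling_sample$result == "Goal")

# Sum shots, goals and xG within each match.
per_match <- aggregate(cbind(shots, goals, xG) ~ match_id,
                       data = sterling_sample, FUN = sum)
per_match
```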

Team Player Season Stats

To get stats for all players of selected teams, run the understat_team_players_stats() function.

Note: Team URLs can be extracted using understat_team_meta().

team_players <- understat_team_players_stats(team_url = c("https://understat.com/team/Liverpool/2020", "https://understat.com/team/Manchester_City/2020"))
dplyr::glimpse(team_players)
#> Rows: 52
#> Columns: 19
#> $ season       <chr> "2020/2021", "2020/2021", "2020/2021", "2020/2021", "2020…
#> $ player_id    <dbl> 1250, 838, 482, 6854, 771, 1791, 229, 332, 605, 833, 966,…
#> $ player_name  <chr> "Mohamed Salah", "Sadio Mané", "Roberto Firmino", "Diogo …
#> $ games        <dbl> 37, 35, 36, 19, 38, 36, 24, 10, 21, 5, 13, 33, 38, 24, 17…
#> $ time         <dbl> 3085, 2805, 2882, 1114, 2961, 3040, 1865, 701, 1710, 370,…
#> $ goals        <dbl> 22, 11, 9, 9, 2, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0…
#> $ xG           <dbl> 20.2508505, 14.8285516, 12.8602165, 7.0577230, 2.8174270,…
#> $ assists      <dbl> 5, 7, 7, 0, 0, 7, 0, 2, 1, 0, 1, 0, 7, 2, 1, 0, 0, 1, 0, …
#> $ xA           <dbl> 6.5285276, 7.7877541, 6.1168645, 1.7625196, 1.6629221, 8.…
#> $ shots        <dbl> 126, 94, 83, 46, 31, 55, 22, 5, 14, 4, 8, 1, 19, 19, 15, …
#> $ key_passes   <dbl> 55, 61, 44, 12, 21, 77, 30, 3, 14, 0, 2, 0, 65, 12, 7, 0,…
#> $ yellow_cards <dbl> 0, 3, 2, 2, 1, 2, 4, 2, 0, 1, 0, 1, 2, 2, 2, 0, 0, 3, 0, …
#> $ red_cards    <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
#> $ position     <chr> "F M S", "F M S", "F M S", "F M S", "M S", "D S", "M S", …
#> $ team_name    <chr> "Liverpool", "Liverpool", "Liverpool", "Liverpool", "Live…
#> $ npg          <dbl> 16, 11, 9, 9, 2, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0…
#> $ npxG         <dbl> 15.6838341, 14.8285516, 12.8602165, 7.0577230, 2.8174270,…
#> $ xGChain      <dbl> 28.9682294, 24.9989162, 25.2714681, 10.9729662, 13.922178…
#> $ xGBuildup    <dbl> 9.8002365, 6.0576597, 10.1985496, 4.0760983, 10.4762759, …
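
Season totals are easier to compare across players once they are scaled by playing time. The sketch below computes per-90-minute rates and penalty goals (goals minus npg) for the first three players in the output above, entered by hand. The derived column names are arbitrary choices for illustration:

```r
# First three players, with values copied from the glimpse() output above.
players_season <- data.frame(
  player_name = c("Mohamed Salah", "Sadio Mane", "Roberto Firmino"),
  time        = c(3085, 2805, 2882),
  goals       = c(22, 11, 9),
  xG          = c(20.2508505, 14.8285516, 12.8602165),
  npg         = c(16, 11, 9),
  stringsAsFactors = FALSE
)
# Note: "Sadio Mane" is written without the accent here to keep the
# example ASCII-only; the scraped data uses Understat's spelling.

# Rates per 90 minutes played.
players_season$goals_p90 <- players_season$goals / players_season$time * 90
players_season$xG_p90    <- players_season$xG / players_season$time * 90

# Penalty goals are total goals minus non-penalty goals.
players_season$pen_goals <- players_season$goals - players_season$npg

players_season[, c("player_name", "goals_p90", "xG_p90", "pen_goals")]
```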