
Overview

This package is designed to allow users to extract various world football results and player statistics from popular football (soccer) data sites, including FBref, Transfermarkt, Understat and fotmob.

Installation

You can install the CRAN version of worldfootballR with:

install.packages("worldfootballR")

You can install the development version of worldfootballR from GitHub with:

# install.packages("devtools")
devtools::install_github("JaseZiv/worldfootballR")

Usage

Package vignettes have been built to help you get started with the package.

  • For functions to extract data from FBref, see here
  • For functions to extract data from Transfermarkt, see here
  • For functions to extract data from Understat, see here
  • For functions to extract data from fotmob, see here
  • For functions to extract data for international matches from FBref, see here

This vignette will cover the functions to load scraped data from the worldfootballR_data data repository.

NOTE:

As of version 0.5.2, all FBref functions come with a user-defined pause between page loads to comply with FBref's new rate limiting. See this document for more information.


Load FBref

The following section demonstrates the different functions for loading FBref data.

Load FBref match results

To load pre-scraped match results for all years the data is available, the load_match_results() function can be used. This data is scheduled to be updated most days and a print statement will inform the user of when the data was last updated. All domestic leagues are included in the data repository.

This is the load function equivalent of fb_match_results().

eng_match_results <- load_match_results(country = "ENG", gender = c("M", "F"), season_end_year = c(2020:2022), tier = "1st")
dplyr::glimpse(eng_match_results)
#> Rows: 1,536
#> Columns: 20
#> $ Competition_Name <chr> "FA Women's Super League", "FA Women's Super League",…
#> $ Gender           <chr> "F", "F", "F", "F", "F", "F", "F", "F", "F", "F", "F"…
#> $ Country          <chr> "ENG", "ENG", "ENG", "ENG", "ENG", "ENG", "ENG", "ENG…
#> $ Season_End_Year  <int> 2020, 2020, 2020, 2020, 2020, 2020, 2020, 2020, 2020,…
#> $ Round            <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
#> $ Wk               <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
#> $ Day              <chr> "Sat", "Sat", "Sun", "Sun", "Sun", "Sun", "Sun", "Sun…
#> $ Date             <date> 2019-09-07, 2019-09-07, 2019-09-08, 2019-09-08, 2019…
#> $ Time             <chr> "15:00", "15:00", "12:30", "14:00", "14:00", "14:30",…
#> $ Home             <chr> "Manchester City", "Bristol City", "Chelsea", "Birmin…
#> $ HomeGoals        <dbl> 1, 0, 1, 0, 0, 2, 0, 2, 1, 1, 1, 0, 2, 4, 0, 0, 0, 3,…
#> $ Home_xG          <dbl> 0.3, 0.4, 1.4, 0.7, 0.8, 2.0, 0.2, 1.2, 1.2, 0.3, 0.8…
#> $ Away             <chr> "Manchester Utd", "Brighton", "Tottenham", "Everton",…
#> $ AwayGoals        <dbl> 0, 0, 0, 1, 1, 1, 2, 0, 0, 1, 0, 1, 0, 0, 1, 2, 4, 0,…
#> $ Away_xG          <dbl> 0.6, 2.0, 0.3, 0.9, 1.2, 0.9, 2.3, 0.7, 0.3, 1.5, 0.8…
#> $ Attendance       <dbl> 31213, 3041, 24564, 873, 1445, 1795, 897, 441, 996, 1…
#> $ Venue            <chr> "Etihad Stadium", "Stoke Gifford Stadium", "Stamford …
#> $ Referee          <chr> "Rebecca Welch", "Abigail Bryne", "Jack Packman", "El…
#> $ Notes            <chr> "", "", "", "", "", "", "", "", "", "", "", "", "", "…
#> $ MatchURL         <chr> "https://fbref.com/en/matches/f116cea0/Manchester-Der…
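
Once loaded, the results can be summarised with base R. The sketch below is illustrative only: it uses a small made-up data frame (toy_results) that borrows the column names shown in the output above, and classify_result() is a hypothetical helper, not a worldfootballR function. It also assumes that matches without a score carry NA goals.

```r
# Toy stand-in for the data frame returned by load_match_results();
# only the columns used below are included and every value is made up.
toy_results <- data.frame(
  Competition_Name = c("League A", "League A", "League A", "League B"),
  Home = c("Team 1", "Team 2", "Team 3", "Team 4"),
  Away = c("Team 2", "Team 3", "Team 1", "Team 5"),
  HomeGoals = c(2, 0, 1, NA),
  AwayGoals = c(1, 0, 3, NA),
  stringsAsFactors = FALSE
)

# Hypothetical helper (not part of worldfootballR): label each match as a
# home win, draw or away win. Matches with NA goals stay NA.
classify_result <- function(home_goals, away_goals) {
  ifelse(home_goals > away_goals, "Home win",
         ifelse(home_goals < away_goals, "Away win", "Draw"))
}

toy_results$Result <- classify_result(toy_results$HomeGoals,
                                      toy_results$AwayGoals)

# Count outcomes per competition; table() drops the NA results by default.
result_counts <- table(toy_results$Competition_Name, toy_results$Result)
print(result_counts)
```

The same two steps should apply to eng_match_results, provided the HomeGoals and AwayGoals columns are numeric as in the output above.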

Load FBref match results for Cups and International Comps

Similarly, to load pre-scraped match results for cups and international matches in all years the data is available, the load_match_comp_results() function can be used. This data is scheduled to be updated most days and a print statement will inform the user of when the data was last updated.

The following list of competitions (comp_name) are available:

#>  [1] "AFC Asian Cup"                                               
#>  [2] "AFC Asian Cup qualification"                                 
#>  [3] "AFC Women's Asian Cup"                                       
#>  [4] "AFC Women's Asian Cup Qualification"                         
#>  [5] "Africa Cup of Nations"                                       
#>  [6] "Africa Cup of Nations qualification"                         
#>  [7] "Africa Women Cup of Nations"                                 
#>  [8] "Algarve Cup"                                                 
#>  [9] "CONCACAF Gold Cup"                                           
#> [10] "CONCACAF W Championship"                                     
#> [11] "Copa America"                                                
#> [12] "Copa América Femenina"                                       
#> [13] "Copa del Rey"                                                
#> [14] "Copa Libertadores de América"                                
#> [15] "Copa Sudamericana"                                           
#> [16] "Coppa Italia"                                                
#> [17] "Coupe de France"                                             
#> [18] "Coupe de la Ligue"                                           
#> [19] "DFB-Pokal"                                                   
#> [20] "DFB-Pokal Frauen"                                            
#> [21] "English Football League Cup"                                 
#> [22] "European Championship"                                       
#> [23] "FA Cup"                                                      
#> [24] "FIFA Confederations Cup"                                     
#> [25] "FIFA Women's World Cup"                                      
#> [26] "FIFA Women's World Cup Qualification (UEFA)"                 
#> [27] "FIFA World Cup"                                              
#> [28] "FIFA World Cup Qualification — AFC"                          
#> [29] "FIFA World Cup Qualification — CAF"                          
#> [30] "FIFA World Cup Qualification — CONCACAF"                     
#> [31] "FIFA World Cup Qualification — CONMEBOL"                     
#> [32] "FIFA World Cup Qualification — Inter-confederation play-offs"
#> [33] "FIFA World Cup Qualification — OFC"                          
#> [34] "FIFA World Cup Qualification — UEFA"                         
#> [35] "International Friendlies (M)"                                
#> [36] "International Friendlies (W)"                                
#> [37] "NWSL Challenge Cup"                                          
#> [38] "NWSL Fall Series"                                            
#> [39] "OFC Nations Cup"                                             
#> [40] "OFC Women's Nations Cup"                                     
#> [41] "Olympics – Women's Tournament"                               
#> [42] "SheBelieves Cup"                                             
#> [43] "UEFA Champions League"                                       
#> [44] "UEFA Euro Qualification"                                     
#> [45] "UEFA Europa Conference League"                               
#> [46] "UEFA Europa League"                                          
#> [47] "UEFA Europa League"                                          
#> [48] "UEFA Nations League"                                         
#> [49] "UEFA Women's Champions League"                               
#> [50] "UEFA Women's Championship"                                   
#> [51] "UEFA Women's Euro Qualification"

cups <- c("FIFA Women's World Cup", "FIFA World Cup")
world_cups <- load_match_comp_results(comp_name = cups)
dplyr::glimpse(world_cups)
#> Rows: 1,232
#> Columns: 20
#> $ Competition_Name <chr> "FIFA Women's World Cup", "FIFA Women's World Cup", "…
#> $ Gender           <chr> "F", "F", "F", "F", "F", "F", "F", "F", "F", "F", "F"…
#> $ Country          <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
#> $ Season_End_Year  <int> 1991, 1991, 1991, 1991, 1991, 1991, 1991, 1991, 1991,…
#> $ Round            <chr> "Group stage", "Group stage", "Group stage", "Group s…
#> $ Wk               <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
#> $ Day              <chr> "Sat", "Sun", "Sun", "Sun", "Sun", "Sun", "Tue", "Tue…
#> $ Date             <date> 1991-11-16, 1991-11-17, 1991-11-17, 1991-11-17, 1991…
#> $ Time             <chr> "20:45", "15:30", "19:45", "19:45", "19:45", "19:45",…
#> $ Home             <chr> "China PR cn", "Germany de", "Japan jp", "Chinese Tai…
#> $ HomeGoals        <dbl> 4, 4, 0, 0, 3, 2, 4, 1, 2, 0, 0, 0, 0, 0, 2, 4, 2, 0,…
#> $ Home_xG          <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
#> $ Away             <chr> "no Norway", "ng Nigeria", "br Brazil", "it Italy", "…
#> $ AwayGoals        <dbl> 0, 0, 1, 5, 0, 3, 0, 0, 2, 5, 8, 3, 3, 2, 0, 1, 1, 2,…
#> $ Away_xG          <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
#> $ Attendance       <dbl> 65000, 14000, 14000, 11000, 14000, 14000, 12000, 1200…
#> $ Venue            <chr> "Tianhe Stadium (Neutral Site)", "Jiangmen Stadium (N…
#> $ Referee          <chr> "Salvador Imperatore Marcone", "Rafael Rodriguez", "L…
#> $ Notes            <chr> "", "", "", "", "", "", "", "", "", "", "", "", "", "…
#> $ MatchURL         <chr> "https://fbref.com/en/matches/0d9e0f26/China-PR-Norwa…
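
In the output above, the Home and Away columns for international matches carry a short lowercase country code next to the team name (for example "China PR cn" and "no Norway"). A minimal sketch of one way to strip it is shown below. strip_country_code() is a hypothetical helper, not part of worldfootballR, and it assumes the code is always a token of two or three lowercase ASCII letters at the very start or end of the string.

```r
# Values copied from the glimpse() output above.
home <- c("China PR cn", "Germany de", "Japan jp")
away <- c("no Norway", "ng Nigeria", "br Brazil")

# Hypothetical helper: remove a leading or trailing token of 2-3 lowercase
# letters. perl = TRUE keeps [a-z] restricted to ASCII lowercase letters
# regardless of the session locale.
strip_country_code <- function(team) {
  team <- sub("^[a-z]{2,3}\\s+", "", team, perl = TRUE)
  sub("\\s+[a-z]{2,3}$", "", team, perl = TRUE)
}

home_clean <- strip_country_code(home)
away_clean <- strip_country_code(away)
print(home_clean)
print(away_clean)
```

A team whose real name ends in a short lowercase word would be truncated by this pattern, so it is worth checking the distinct cleaned names before relying on them.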

Load FBref big 5 league advanced season stats

To load pre-scraped advanced stats for the big five European leagues for either teams or players, the load_fb_big5_advanced_season_stats() function can be used. This data is scheduled to be updated most days and a print statement will inform the user of when the data was last updated.

This is the load function equivalent of fb_big5_advanced_season_stats().

all_season_player <- load_fb_big5_advanced_season_stats(stat_type = "defense", team_or_player = "player")
current_season_player <- load_fb_big5_advanced_season_stats(season_end_year = 2022, stat_type = "defense", team_or_player = "player")

all_season_team <- load_fb_big5_advanced_season_stats(stat_type = "defense", team_or_player = "team")
current_season_team <- load_fb_big5_advanced_season_stats(season_end_year = 2022, stat_type = "defense", team_or_player = "team")

Load Understat

The following section demonstrates the different functions for loading Understat data.

Load League Shots

To rapidly load pre-collected shooting locations for whole leagues, the load_understat_league_shots() function is now available. Supported leagues on Understat are:

  • “EPL”
  • “La liga”
  • “Bundesliga”
  • “Serie A”
  • “Ligue 1”
  • “RFPL”

This is effectively the loading equivalent of the understat_league_season_shots() function. Rather than being scraped one season at a time, the data loads rapidly for all seasons of the selected league since the 2014/15 season.

serie_a_shot_locations <- load_understat_league_shots(league = "Serie A")
dplyr::glimpse(serie_a_shot_locations)
#> Rows: 80,802
#> Columns: 21
#> $ league          <chr> "Serie_A", "Serie_A", "Serie_A", "Serie_A", "Serie_A",…
#> $ id              <dbl> 41439, 41444, 41445, 41446, 41451, 41452, 41461, 41462…
#> $ minute          <dbl> 2, 5, 7, 8, 19, 24, 76, 76, 0, 3, 3, 3, 5, 10, 15, 15,…
#> $ result          <chr> "MissedShots", "OwnGoal", "SavedShot", "MissedShots", …
#> $ X               <dbl> 0.845, 0.009, 0.896, 0.875, 0.813, 0.780, 0.780, 0.867…
#> $ Y               <dbl> 0.641, 0.539, 0.373, 0.661, 0.180, 0.374, 0.499, 0.520…
#> $ xG              <dbl> 0.009113184, 0.000000000, 0.019183254, 0.042726491, 0.…
#> $ player          <chr> "Maxi López", "Cristiano Biraghi", "Ezequiel Schelotto…
#> $ h_a             <chr> "h", "h", "h", "h", "h", "h", "h", "h", "a", "a", "a",…
#> $ player_id       <dbl> 1188, 1164, 3873, 1188, 1336, 3873, 716, 1188, 3848, 1…
#> $ situation       <chr> "OpenPlay", "FromCorner", "OpenPlay", "SetPiece", "Fro…
#> $ season          <dbl> 2014, 2014, 2014, 2014, 2014, 2014, 2014, 2014, 2014, …
#> $ shotType        <chr> "Head", "OtherBodyPart", "Head", "RightFoot", "LeftFoo…
#> $ match_id        <dbl> 5149, 5149, 5149, 5149, 5149, 5149, 5149, 5149, 5149, …
#> $ home_team       <chr> "Chievo", "Chievo", "Chievo", "Chievo", "Chievo", "Chi…
#> $ away_team       <chr> "Juventus", "Juventus", "Juventus", "Juventus", "Juven…
#> $ home_goals      <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
#> $ away_goals      <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, …
#> $ date            <chr> "2014-08-30 17:00:00", "2014-08-30 17:00:00", "2014-08…
#> $ player_assisted <chr> "Cristiano Biraghi", NA, "Nicolas Frey", "Ezequiel Sch…
#> $ lastAction      <chr> "Cross", "CrossNotClaimed", "Cross", "Pass", "Pass", "…
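
Shot-level data like this is often rolled up to one row per player. The sketch below is a hedged illustration on a made-up data frame (toy_shots) that reuses the player, result and xG column names from the output above. It assumes that a scored shot is recorded as "Goal" in the result column, a value that does not appear in the truncated output, and it excludes own goals before aggregating.

```r
# Toy stand-in for the output of load_understat_league_shots();
# all values are invented for illustration.
toy_shots <- data.frame(
  player = c("Player A", "Player A", "Player B", "Player B", "Player B"),
  result = c("Goal", "MissedShots", "SavedShot", "Goal", "OwnGoal"),
  xG = c(0.40, 0.10, 0.05, 0.75, 0.00),
  stringsAsFactors = FALSE
)

# Own goals are not attempts on the opponent's goal, so drop them first.
shots <- toy_shots[toy_shots$result != "OwnGoal", ]

# Assumption: a scored shot is labelled "Goal" in the result column.
shots$goal <- as.integer(shots$result == "Goal")
shots$n_shots <- 1L

# One row per player: shot count, goals and total xG.
player_summary <- aggregate(cbind(n_shots, goal, xG) ~ player,
                            data = shots, FUN = sum)
player_summary$goals_minus_xG <- player_summary$goal - player_summary$xG
print(player_summary)
```

Applied to serie_a_shot_locations, the same code would aggregate across every season in the data; add season to the right-hand side of the formula to split by season.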

Load fotmob

The following section demonstrates the different functions for loading fotmob data.

Load fotmob Big 5 Match Shots

load_fotmob_match_details() returns match details dating back to the 2020/21 season (some matches may be missing) for each of the following leagues, listed with their fotmob league IDs:

  • EURO: 50
  • Champions League: 42
  • Copa America: 44
  • Europa League: 73
  • Premier League: 47
  • 1. Bundesliga: 54
  • LaLiga: 87
  • Ligue 1: 53
  • MLS: 130
  • Serie A: 55

This is effectively the loading equivalent of the fotmob_get_match_details() function for all matches for these leagues back to the beginning of the 2020/21 season.

epl_match_details <- load_fotmob_match_details(
  country = "ENG",
  league_name = "Premier League"
)
## or
## load_fotmob_match_details(league_id = 47)
dplyr::glimpse(epl_match_details)
#> Rows: 20,719
#> Columns: 42
#> $ match_id                 <int> 3411352, 3411352, 3411352, 3411352, 3411352, …
#> $ match_round              <chr> "1", "1", "1", "1", "1", "1", "1", "1", "1", …
#> $ league_id                <int> 47, 47, 47, 47, 47, 47, 47, 47, 47, 47, 47, 4…
#> $ league_name              <chr> "Premier League", "Premier League", "Premier …
#> $ league_round_name        <chr> "Premier League Round 1", "Premier League Rou…
#> $ parent_league_id         <int> 47, 47, 47, 47, 47, 47, 47, 47, 47, 47, 47, 4…
#> $ parent_league_season     <chr> "2022/2023", "2022/2023", "2022/2023", "2022/…
#> $ match_time_utc           <chr> "Sat, Sep 12, 2020, 11:30 UTC", "Sat, Sep 12,…
#> $ home_team_id             <int> 9879, 9879, 9879, 9879, 9879, 9879, 9879, 987…
#> $ home_team                <chr> "Fulham", "Fulham", "Fulham", "Fulham", "Fulh…
#> $ home_team_color          <chr> "#000000", "#000000", "#000000", "#000000", "…
#> $ away_team_id             <int> 9825, 9825, 9825, 9825, 9825, 9825, 9825, 982…
#> $ away_team                <chr> "Arsenal", "Arsenal", "Arsenal", "Arsenal", "…
#> $ away_team_color          <chr> "#bd0510", "#bd0510", "#bd0510", "#bd0510", "…
#> $ id                       <dbl> 2210246139, 2210246547, 2210246571, 221024658…
#> $ event_type               <chr> "AttemptSaved", "Miss", "AttemptSaved", "Goal…
#> $ team_id                  <int> 9879, 9825, 9825, 9825, 9879, 9825, 9825, 982…
#> $ player_id                <int> 149150, 207236, 94086, 169193, 240478, 339992…
#> $ player_name              <chr> "Denis Odoi", "Granit Xhaka", "Willian", "Ale…
#> $ x                        <dbl> 84.73738, 90.30000, 100.63793, 102.53448, 98.…
#> $ y                        <dbl> 18.25401, 41.42563, 33.46625, 31.71250, 23.01…
#> $ min                      <int> 7, 8, 8, 8, 10, 21, 21, 27, 37, 49, 56, 57, 6…
#> $ min_added                <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
#> $ is_blocked               <lgl> FALSE, FALSE, FALSE, FALSE, TRUE, TRUE, FALSE…
#> $ is_on_target             <lgl> TRUE, FALSE, TRUE, TRUE, TRUE, TRUE, FALSE, T…
#> $ goal_crossed_y           <dbl> 31.56000, 26.28531, 32.32250, 30.87375, 31.94…
#> $ expected_goals           <dbl> 0.0274, 0.0602, 0.7366, 0.8446, 0.0455, 0.087…
#> $ expected_goals_on_target <dbl> 0.1150, NA, 0.9631, 0.9935, NA, NA, NA, NA, 0…
#> $ shot_type                <chr> "LeftFoot", "LeftFoot", "RightFoot", "LeftFoo…
#> $ situation                <chr> "RegularPlay", "RegularPlay", "RegularPlay", …
#> $ period                   <chr> "FirstHalf", "FirstHalf", "FirstHalf", "First…
#> $ is_own_goal              <lgl> FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FAL…
#> $ on_goal_shot_x           <dbl> 1.645503e+00, 2.000000e+00, 1.443783e+00, 1.8…
#> $ on_goal_shot_y           <dbl> 0.29047619, 0.03495724, 0.04246728, 0.0645502…
#> $ on_goal_shot_zoom_ratio  <dbl> 1.0000000, 0.4899745, 1.0000000, 1.0000000, 1…
#> $ first_name               <chr> "Denis", "Granit", "", "Alexandre", "Neeskens…
#> $ last_name                <chr> "Odoi", "Xhaka", "Willian", "Lacazette", "Keb…
#> $ team_color               <chr> "#000000", "#bd0510", "#bd0510", "#bd0510", "…
#> $ short_name               <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
#> $ blocked_x                <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
#> $ blocked_y                <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
#> $ goal_crossed_z           <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…

## multiple leagues at once
epl_ll_match_details <- load_fotmob_match_details(league_id = c(47, 87))

Note that fotmob does not currently have match details prior to 2020/21.
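
The match details can be rolled up to team totals per match. The following is a minimal sketch on a made-up data frame (toy_details) that borrows the match_id, team_id and expected_goals column names from the output above. It assumes that each row is a single shot, which the columns suggest but the text does not state outright.

```r
# Toy stand-in for the output of load_fotmob_match_details();
# IDs and xG values are invented for illustration.
toy_details <- data.frame(
  match_id = c(1L, 1L, 1L, 1L, 2L),
  team_id = c(10L, 20L, 20L, 20L, 10L),
  expected_goals = c(0.03, 0.06, 0.74, 0.84, 0.20)
)

# Sum expected goals for every match/team combination.
team_xg <- aggregate(expected_goals ~ match_id + team_id,
                     data = toy_details, FUN = sum)

# Sort so that both teams of a match sit next to each other.
team_xg <- team_xg[order(team_xg$match_id, team_xg$team_id), ]
print(team_xg)
```

Note that aggregate() with a formula drops rows where expected_goals is NA, so shots without an xG value would not contribute to the totals.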

load_fotmob_matches_by_date() can be used to retrieve fotmob match ids dating back to August 2017.

epl_matches <- load_fotmob_matches_by_date(
  country = "ENG",
  league_name = "Premier League"
)
dplyr::glimpse(epl_matches)
#> Rows: 2,117
#> Columns: 41
#> $ date                              <chr> "2017-08-11", "2017-08-12", "2017-08…
#> $ ccode                             <chr> "ENG", "ENG", "ENG", "ENG", "ENG", "…
#> $ id                                <int> 47, 47, 47, 47, 47, 47, 47, 47, 47, …
#> $ primary_id                        <int> 47, 47, 47, 47, 47, 47, 47, 47, 47, …
#> $ name                              <chr> "Premier League", "Premier League", …
#> $ match_id                          <int> 2522743, 2522751, 2522745, 2522746, …
#> $ match_league_id                   <int> 47, 47, 47, 47, 47, 47, 47, 47, 47, …
#> $ match_time                        <chr> "11.08.2017 20:45", "12.08.2017 13:3…
#> $ home_id                           <int> 9825, 9817, 8455, 9826, 8668, 8466, …
#> $ home_score                        <int> 4, 3, 2, 0, 1, 0, 1, 0, 0, 4, 0, 0, …
#> $ home_name                         <chr> "Arsenal", "Watford", "Chelsea", "Cr…
#> $ home_long_name                    <chr> "Arsenal", "Watford", "Chelsea", "Cr…
#> $ home_pen_score                    <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, …
#> $ away_id                           <int> 8197, 8650, 8191, 9796, 10194, 10003…
#> $ away_score                        <int> 3, 3, 3, 3, 0, 0, 0, 2, 2, 0, 4, 2, …
#> $ away_name                         <chr> "Leicester", "Liverpool", "Burnley",…
#> $ away_long_name                    <chr> "Leicester City", "Liverpool", "Burn…
#> $ away_pen_score                    <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, …
#> $ match_eliminated_team_id          <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, …
#> $ match_status_id                   <int> 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, …
#> $ match_tournament_stage            <chr> "1", "1", "1", "1", "1", "1", "1", "…
#> $ match_status_finished             <lgl> TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, …
#> $ match_status_started              <lgl> TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, …
#> $ match_status_cancelled            <lgl> FALSE, FALSE, FALSE, FALSE, FALSE, F…
#> $ match_status_score_str            <chr> "4 - 3", "3 - 3", "2 - 3", "0 - 3", …
#> $ match_status_start_date_str       <chr> "Aug 11, 2017", "Aug 12, 2017", "Aug…
#> $ match_status_start_date_str_short <chr> "11. Aug.", "12. Aug.", "12. Aug.", …
#> $ short                             <chr> "FT", "FT", "FT", "FT", "FT", "FT", …
#> $ long                              <chr> "Full-Time", "Full-Time", "Full-Time…
#> $ match_status_start_time_str       <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, …
#> $ match_status_aggregated_str       <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, …
#> $ match_status_awarded              <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, …
#> $ match_time_ts                     <dbl> 1.502459e+12, 1.502519e+12, 1.502528…
#> $ match_tv                          <list> <NULL>, <NULL>, <NULL>, <NULL>, <NU…
#> $ internal_rank                     <int> 10, 10, 10, 10, 10, 10, 10, 10, 10, …
#> $ live_rank                         <int> 101, 101, 101, 101, 101, 101, 101, 1…
#> $ simple_league                     <lgl> FALSE, FALSE, FALSE, FALSE, FALSE, F…
#> $ parent_league_id                  <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, …
#> $ is_group                          <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, …
#> $ group_name                        <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, …
#> $ parent_league_name                <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, …

## multiple leagues at once
epl_ll_matches <- load_fotmob_matches_by_date(league_id = c(47, 87))
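
Because date and match_time are returned as character columns, they usually need converting before filtering by date. The sketch below uses a made-up data frame (toy_matches) in the formats shown in the output above. The time zone of match_time is not stated in this document, so tz = "UTC" is an assumption, and the season boundaries are chosen only for illustration.

```r
# Toy stand-in for the output of load_fotmob_matches_by_date();
# match IDs and kick-off times are invented for illustration.
toy_matches <- data.frame(
  date = c("2017-08-11", "2017-08-12", "2018-08-10"),
  match_time = c("11.08.2017 20:45", "12.08.2017 13:30", "10.08.2018 21:00"),
  match_id = c(101L, 102L, 103L),
  stringsAsFactors = FALSE
)

# Convert the character columns to proper date and date-time classes.
toy_matches$date <- as.Date(toy_matches$date)
toy_matches$kickoff <- as.POSIXct(toy_matches$match_time,
                                  format = "%d.%m.%Y %H:%M",
                                  tz = "UTC")  # assumed time zone

# Keep the matches that fall inside an illustrative 2017/18 season window.
in_season <- toy_matches$date >= as.Date("2017-08-01") &
  toy_matches$date <= as.Date("2018-06-30")
season_match_ids <- toy_matches$match_id[in_season]
print(season_match_ids)
```

The resulting match IDs are the kind of value that fotmob_get_match_details() works from when scraping individual matches.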