Andrew Clark

Every year there is talk of how a team near the top of the table only won because they beat the teams around them or failed to prosper because they couldn’t get enough points off the teams below them (Liverpool, in particular, being a case in point the this season)

Let’s have a look at that for the history of the Premier League as far as the top teams go. This year, there have been, realistically, six teams vying for the top four places which guarantees entry into the following year’s, exclusive, Champions League. I will, thus, concentrate on how the ‘Top Sixes’ have fared against each other over the 25 years of the Premier League

Let’s load the libraries and data required. The latter has been personally accumulated over many years but similar data can also be obtained from James Curley’s engsoccerdata package

library(DT)
library(knitr)
library(plotly)
library(crosstalk)
library(htmltools)
library(tidyverse)



standings <- readRDS("data/standings.rds")

glimpse(standings)
## Observations: 19,492
## Variables: 20
## $ season        <chr> "2004/05", "2004/05", "2004/05", "2004/05", "200...
## $ final_Pos     <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
## $ team          <chr> "Chelsea", "Chelsea", "Chelsea", "Chelsea", "Che...
## $ GF            <int> 1, 3, 1, 2, 3, 0, 1, 3, 4, 1, 3, 1, 0, 1, 3, 2, ...
## $ GA            <int> 1, 1, 0, 0, 1, 0, 1, 1, 1, 0, 1, 0, 0, 0, 0, 0, ...
## $ gameDate      <date> 2005-05-15, 2005-05-10, 2005-05-07, 2005-04-30,...
## $ tmGameOrder   <int> 506, 505, 504, 503, 502, 501, 500, 499, 498, 497...
## $ tmYrGameOrder <int> 38, 37, 36, 35, 34, 33, 32, 31, 30, 29, 28, 27, ...
## $ venue         <chr> "A", "A", "H", "A", "H", "H", "H", "A", "H", "H"...
## $ MATCHID       <int> 5285, 5277, 5265, 5253, 5243, 5235, 5217, 5212, ...
## $ OppTeam       <chr> "Newcastle U", "Man. Utd.", "Charlton", "Bolton"...
## $ points        <dbl> 1, 3, 3, 3, 3, 1, 1, 3, 3, 3, 3, 3, 1, 3, 3, 3, ...
## $ cumGF         <int> 72, 71, 68, 67, 65, 62, 62, 61, 58, 54, 53, 50, ...
## $ cumGA         <int> 15, 14, 13, 13, 13, 12, 12, 11, 10, 9, 9, 8, 8, ...
## $ cumPts        <dbl> 95, 94, 91, 88, 85, 82, 81, 80, 77, 74, 71, 68, ...
## $ cumGD         <int> 57, 57, 55, 54, 52, 50, 50, 50, 48, 45, 44, 42, ...
## $ allGames      <dbl> 38, 38, 38, 38, 38, 38, 38, 38, 38, 38, 38, 38, ...
## $ position      <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
## $ res           <chr> "Draw", "Win", "Win", "Win", "Win", "Draw", "Dra...
## $ tt            <chr> "<table cellpadding='4' style='line-height:1'><t...

The ‘standings’ data.frame has a row for each team for each round so, firstly, I need to construct a subset restricted to end-of-season data for the top six each year, retaining just the team, season, final position and points (3 for win, 1 for tie/draw) fields

  topSixes <- standings %>% 
    group_by(season) %>% 
     filter(tmYrGameOrder==max(tmYrGameOrder)&final_Pos<=6) %>% 
  select(team,season,final_Pos,cumPts)

# test for specific season
(topSixes %>% 
  filter(season=="2015/16"))
## # A tibble: 6 x 4
## # Groups:   season [1]
##          team  season final_Pos cumPts
##         <chr>   <chr>     <int>  <dbl>
## 1 Leicester C 2015/16         1     81
## 2     Arsenal 2015/16         2     71
## 3 Tottenham H 2015/16         3     70
## 4   Man. City 2015/16         4     66
## 5   Man. Utd. 2015/16         5     66
## 6 Southampton 2015/16         6     63

As you may recall, last year Leicester convincingly won the title and Arsenal did their perennial - up until then at least - eclipsing of local rivals, Spurs

We now want to construct a minitable for the encounters between these clubs. It usually pays to look at one season first and then extend to all years, using the mapping functions from the purrr package (part of the tidyverse package loaded above)

# top teams for 2016/17
oppsOneYear <- topSixes %>% 
    filter(season=="2016/17") %>% 
    .$team


#function to create results for club
  make_OneYeartable <- function(x) {
   
    temp <- standings %>% 
      filter(season=="2016/17"&team==x&OppTeam %in% oppsOneYear) %>% 
      summarize(totPoints=sum(points),totGF=sum(GF),totGA=sum(GA),totGD=totGF-totGA)
    
    cbind(team=x,temp)
    
  }
 
  ## apply the function to each of the six teams. Use map_df as we want a data.frame as outcome
  dataOneYear <-map_df(oppsOneYear, make_OneYeartable)

  
dataOneYear
##          team totPoints totGF totGA totGD
## 1     Chelsea        16    16    14     2
## 2 Tottenham H        15    13    10     3
## 3   Man. City        10    11    15    -4
## 4   Liverpool        20    16     9     7
## 5     Arsenal         9    15    18    -3
## 6   Man. Utd.        10     7    12    -5

Arsenal’s relative failure against the top clubs cost them a place in the Champions League as they finished just one point shy of fourth place

OK, now we can look at applying the process to all seasons. This requires use of the map2_df() function

make_table <- function(x,y) {
    
  # need to create a separate top 6 for each season
    opps <- topSixes %>% 
      filter(season==x) %>% 
      .$team
    
    temp <- standings %>% 
      filter(season==x&team==y&OppTeam %in% opps) %>%  #10 games
      summarize(totPoints=sum(points),totGF=sum(GF),totGA=sum(GA),totGD=totGF-totGA)
    
    # add both season and team to the data
    cbind(season=x,team=y,temp)
    
  } 
  
# use  map2 is specialised to iterate over two arguments
  data <-map2_df(topSixes$season,topSixes$team, make_table) 
  
  # add back in the relevant full table data (still needs bit of work)
  
  data <- data %>% 
    left_join(topSixes) %>% 
    # create minitable position
    arrange(desc(totPoints),desc(totGD),desc(totGF),team) %>% 
    group_by(season) %>% 
    mutate(pos=row_number()) %>% 
    ungroup() %>% 
    mutate(other_pts=cumPts-totPoints) %>% 
  select(season,team,pos,points=totPoints,GD=totGD,other_pts,all_pos=final_Pos,all_pts=cumPts) %>% 
    arrange(season,pos)
  
 
  head(data)
## # A tibble: 6 x 8
##    season        team   pos points    GD other_pts all_pos all_pts
##     <chr>       <chr> <int>  <dbl> <int>     <dbl>   <int>   <dbl>
## 1 1992/93   Man. Utd.     1     19     7        65       1      84
## 2 1992/93   Blackburn     2     18    13        53       4      71
## 3 1992/93 Aston Villa     3     14     0        60       2      74
## 4 1992/93   Liverpool     4     13    -2        46       6      59
## 5 1992/93   Norwich C     5     13   -10        59       3      72
## 6 1992/93         QPR     6      7    -8        56       5      63

In 1992/93, Year One of the Premier League, Blackburn and Liverpool (sound familiar) suffered by not taking advantage of lesser teams


One piece of trivia for this season is that Manchester United have only managed one goal away from home against others in the top six and that was a consolation goal in the last of them, against Spurs

teams <- topSixes %>% 
  filter(season=="2016/17") %>% 
  .$team

# mini1 <- standings %>%
#   filter(team=="Man. Utd." & season=="2016/17" & OppTeam %in% teams) %>% 
#   arrange(gameDate) %>% 
#   mutate(result=paste0(GF,"-",GA)) %>% 
#   select(Opponent=OppTeam,date=gameDate,venue,result) %>%
#                          DT::datatable(width=400,class='compact stripe hover row-border order-column',rownames=FALSE,options= list(paging = FALSE, searching = FALSE,info=FALSE))
# 
# 
# ## not working as in other files?? - though is in system
# #htmlwidgets::saveWidget(mini1, file = "mini1.html", selfcontained = TRUE)

 standings %>%
  filter(team=="Man. Utd." & season=="2016/17" & OppTeam %in% teams) %>% 
  arrange(gameDate) %>% 
  mutate(result=paste0(GF,"-",GA)) %>% 
  select(Opponent=OppTeam,date=gameDate,venue,result) %>% 
kable()
Opponent date venue result
Man. City 2016-09-10 H 1-2
Liverpool 2016-10-17 A 0-0
Chelsea 2016-10-23 A 0-4
Arsenal 2016-11-19 H 1-1
Tottenham H 2016-12-11 H 1-0
Liverpool 2017-01-15 H 1-1
Chelsea 2017-04-16 H 2-0
Man. City 2017-04-27 A 0-0
Arsenal 2017-05-07 A 0-2
Tottenham H 2017-05-14 A 1-2

The chart below shows results for each year. Hover points for team and use the filter created using the crosstalk package to identify just one team The points are jittered to account for situation where two teams have same number of points

sd <- SharedData$new(data)




fs <- filter_select(
id = "team",
label = "Select Team",
sharedData = sd,
group =  ~ team,
allLevels = FALSE,
multiple = FALSE
)


## this is needed as crosstalk does not work nicely with bootstrap, apparently
fs_nobootstrap <- fs

attr(fs_nobootstrap, "html_dependencies") <- Filter(
  function(dep) {dep$name != "bootstrap"},
  attr(fs_nobootstrap, "html_dependencies")
)




myChart <- sd %>% 
  plot_ly(x=~jitter(points),y=~season,color=~as.factor(pos), height=700) %>% 
  add_markers(hoverinfo="text",
              text=~paste0(team,
                           "<br>Pos: ",pos,
                           "<br>Pts: ",points)) %>% 
  layout(title="Performance against other top 6 teams by season",
         xaxis=list(title="Points Accrued"),
         yaxis=list(title="")) %>%
  config(displayModeBar = F,showLink = F)


## combine the selector and chart

 
  tagList(
  fs_nobootstrap,
  br(),
   myChart
)

Just to reinforce how disappointing this season has been - at least as far as the league is concerned - Manchester United’s points total of 10 is the joint worst they have ever registered in the PL with 2001/2. That year, they actually lost 6 of 10 againts their closest rivals. In that season, 36 yr old Laurent Blanc was a mainstay of their defence and Juan Veron patrolled the midfield

comments powered by Disqus