RcppRoll package for swift moving averages

Bob Rudis always keeps his pulse on the best packages and recently bemoaned the fact that Kevin Ushey’s RcppRoll package which

Provides Routines for the efficient computation of windowed mean, median,
sum, product, minimum, maximum, standard deviation and variance are provided.

has not had enough praise

One reason is probably the lack of examples available - even in the vignette

I have therefore done a swift illustration using just one of the functions, roll_sum()

Best Runs in the Premier league

Tottenham Hotspur recently went on a tear of nine consecutive victories, ended recently by West Ham and thereby pretty well scuppering their remote chances of the league title

Let’s do a quick check to see if the past 10 games remains their most productive in terms of points gained (which, for the uninitated is 3 for a win, 1 for a draw/tie and zero for a loss) since 1992, when the Premier League was formed

N.B Initial analysis based on data to 4th May 2017 but underlying data is updated periodically

Here are libraries used

library(RcppRoll)
library(plotly)
library(crosstalk)
library(purrr)
library(htmltools)
library(tidyverse)

I maintain a ‘standings’ file which contains the raw data required

standings <- readRDS("data/standings.rds")

glimpse(standings)
## Observations: 19,492
## Variables: 20
## $ season        <chr> "2004/05", "2004/05", "2004/05", "2004/05", "200...
## $ final_Pos     <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
## $ team          <chr> "Chelsea", "Chelsea", "Chelsea", "Chelsea", "Che...
## $ GF            <int> 1, 3, 1, 2, 3, 0, 1, 3, 4, 1, 3, 1, 0, 1, 3, 2, ...
## $ GA            <int> 1, 1, 0, 0, 1, 0, 1, 1, 1, 0, 1, 0, 0, 0, 0, 0, ...
## $ gameDate      <date> 2005-05-15, 2005-05-10, 2005-05-07, 2005-04-30,...
## $ tmGameOrder   <int> 506, 505, 504, 503, 502, 501, 500, 499, 498, 497...
## $ tmYrGameOrder <int> 38, 37, 36, 35, 34, 33, 32, 31, 30, 29, 28, 27, ...
## $ venue         <chr> "A", "A", "H", "A", "H", "H", "H", "A", "H", "H"...
## $ MATCHID       <int> 5285, 5277, 5265, 5253, 5243, 5235, 5217, 5212, ...
## $ OppTeam       <chr> "Newcastle U", "Man. Utd.", "Charlton", "Bolton"...
## $ points        <dbl> 1, 3, 3, 3, 3, 1, 1, 3, 3, 3, 3, 3, 1, 3, 3, 3, ...
## $ cumGF         <int> 72, 71, 68, 67, 65, 62, 62, 61, 58, 54, 53, 50, ...
## $ cumGA         <int> 15, 14, 13, 13, 13, 12, 12, 11, 10, 9, 9, 8, 8, ...
## $ cumPts        <dbl> 95, 94, 91, 88, 85, 82, 81, 80, 77, 74, 71, 68, ...
## $ cumGD         <int> 57, 57, 55, 54, 52, 50, 50, 50, 48, 45, 44, 42, ...
## $ allGames      <dbl> 38, 38, 38, 38, 38, 38, 38, 38, 38, 38, 38, 38, ...
## $ position      <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
## $ res           <chr> "Draw", "Win", "Win", "Win", "Win", "Draw", "Dra...
## $ tt            <chr> "<table cellpadding='4' style='line-height:1'><t...

team, points and tmGameOrder are the fields required

We can now select Tottenham’s games, use the roll_sum() function to get a rolling total of points secured in ten consecutive games and plot the results in an infoactive chart

# select team and ensure data is in correct order
tm <- standings %>% 
    filter(team=="Tottenham H") %>% 
    arrange(tmGameOrder)
 
   # construct data.frame of results
   run <- roll_sum(tm$points,n=10)
   df <- data.frame(points=run,seq=1:length(run))
  
   # produce chart
    df %>% 
    plot_ly(x=~seq,y=~points) %>% 
    add_lines()

As usual with a plotly chart, you can zoom and hover. I have not bothered with producing the best aesthetic!

Although this is the first time they have had a sequence of nine consecutive wins, the fall of 2011 included a run of 11 games with 10 wins and a draw


Let’s extend this to all teams - the code only takes a couple of seconds to create a 20,000 row data.frame

# get vector of all 47 teams that have appeared in Premier League history
teams <- unique(standings$team)



get_runs <- function(x) {
    tm <-standings %>% 
    filter(team==x) %>% 
    arrange(tmGameOrder)
 
   # construct data.frame of results
   run <- roll_sum(tm$points,n=10)
   data.frame(points=run,seq=1:length(run),team=x)


}

# apply the above function to all teams
data <-map_df(teams, get_runs)

glimpse(data)
## Observations: 19,069
## Variables: 3
## $ points <dbl> 13, 12, 15, 17, 16, 16, 16, 19, 22, 20, 20, 21, 19, 16,...
## $ seq    <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, ...
## $ team   <chr> "Chelsea", "Chelsea", "Chelsea", "Chelsea", "Chelsea", ...

We can now use this data in a couple of ways. Lets use the filter functions from the crosstalk package to select any team. I covered this in a bit more depth here

sd <- SharedData$new(data)

fs <- filter_select(
id = "team1",
label = "Select Team",
sharedData = sd,
group =  ~ team,
allLevels = FALSE,
multiple = FALSE
)


## this is needed as crosstalk does not work nicely with bootstrap, apparently
fs_nobootstrap <- fs

attr(fs_nobootstrap, "html_dependencies") <- Filter(
  function(dep) {dep$name != "bootstrap"},
  attr(fs_nobootstrap, "html_dependencies")
)



  
   # produce chart
myChart  <-  sd %>% 
    plot_ly(x=~seq,y=~points) %>% 
    add_lines()


  tagList(
  fs_nobootstrap,
  br(),
   myChart
)

Currently, the filter_select does not appear to default to a single, selected value - though I believe that is being addressed. Anyways, it is a simple enough matter to select a team


How about the best 10 game sequence achieved by any of the teams?

bestRun <-data %>% 
  arrange(desc(points)) %>% 
  group_by(team) %>%
  slice(1)


 # In order to improve data display, the team needs to be changed from a character to a factor and ordered
 bestRun$team <-  factor(bestRun$team, levels = bestRun $team[order(bestRun$points)])

bestRun %>% 
  plot_ly(x=~points,y=~team, height=1500) %>% 
  add_bars(color = I("#f2dfa8"),  width=0.5)  %>% 
  layout(
    barmode = "overlay",
    title = "Best 10 game point tally in Premier League",
    xaxis = list(title = "Total Points"),
    yaxis = list(title = ""),
    margin = list(l = 90)
  ) %>%
  config(displayModeBar = F, showLink = F)

Tottenham, along with Everton, remain one of the two ever-present (25 year) Premier League teams yet to record a double-digit win sequence.


That is just a taster. The above methods are easily extendable :-

  • Add filters for venue; sequence-length; and opposition
  • Do all league games ever using the engsoccerdata package
  • Other sports
  • Other data e.g financial, climate

Please let me know if you have any code enhancements or create any apps based on this approach

Share