Bob Rudis always keeps his pulse on the best packages and recently bemoaned the fact that Kevin Ushey’s RcppRoll package which
Provides Routines for the efficient computation of windowed mean, median,
sum, product, minimum, maximum, standard deviation and variance are provided.
has not had enough praise
One reason is probably the lack of examples available - even in the vignette
I have therefore done a swift illustration using just one of the functions, roll_sum()
Best Runs in the Premier league
Tottenham Hotspur recently went on a tear of nine consecutive victories, ended recently by West Ham and thereby pretty well scuppering their remote chances of the league title
Let’s do a quick check to see if the past 10 games remains their most productive in terms of points gained (which, for the uninitated is 3 for a win, 1 for a draw/tie and zero for a loss) since 1992, when the Premier League was formed
N.B Initial analysis based on data to 4th May 2017 but underlying data is updated periodically
Here are libraries used
library(RcppRoll)
library(plotly)
library(crosstalk)
library(purrr)
library(htmltools)
library(tidyverse)
I maintain a ‘standings’ file which contains the raw data required
standings <- readRDS("data/standings.rds")
glimpse(standings)
## Observations: 19,492
## Variables: 20
## $ season <chr> "2004/05", "2004/05", "2004/05", "2004/05", "200...
## $ final_Pos <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
## $ team <chr> "Chelsea", "Chelsea", "Chelsea", "Chelsea", "Che...
## $ GF <int> 1, 3, 1, 2, 3, 0, 1, 3, 4, 1, 3, 1, 0, 1, 3, 2, ...
## $ GA <int> 1, 1, 0, 0, 1, 0, 1, 1, 1, 0, 1, 0, 0, 0, 0, 0, ...
## $ gameDate <date> 2005-05-15, 2005-05-10, 2005-05-07, 2005-04-30,...
## $ tmGameOrder <int> 506, 505, 504, 503, 502, 501, 500, 499, 498, 497...
## $ tmYrGameOrder <int> 38, 37, 36, 35, 34, 33, 32, 31, 30, 29, 28, 27, ...
## $ venue <chr> "A", "A", "H", "A", "H", "H", "H", "A", "H", "H"...
## $ MATCHID <int> 5285, 5277, 5265, 5253, 5243, 5235, 5217, 5212, ...
## $ OppTeam <chr> "Newcastle U", "Man. Utd.", "Charlton", "Bolton"...
## $ points <dbl> 1, 3, 3, 3, 3, 1, 1, 3, 3, 3, 3, 3, 1, 3, 3, 3, ...
## $ cumGF <int> 72, 71, 68, 67, 65, 62, 62, 61, 58, 54, 53, 50, ...
## $ cumGA <int> 15, 14, 13, 13, 13, 12, 12, 11, 10, 9, 9, 8, 8, ...
## $ cumPts <dbl> 95, 94, 91, 88, 85, 82, 81, 80, 77, 74, 71, 68, ...
## $ cumGD <int> 57, 57, 55, 54, 52, 50, 50, 50, 48, 45, 44, 42, ...
## $ allGames <dbl> 38, 38, 38, 38, 38, 38, 38, 38, 38, 38, 38, 38, ...
## $ position <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
## $ res <chr> "Draw", "Win", "Win", "Win", "Win", "Draw", "Dra...
## $ tt <chr> "<table cellpadding='4' style='line-height:1'><t...
team, points and tmGameOrder are the fields required
We can now select Tottenham’s games, use the roll_sum() function to get a rolling total of points secured in ten consecutive games and plot the results in an infoactive chart
# select team and ensure data is in correct order
tm <- standings %>%
filter(team=="Tottenham H") %>%
arrange(tmGameOrder)
# construct data.frame of results
run <- roll_sum(tm$points,n=10)
df <- data.frame(points=run,seq=1:length(run))
# produce chart
df %>%
plot_ly(x=~seq,y=~points) %>%
add_lines()
As usual with a plotly chart, you can zoom and hover. I have not bothered with producing the best aesthetic!
Although this is the first time they have had a sequence of nine consecutive wins, the fall of 2011 included a run of 11 games with 10 wins and a draw
Let’s extend this to all teams - the code only takes a couple of seconds to create a 20,000 row data.frame
# get vector of all 47 teams that have appeared in Premier League history
teams <- unique(standings$team)
get_runs <- function(x) {
tm <-standings %>%
filter(team==x) %>%
arrange(tmGameOrder)
# construct data.frame of results
run <- roll_sum(tm$points,n=10)
data.frame(points=run,seq=1:length(run),team=x)
}
# apply the above function to all teams
data <-map_df(teams, get_runs)
glimpse(data)
## Observations: 19,069
## Variables: 3
## $ points <dbl> 13, 12, 15, 17, 16, 16, 16, 19, 22, 20, 20, 21, 19, 16,...
## $ seq <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, ...
## $ team <chr> "Chelsea", "Chelsea", "Chelsea", "Chelsea", "Chelsea", ...
We can now use this data in a couple of ways. Lets use the filter functions from the crosstalk package to select any team. I covered this in a bit more depth here
sd <- SharedData$new(data)
fs <- filter_select(
id = "team1",
label = "Select Team",
sharedData = sd,
group = ~ team,
allLevels = FALSE,
multiple = FALSE
)
## this is needed as crosstalk does not work nicely with bootstrap, apparently
fs_nobootstrap <- fs
attr(fs_nobootstrap, "html_dependencies") <- Filter(
function(dep) {dep$name != "bootstrap"},
attr(fs_nobootstrap, "html_dependencies")
)
# produce chart
myChart <- sd %>%
plot_ly(x=~seq,y=~points) %>%
add_lines()
tagList(
fs_nobootstrap,
br(),
myChart
)
Currently, the filter_select does not appear to default to a single, selected value - though I believe that is being addressed. Anyways, it is a simple enough matter to select a team
How about the best 10 game sequence achieved by any of the teams?
bestRun <-data %>%
arrange(desc(points)) %>%
group_by(team) %>%
slice(1)
# In order to improve data display, the team needs to be changed from a character to a factor and ordered
bestRun$team <- factor(bestRun$team, levels = bestRun $team[order(bestRun$points)])
bestRun %>%
plot_ly(x=~points,y=~team, height=1500) %>%
add_bars(color = I("#f2dfa8"), width=0.5) %>%
layout(
barmode = "overlay",
title = "Best 10 game point tally in Premier League",
xaxis = list(title = "Total Points"),
yaxis = list(title = ""),
margin = list(l = 90)
) %>%
config(displayModeBar = F, showLink = F)
Tottenham, along with Everton, remain one of the two ever-present (25 year) Premier League teams yet to record a double-digit win sequence.
That is just a taster. The above methods are easily extendable :-
- Add filters for venue; sequence-length; and opposition
- Do all league games ever using the engsoccerdata package
- Other sports
- Other data e.g financial, climate
Please let me know if you have any code enhancements or create any apps based on this approach
Share this post
Twitter
Google+
Facebook
Reddit
LinkedIn
StumbleUpon
Email