Andrew Clark

There is aer a lot of opportunities for data visualizations in journalism and R is beginning to get a toehold in this arena

I recently came across a post from the R for journalists page and decided to reprocess it using tidyverse packages and add a plot and map

The article uses some GB office of National Statistics data to calculate the estimated mean age of the population of each of the England and Wales parliamentary constituencies

Let’s load all the libraries required and the downloaded data. With a little bit of inspection, the correct parameters can be applied to produce a data.frame

library(tidyverse)
library(plotly)

library(leaflet)
library(sf)
library(htmlwidgets)
library(DT)

library(hansard)
library(mnis)
library(parlitools)

library(readxl)

ages <- read_excel("data/constituencyAges.xls", sheet=2, skip=2)


ages
## # A tibble: 573 x 94
##     PCON11CD                 PCON11NM `All Ages`   `0`   `1`   `2`   `3`
##        <chr>                    <chr>      <dbl> <dbl> <dbl> <dbl> <dbl>
##  1 E14000530                Aldershot     105653  1509  1445  1460  1457
##  2 E14000531      Aldridge-Brownhills      77670   814   780   806   788
##  3 E14000532 Altrincham and Sale West      99970  1079  1195  1166  1346
##  4 E14000533             Amber Valley      88985   916   883   984   960
##  5 E14000534  Arundel and South Downs      98989   848   828   896  1016
##  6 E14000535                 Ashfield     104653  1164  1248  1261  1326
##  7 E14000536                  Ashford     119036  1496  1429  1578  1611
##  8 E14000537        Ashton-under-Lyne      91374  1219  1228  1259  1288
##  9 E14000538                Aylesbury     117278  1682  1603  1725  1766
## 10 E14000539                  Banbury     120809  1523  1450  1539  1566
## # ... with 563 more rows, and 87 more variables: `4` <dbl>, `5` <dbl>,
## #   `6` <dbl>, `7` <dbl>, `8` <dbl>, `9` <dbl>, `10` <dbl>, `11` <dbl>,
## #   `12` <dbl>, `13` <dbl>, `14` <dbl>, `15` <dbl>, `16` <dbl>,
## #   `17` <dbl>, `18` <dbl>, `19` <dbl>, `20` <dbl>, `21` <dbl>,
## #   `22` <dbl>, `23` <dbl>, `24` <dbl>, `25` <dbl>, `26` <dbl>,
## #   `27` <dbl>, `28` <dbl>, `29` <dbl>, `30` <dbl>, `31` <dbl>,
## #   `32` <dbl>, `33` <dbl>, `34` <dbl>, `35` <dbl>, `36` <dbl>,
## #   `37` <dbl>, `38` <dbl>, `39` <dbl>, `40` <dbl>, `41` <dbl>,
## #   `42` <dbl>, `43` <dbl>, `44` <dbl>, `45` <dbl>, `46` <dbl>,
## #   `47` <dbl>, `48` <dbl>, `49` <dbl>, `50` <dbl>, `51` <dbl>,
## #   `52` <dbl>, `53` <dbl>, `54` <dbl>, `55` <dbl>, `56` <dbl>,
## #   `57` <dbl>, `58` <dbl>, `59` <dbl>, `60` <dbl>, `61` <dbl>,
## #   `62` <dbl>, `63` <dbl>, `64` <dbl>, `65` <dbl>, `66` <dbl>,
## #   `67` <dbl>, `68` <dbl>, `69` <dbl>, `70` <dbl>, `71` <dbl>,
## #   `72` <dbl>, `73` <dbl>, `74` <dbl>, `75` <dbl>, `76` <dbl>,
## #   `77` <dbl>, `78` <dbl>, `79` <dbl>, `80` <dbl>, `81` <dbl>,
## #   `82` <dbl>, `83` <dbl>, `84` <dbl>, `85` <dbl>, `86` <dbl>,
## #   `87` <dbl>, `88` <dbl>, `89` <dbl>, `90` <dbl>

So we have an untidy, wide data.frame with most of the columns representing an age range. For instance, Aldershot had an estimated 1509 children between under 1 year in mid-2015

We can use th gather() function from the tidyr package (part of the tidyverse) to get a row for each age/constituency combination.

We can dispense with the all ages for this purpose( Note the requirement for a backtick as the column includes a space) but want to retain the other two columns

ages_gather <- ages %>% 
  select(-`All Ages`) %>% 
  gather(age,count,-c(PCON11CD,PCON11NM))

glimpse(ages_gather)
## Observations: 52,143
## Variables: 4
## $ PCON11CD <chr> "E14000530", "E14000531", "E14000532", "E14000533", "...
## $ PCON11NM <chr> "Aldershot", "Aldridge-Brownhills", "Altrincham and S...
## $ age      <chr> "0", "0", "0", "0", "0", "0", "0", "0", "0", "0", "0"...
## $ count    <dbl> 1509, 814, 1079, 916, 848, 1164, 1496, 1219, 1682, 15...

We want to get a weighted mean age for every constuency. That will require amending the age column from a charactr to an integer and adding 0.5 - on the reasonable assumption that the average age is half of the range - and then using dplyr to calculate the values

meanAges <- ages_gather %>% 
  mutate(age=as.integer(age)+0.5) %>% 
  group_by(PCON11CD,PCON11NM) %>% 
  summarize(wtdMean=round(weighted.mean(age,count),1)) %>% 
  ungroup()

meanAges %>%
  select(Constituency=PCON11NM,`Av. Age`=wtdMean) %>% 
                         DT::datatable(width=400,class='compact stripe hover row-border order-column',rownames=FALSE,options= list(paging = TRUE, searching = TRUE,info=FALSE))

We now have a searchable, sortable table

The mean age of Aldershot is 38.6. This is 0.5 higher than the original article but I’m assuming he did not add the aforementioned 0.5. He has the same lowest age - Birmingham Ladywood. Christchurch , which happens to be a favourite retirement area, where my mother happened to die, is the oldest


A histogram will provide a swift look at the distribution

meanAges %>% 
  plot_ly(x=~wtdMean) %>% 
  layout(title="Histogram of mean ages by UK constituency, 2015",
  xaxis=list(title="Average Age"),
  yaxis=list(title="Number of constituencies")) %>%  config(displayModeBar = F,showLink = F)
## No trace type specified:
##   Based on info supplied, a 'histogram' trace seems appropriate.
##   Read more about this trace type -> https://plot.ly/r/reference/#histogram

A, more-or-less, normal distribution but with an age-range greater than I would have guessed going into this exercise


Ok lets look at mapping the data. One interesting method on constructing a hexagon-based electoral cartogram is in a [post by ??] (https://robwhickman.github.io/2017/05/16/uk-general-election-hexagram-part-1/) but I have gone with the parlitools package from Evan Odell, whose hansard package I have previously utilized in a couple of earlier posts

The code is basically a copy from his vignette but plugging in the age data

west_hex_map <- parlitools::west_hex_map #Base map

# color range
 pal = colorNumeric("Oranges", meanAges$wtdMean)

 west_mean_ages <- left_join(west_hex_map, meanAges, by = c("gss_code"="PCON11CD")) %>%  #Joining to base map
                      filter(!is.na(wtdMean)) # rstrict to England and Wales
 
 #create infomatic label
label_yes <- paste0(
  "<strong>", west_mean_ages$constituency_name, "</strong>", "</br>",
  "Av. Age: ", west_mean_ages$wtdMean
) %>% lapply(htmltools::HTML)

# create map
leaflet(options=leafletOptions(
  dragging = FALSE, zoomControl = FALSE, tap = FALSE,
  minZoom = 6, maxZoom = 6, maxBounds = list(list(2.5,-7.75),list(58.25,50.0)),
  attributionControl = FALSE),
  west_mean_ages) %>%
  addPolygons(
    color = "grey",
    weight=0.75,
    opacity = 0.5,
    fillOpacity = 1,
    fillColor = ~pal(wtdMean),
    label = label_yes) %>%
  
  addLegend("topright", pal = pal, values = ~wtdMean,
    title = "Mean Age",
    opacity = 1)  %>% 
  htmlwidgets::onRender(
    "function(x, y) {
        var myMap = this;
        myMap._container.style['background'] = '#fff';
    }")%>% 
  mapOptions(zoomToLimits = "first")

Hover map for details. This approach neatly highlights that there is a preponderance of younger people living in the middle of the major cities, e.g. London, Birmingham , Leeds. In contrast, coastal regions attractive to retirees have much higher average ages

comments powered by Disqus