There is aer a lot of opportunities for data visualizations in journalism and R is beginning to get a toehold in this arena
I recently came across a post from the R for journalists page and decided to reprocess it using tidyverse packages and add a plot and map
The article uses some GB office of National Statistics data to calculate the estimated mean age of the population of each of the England and Wales parliamentary constituencies
Let’s load all the libraries required and the downloaded data. With a little bit of inspection, the correct parameters can be applied to produce a data.frame
library(tidyverse)
library(plotly)
library(leaflet)
library(sf)
library(htmlwidgets)
library(DT)
library(hansard)
library(mnis)
library(parlitools)
library(readxl)
ages <- read_excel("data/constituencyAges.xls", sheet=2, skip=2)
ages
## # A tibble: 573 x 94
## PCON11CD PCON11NM `All Ages` `0` `1` `2` `3`
## <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 E14000530 Aldershot 105653 1509 1445 1460 1457
## 2 E14000531 Aldridge-Brownhills 77670 814 780 806 788
## 3 E14000532 Altrincham and Sale West 99970 1079 1195 1166 1346
## 4 E14000533 Amber Valley 88985 916 883 984 960
## 5 E14000534 Arundel and South Downs 98989 848 828 896 1016
## 6 E14000535 Ashfield 104653 1164 1248 1261 1326
## 7 E14000536 Ashford 119036 1496 1429 1578 1611
## 8 E14000537 Ashton-under-Lyne 91374 1219 1228 1259 1288
## 9 E14000538 Aylesbury 117278 1682 1603 1725 1766
## 10 E14000539 Banbury 120809 1523 1450 1539 1566
## # ... with 563 more rows, and 87 more variables: `4` <dbl>, `5` <dbl>,
## # `6` <dbl>, `7` <dbl>, `8` <dbl>, `9` <dbl>, `10` <dbl>, `11` <dbl>,
## # `12` <dbl>, `13` <dbl>, `14` <dbl>, `15` <dbl>, `16` <dbl>,
## # `17` <dbl>, `18` <dbl>, `19` <dbl>, `20` <dbl>, `21` <dbl>,
## # `22` <dbl>, `23` <dbl>, `24` <dbl>, `25` <dbl>, `26` <dbl>,
## # `27` <dbl>, `28` <dbl>, `29` <dbl>, `30` <dbl>, `31` <dbl>,
## # `32` <dbl>, `33` <dbl>, `34` <dbl>, `35` <dbl>, `36` <dbl>,
## # `37` <dbl>, `38` <dbl>, `39` <dbl>, `40` <dbl>, `41` <dbl>,
## # `42` <dbl>, `43` <dbl>, `44` <dbl>, `45` <dbl>, `46` <dbl>,
## # `47` <dbl>, `48` <dbl>, `49` <dbl>, `50` <dbl>, `51` <dbl>,
## # `52` <dbl>, `53` <dbl>, `54` <dbl>, `55` <dbl>, `56` <dbl>,
## # `57` <dbl>, `58` <dbl>, `59` <dbl>, `60` <dbl>, `61` <dbl>,
## # `62` <dbl>, `63` <dbl>, `64` <dbl>, `65` <dbl>, `66` <dbl>,
## # `67` <dbl>, `68` <dbl>, `69` <dbl>, `70` <dbl>, `71` <dbl>,
## # `72` <dbl>, `73` <dbl>, `74` <dbl>, `75` <dbl>, `76` <dbl>,
## # `77` <dbl>, `78` <dbl>, `79` <dbl>, `80` <dbl>, `81` <dbl>,
## # `82` <dbl>, `83` <dbl>, `84` <dbl>, `85` <dbl>, `86` <dbl>,
## # `87` <dbl>, `88` <dbl>, `89` <dbl>, `90` <dbl>
So we have an untidy, wide data.frame with most of the columns representing an age range. For instance, Aldershot had an estimated 1509 children between under 1 year in mid-2015
We can use th gather() function from the tidyr package (part of the tidyverse) to get a row for each age/constituency combination.
We can dispense with the all ages for this purpose( Note the requirement for a backtick as the column includes a space) but want to retain the other two columns
ages_gather <- ages %>%
select(-`All Ages`) %>%
gather(age,count,-c(PCON11CD,PCON11NM))
glimpse(ages_gather)
## Observations: 52,143
## Variables: 4
## $ PCON11CD <chr> "E14000530", "E14000531", "E14000532", "E14000533", "...
## $ PCON11NM <chr> "Aldershot", "Aldridge-Brownhills", "Altrincham and S...
## $ age <chr> "0", "0", "0", "0", "0", "0", "0", "0", "0", "0", "0"...
## $ count <dbl> 1509, 814, 1079, 916, 848, 1164, 1496, 1219, 1682, 15...
We want to get a weighted mean age for every constuency. That will require amending the age column from a charactr to an integer and adding 0.5 - on the reasonable assumption that the average age is half of the range - and then using dplyr to calculate the values
meanAges <- ages_gather %>%
mutate(age=as.integer(age)+0.5) %>%
group_by(PCON11CD,PCON11NM) %>%
summarize(wtdMean=round(weighted.mean(age,count),1)) %>%
ungroup()
meanAges %>%
select(Constituency=PCON11NM,`Av. Age`=wtdMean) %>%
DT::datatable(width=400,class='compact stripe hover row-border order-column',rownames=FALSE,options= list(paging = TRUE, searching = TRUE,info=FALSE))
We now have a searchable, sortable table
The mean age of Aldershot is 38.6. This is 0.5 higher than the original article but I’m assuming he did not add the aforementioned 0.5. He has the same lowest age - Birmingham Ladywood. Christchurch , which happens to be a favourite retirement area, where my mother happened to die, is the oldest
A histogram will provide a swift look at the distribution
meanAges %>%
plot_ly(x=~wtdMean) %>%
layout(title="Histogram of mean ages by UK constituency, 2015",
xaxis=list(title="Average Age"),
yaxis=list(title="Number of constituencies")) %>% config(displayModeBar = F,showLink = F)
## No trace type specified:
## Based on info supplied, a 'histogram' trace seems appropriate.
## Read more about this trace type -> https://plot.ly/r/reference/#histogram
A, more-or-less, normal distribution but with an age-range greater than I would have guessed going into this exercise
Ok lets look at mapping the data. One interesting method on constructing a hexagon-based electoral cartogram is in a [post by ??] (https://robwhickman.github.io/2017/05/16/uk-general-election-hexagram-part-1/) but I have gone with the parlitools package from Evan Odell, whose hansard package I have previously utilized in a couple of earlier posts
The code is basically a copy from his vignette but plugging in the age data
west_hex_map <- parlitools::west_hex_map #Base map
# color range
pal = colorNumeric("Oranges", meanAges$wtdMean)
west_mean_ages <- left_join(west_hex_map, meanAges, by = c("gss_code"="PCON11CD")) %>% #Joining to base map
filter(!is.na(wtdMean)) # rstrict to England and Wales
#create infomatic label
label_yes <- paste0(
"<strong>", west_mean_ages$constituency_name, "</strong>", "</br>",
"Av. Age: ", west_mean_ages$wtdMean
) %>% lapply(htmltools::HTML)
# create map
leaflet(options=leafletOptions(
dragging = FALSE, zoomControl = FALSE, tap = FALSE,
minZoom = 6, maxZoom = 6, maxBounds = list(list(2.5,-7.75),list(58.25,50.0)),
attributionControl = FALSE),
west_mean_ages) %>%
addPolygons(
color = "grey",
weight=0.75,
opacity = 0.5,
fillOpacity = 1,
fillColor = ~pal(wtdMean),
label = label_yes) %>%
addLegend("topright", pal = pal, values = ~wtdMean,
title = "Mean Age",
opacity = 1) %>%
htmlwidgets::onRender(
"function(x, y) {
var myMap = this;
myMap._container.style['background'] = '#fff';
}")%>%
mapOptions(zoomToLimits = "first")
Hover map for details. This approach neatly highlights that there is a preponderance of younger people living in the middle of the major cities, e.g. London, Birmingham , Leeds. In contrast, coastal regions attractive to retirees have much higher average ages
Share this post
Twitter
Google+
Facebook
Reddit
LinkedIn
StumbleUpon
Email