My last post was a first look at StatCan data which highlighted that there was a lot of data available but that it was not necessarily easily available or perfectly presented
Since then (and apologies where due), I have come across a couple of APIs
First-off, StatCan do have a developers page one of which provides access to hundreds of indicators in JSON format. Here is an example of one they have tweeted
library(blogdown)
shortcode("tweet", "894317284046630912")
In May 2017, Canada imported $7,362,018 worth of oral or dental hygiene preparations, nes. #FreshBreathDay pic.twitter.com/qoArG3QX57
— Statistics Canada (@StatCan_eng) August 6, 2017
Let’s load the libraries and see what is available for the all indicators option
library(httr)
library(jsonlite)
library(listviewer)
library(tidyverse)
library(stringr)
library(plotly)
The listviewer package, an htmlwidget from the ubiquitous Kent Russell and others, provides a great way to explore lists
url <- "http://www.statcan.gc.ca/sites/json/ind-all.json"
response <- GET(url)
parsed <- fromJSON(content(response, "text"), simplifyVector = FALSE)
jsonedit(parsed)
If you drill down “results > indicators > 0 > title > en” you can see the title of one of the more than thousand indicators. I believe they get added consecutively to the top but at the time of writing the first one was
“Proportion (%) of one-person houselds” with a value of 28.2%.
This is from the 2016 census and the first occasion on which this category has exceeded ‘couple’s with children’ as the most common household configuration
OK, let’s see what we can output from this list. In purrr there is usually more than on way to skin a cat (groan). Any suggestions for improvement welcome
# start deeper into the nested list
ind_list <- parsed$results$indicators
# Now use purrr to create atomic vectors
registry_number <- map_chr(ind_list, "registry_number")
indicator_number <- map_chr(ind_list, "indicator_number")
geo_code <- map_chr(ind_list, "geo_code")
source <- map_chr(ind_list, "source")
themes <- map_chr(ind_list, "themes")
release_date <- map_chr(ind_list, "release_date")
## For those where we need to go down a further level we can use a vector
## either numbered
title <- map_chr(ind_list, c(4, 1))
#or text
refper <- map_chr(ind_list, c("refper", "en"))
value <- map_chr(ind_list, c("value", "en"))
daily_url <- map_chr(ind_list, c("daily_url", "en"))
daily_title <- map_chr(ind_list, c("daily_title", "en"))
## combine into a data.frame
l <-
list(
registry_number = registry_number,
indicator_number = indicator_number,
geo_code = geo_code,
source = source,
themes = themes,
release_date = release_date,
title = title ,
refper = refper,
value = value,
daily_url = daily_url,
daily_title = daily_title
)
indices.df <- as_tibble(l)
#and display in a table with selected columns
indices.df %>%
select(geo_code,source,themes,title,value) %>%
DT::datatable(width=600,
class = 'compact stripe hover row-border order-column',
rownames = FALSE,
options = list(
paging = TRUE,
searching = TRUE,
info = FALSE
)
)