StatCan API's Discovered

My last post was a first look at StatCan data which highlighted that there was a lot of data available but that it was not necessarily easily available or perfectly presented

Since then (and apologies where due), I have come across a couple of APIs

First-off, StatCan do have a developers page one of which provides access to hundreds of indicators in JSON format. Here is an example of one they have tweeted

library(blogdown) 
shortcode("tweet", "894317284046630912")

Let’s load the libraries and see what is available for the all indicators option

library(httr)
library(jsonlite)
library(listviewer)
library(tidyverse)


library(stringr)
library(plotly)

The listviewer package, an htmlwidget from the ubiquitous Kent Russell and others, provides a great way to explore lists



url <- "http://www.statcan.gc.ca/sites/json/ind-all.json"
response <-  GET(url)


parsed <- fromJSON(content(response, "text"), simplifyVector = FALSE)
jsonedit(parsed)

If you drill down “results > indicators > 0 > title > en” you can see the title of one of the more than thousand indicators. I believe they get added consecutively to the top but at the time of writing the first one was

“Proportion (%) of one-person houselds” with a value of 28.2%.

This is from the 2016 census and the first occasion on which this category has exceeded ‘couple’s with children’ as the most common household configuration

OK, let’s see what we can output from this list. In purrr there is usually more than on way to skin a cat (groan). Any suggestions for improvement welcome



# start deeper into the nested list
ind_list <- parsed$results$indicators

# Now use purrr to create atomic vectors

registry_number <- map_chr(ind_list, "registry_number")
indicator_number <- map_chr(ind_list, "indicator_number")
geo_code <- map_chr(ind_list, "geo_code")
source <- map_chr(ind_list, "source")
themes <- map_chr(ind_list, "themes")
release_date <- map_chr(ind_list, "release_date")

## For those where we need to go down a further level we can use a vector
## either numbered
title <- map_chr(ind_list, c(4, 1))
#or text
refper <- map_chr(ind_list, c("refper", "en"))
value <- map_chr(ind_list, c("value", "en"))
daily_url <- map_chr(ind_list, c("daily_url", "en"))
daily_title <- map_chr(ind_list, c("daily_title", "en"))

## combine into a data.frame
l <-
  list(
    registry_number = registry_number,
    indicator_number = indicator_number,
    geo_code = geo_code,
    source = source,
    themes = themes,
    release_date = release_date,
    title = title ,
    refper = refper,
    value = value,
    daily_url = daily_url,
    daily_title = daily_title
  )

indices.df <- as_tibble(l)

#and display in a table with selected columns
indices.df %>%
  select(geo_code,source,themes,title,value) %>% 
  DT::datatable(width=600,
    class = 'compact stripe hover row-border order-column',
    rownames = FALSE,
    options = list(
      paging = TRUE,
      searching = TRUE,
      info = FALSE
    )
  )

NB I have shown only a selection of columns to cater for narrow blog width

You can search for an item of interest e.g try “Potato” and you can see that there is one entry which appears to show 344,884 acres of Potatoes were planted in Canada this year, more than enough to cover Phoenix’s city limits

Looking back at the listviewer we can see that two of the table columns geo_code and themes appear to have equivalent raw data. Let’s tabulize them as well. It’s easier the second time through. For any Francophiles, just swap in the French alternative


geo_list <- parsed$results$geo

geo_code <- map_chr(geo_list, "geo_code")
geo_name <- map_chr(geo_list, c(2,1))


l <-
  list(
    geo_code=geo_code,
    geo_name=geo_name
    )

geo.df <- as_tibble(l)

geo.df %>%
  DT::datatable(
    class = 'compact stripe hover row-border order-column',
    rownames = FALSE,
    options = list(
      paging = TRUE,
      searching = TRUE,
      info = FALSE
    )
  )

## similar for themes - probably a map_df alternative

theme_list <- parsed$results$themes_en



theme_code <- map_chr(theme_list, 1)
theme_name <- map_chr(theme_list, 2)


l <-
  list(
   theme_code=theme_code,
    theme_name=theme_name
    )

theme.df <- as_tibble(l)

theme.df %>%
  DT::datatable(
    class = 'compact stripe hover row-border order-column',
    rownames = FALSE,
    options = list(
      paging = TRUE,
      searching = TRUE,
      info = FALSE
    )
  )

OK we can now link the geo data.frames to make the tabe more meaningful


indices.df %>% 
  left_join(geo.df) %>% 
  select(title,refper,geo_name,themes,value,source)%>%
                         DT::datatable(class='compact stripe hover row-border order-column',rownames=FALSE,options= list(paging = TRUE, searching = TRUE,info=FALSE))