World Bank Data with data360r

2017-09-29 · 24454 · 115

economics · opendata · packages · politics

NB This is a revamp of a previous post

The World Bank is a great source of economic and social data. A couple of years ago, I used it to create a shiny app which still appears to work!

Earlier this year, the World Bank launched a new open data platform which collates more than 2000 indices from 40+ data sources many external to the World Bank

Even better, there is now an R package data360r to access the API and deliver data in a tidy format. Reg Onglao is its author and maintainer and she has written an introductory blog post on the World Bank Site. There are further examples on github. The package also provides access to Governance Global Practice indices and the World Bank Group’s Trade & Competitiveness Global Practice data

Rummage around on the World Bank site for access to thousands of reports and explanations for how the data is collected

Let’s take it for a ride. These are packages that I will use


library(tidyverse)
library(plotly)
library(leaflet)
library(stringr)
library(sf)
library(viridisLite)

library(data360r)

metadata

The get_metadata360() function downloads the requested TC/Govdata360 metadata, including:

‘datasets’ == Lists metadata for all datasets.
‘indicators’ == Lists metadata for all indicators. Does not return actual data.
‘countries’ == Lists metadata for all countries and regions.

This will give us a good handle on what is available

Datasets

In addition to World Bank datasets, many others have been incorporated into the API. Click on any link to get to a web page of the relevant organization


df_datasets <- get_metadata360(metadata_type = 'datasets')

df_datasets  %>%
#mutate(sourceLink = paste0("<a href=', link, "\" target=\"_blank\">", source, '</a>"))
mutate(sourceLink = paste0("<a href=", link, ">", source, "</a>")) %>%
arrange(source) %>%
select(dataset_id = id, sourceLink) %>%
DT::datatable(
class = 'compact stripe hover row-border order-column',
rownames = FALSE,
escape = FALSE,
options = list(
paging = TRUE,
searching = TRUE,
info = FALSE
)
)

Indicators

## get all and 



df_indicators <- get_metadata360(metadata_type = "indicators")

glimpse(df_indicators)
## Observations: 3,738
## Variables: 16
## $ id               <int> 940, 40001, 3386, 415, 1026, 1841, 915, 3385,...
## $ name             <chr> "GDP growth (annual %)", "Ease of Doing Busin...
## $ dataset          <chr> "World Development Indicators", "Doing Busine...
## $ valueType        <chr> "value", "dtf", "rank", "rank", "value", "val...
## $ datasetId        <int> 56, 52, 65, 52, 56, 56, 56, 65, 56, 53, 56, 5...
## $ defaultViz       <chr> "line_chart", "line_chart", "line_chart", "li...
## $ doNotUseViz      <list> [<"spider_chart", "slope_graph", "chord_diag...
## $ rank             <chr> "1.001", "1.001", "1.001", "1.001", "1.001", ...
## $ definition       <chr> "Annual percentage growth rate of GDP at mark...
## $ units            <chr> "Annual %", "Distance to Frontier", "Rank", "...
## $ subindicatorType <chr> "Value", "Distance to Frontier", "Rank", "Ran...
## $ timeframes       <list> [<"1961", "1962", "1963", "1964", "1965", "1...
## $ byProduct        <lgl> FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FAL...
## $ byPartner        <lgl> FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FAL...
## $ periodicity      <chr> "annually", "annually", "annually", "annually...
## $ dateRange        <chr> "1961-2016", "2016-2018", "2012-2016", "2015-...

df_indicators %>%
select(id, name, valueType, dateRange) %>%
DT::datatable(
class = 'compact stripe hover row-border order-column',
rownames = FALSE,
options = list(
paging = TRUE,
searching = TRUE,
info = FALSE
)
)

There are more than 2500 indicators. There are 16 variables in all including a lengthy definition. The table above is an easy way of searching For instance, entering “tourism” will show that some eg the 2469* datasets include a handy reprocessing of the same underlying data

Countries

The countries data includes some interesting basic data. One thing new to me is that the geo variable (providing lng and lat for each capital city) is a data.frame but in View() form they are split into geo.lat & geo.lng

I have used the leaflet package to map capital cities


df_countries <- get_metadata360(metadata_type = 'countries')

glimpse(df_countries)
## Observations: 217
## Variables: 13
## $ id          <chr> "AFG", "ALB", "DZA", "ASM", "AND", "AGO", "ATG", "...
## $ iso2        <chr> "AF", "AL", "DZ", "AS", "AD", "AO", "AG", "AR", "A...
## $ iso3        <chr> "AFG", "ALB", "DZA", "ASM", "AND", "AGO", "ATG", "...
## $ name        <chr> "Afghanistan", "Albania", "Algeria", "American Sam...
## $ region      <chr> "SAS", "ECS", "MEA", "EAS", "ECS", "SSF", "LCN", "...
## $ adminRegion <chr> "SAS", "ECA", "MNA", "EAP", NA, "SSA", NA, NA, "EC...
## $ incomeLevel <chr> "LIC", "UMC", "UMC", "UMC", "HIC", "UMC", "HIC", "...
## $ lendingType <chr> "IDX", "IBD", "IBD", "LNX", "LNX", "IBD", "IBD", "...
## $ capitalCity <chr> "Kabul", "Tirane", "Algiers", "Pago Pago", "Andorr...
## $ geo         <data.frame> c("34.52280", "41.33170", "36.73970", "-14....
## $ aggregate   <lgl> FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, F...
## $ sids        <lgl> FALSE, FALSE, FALSE, NA, NA, FALSE, TRUE, FALSE, F...
## $ landlocked  <lgl> TRUE, FALSE, FALSE, NA, NA, FALSE, FALSE, FALSE, T...

df_countries %>%
leaflet() %>%
addProviderTiles(provider = "CartoDB.Positron") %>%
addCircles(
lng = ~ geo$lng,
lat = ~ geo$lat,
label = ~ paste0(capitalCity, ", ", name)
)

In addition to the get_metadata360() function, there are also `get_resources360() and search_360() functions which to help explore the datasets

Example

Let’s take a look at World Governance Indicators. Clicking on the link in the table will lead you to details of methodology etc. but for purposes of illustration of the package, the image below gives a broad overview

We can now utilize the war-horse, get_data360 function with the appropriate id, 51. We can save a bit of processing by calling for the data in ‘long’, tidy format



df <- get_data360(dataset_id = 51, output_type = "long")

tail(df,10)
##     Country ISO3 Country Name dataset_id                Indicator
##  1:          ZWE     Zimbabwe         51              Rule of Law
##  2:          ZWE     Zimbabwe         51              Rule of Law
##  3:          ZWE     Zimbabwe         51              Rule of Law
##  4:          ZWE     Zimbabwe         51              Rule of Law
##  5:          ZWE     Zimbabwe         51              Rule of Law
##  6:          ZWE     Zimbabwe         51 Voice and Accountability
##  7:          ZWE     Zimbabwe         51 Voice and Accountability
##  8:          ZWE     Zimbabwe         51 Voice and Accountability
##  9:          ZWE     Zimbabwe         51 Voice and Accountability
## 10:          ZWE     Zimbabwe         51 Voice and Accountability
##     Subindicator Type Period Observation
##  1:          Estimate   2015  -1.3490863
##  2:       Lower Bound   2015   2.8708134
##  3:   Percentile Rank   2015   6.2500000
##  4:    Standard Error   2015   0.1384398
##  5:       Upper Bound   2015  11.4832535
##  6:          Estimate   2015  -1.1926939
##  7:       Lower Bound   2015   9.8039217
##  8:   Percentile Rank   2015  15.2709360
##  9:    Standard Error   2015   0.1214361
## 10:       Upper Bound   2015  21.0784321

As can be seen, for each country/year combo there are sub-indicators which provides both the estimate (over a range -2.5 to +2.5) for each indicator, a degree of its potential error and equivalent data for its rank. So here for example, at best, Zimbabwe would fall in the lowest 11% for Rule of Law in 2015

Let’s see how this ‘Rule of Law’ estimate varies over time


# previous - tTHEY HAVE CHANGED iNDICATOR NAMES
# df %>% 
#   filter(`Country Name`=="Zimbabwe"&Indicator=="Rule of law score (-2.5 to 2.5)"&`Subindicator Type`=="Estimate") %>% 
#   mutate(Year=as.integer(as.character(Period))) %>% 
#   plot_ly(x=~Year,y=~Observation,
#           hoverinfo="text",
#           text="") %>% 
#   add_lines()


df %>% 
  filter(`Country Name`=="Zimbabwe"&Indicator=="Rule of Law"&`Subindicator Type`=="Estimate") %>% 
  mutate(Year=as.integer(as.character(Period))) %>% 
  plot_ly(x=~Year,y=~Observation,
          hoverinfo="text",
          text="") %>% 
  add_lines()

Zimbabwe has never been famed for it’s governance but after increasing poverty in the 1990’s opposition both to President Mugabe’s rule and proposals to further extend his power grew. This was met by governement intimidation so that by 2009 Zimbabwe was in the lowest 1 percentile for this indicator

Overall Indicators

The indicators are likely to be closely correlated but let’s just group them together for the latest year available 2015 This requires a fair amount of data carpentry


# needs sorting

# create a distinct data.frame as it will be used for both table and map
# df_2015 <- df %>%
# # Reduce to required data
# filter(Period == "2015" & `Subindicator Type` == "Estimate") %>%
# # Remove extraneous info
# mutate(Indicator = str_sub(Indicator, 1, -21)) %>%
# # amend name and limit data
# select(Country = `Country Name`, Indicator, Observation, iso3 = `Country ISO3`) %>%
# # transform to wide format
# spread(Indicator, Observation) %>%
# # Calculatea an overall score, reduce field lengths and rank countries accordingly
# mutate(meanIndicator = rowMeans(.[3:8], na.rm = TRUE)) %>%
# mutate_if(is.numeric, round, 2) %>%
# arrange(desc(meanIndicator)) %>%
# mutate(Rank = row_number()) %>%
# arrange(Rank)


# create a distinct data.frame as it will be used for both table and map
df_2015 <- df %>%
# Reduce to required data
filter(Period == "2015" & `Subindicator Type` == "Estimate") %>%
# Remove extraneous info (no longer required)
#mutate(Indicator = str_sub(Indicator, 1, -21)) %>%
# amend name and limit data
select(Country = `Country Name`, Indicator, Observation, iso3 = `Country ISO3`) %>%
# transform to wide format
spread(Indicator, Observation) %>%
# Calculatea an overall score, reduce field lengths and rank countries accordingly
mutate(meanIndicator = rowMeans(.[3:8], na.rm = TRUE)) %>%
mutate_if(is.numeric, round, 2) %>%
arrange(desc(meanIndicator)) %>%
mutate(Rank = row_number()) %>%
arrange(Rank)


df_2015 %>%
select(Rank, Country, `Control of Corruption`:`Rule of Law`, Voice = `Voice and Accountability`) %>%
DT::datatable(
class = 'compact stripe hover row-border order-column',
rownames = FALSE,
options = list(
paging = TRUE,
searching = TRUE,
info = FALSE
)
)

N.B. +2.5 is best score, -2.5 lowest

It would also be interesting to map this

We do not currently have any country shapefile information in the data but this can be provided from the sf package. We can then join it via df_countries iso3 filed to our indicator data.frame, df2015

They are colored with the viridis scale which renders well in balck and white and can be perceived by those who have most common forms of color blindness


# collect shapefiles and rename variable for easier join
world <-
st_read(system.file("shapes/world.gpkg", package = "spData")) %>%
mutate(iso2 = as.character(iso_a2))
## Reading layer `wrld.gpkg' from data source `C:\Users\Andrew\Documents\R\win-library\3.4\spData\shapes\world.gpkg' using driver `GPKG'
## Simple feature collection with 177 features and 10 fields
## geometry type:  MULTIPOLYGON
## dimension:      XY
## bbox:           xmin: -180 ymin: -90 xmax: 180 ymax: 83.64513
## epsg (SRID):    4326
## proj4string:    +proj=longlat +datum=WGS84 +no_defs
# class(world) [1] "sf"         "data.frame"



df_2015 <- df_2015 %>%
left_join(df_countries, by = c("iso3" = "iso3"))


map_df <- world %>%
left_join(df_2015)

# Use a color scale that
pal <- colorNumeric("viridis", domain = map_df$Rank, reverse = TRUE)

map_df %>%
leaflet(options = leafletOptions(minZoom = 2)) %>%
addProviderTiles(provider = "CartoDB.Positron") %>%

addPolygons(
label = ~ paste(Rank, name_long),
opacity = 0.3,
color = ~ pal(Rank)
) %>%
addLegend(
"bottomleft",
pal = pal,
values = ~ Rank,
title = "Overall Rank of Six <br> World Governance Indicators",
opacity = 0.5
)

Pan, zoom and hover for Country Ranking

It would make sense to create a shiny app for this data. It’s on the list

Feel free to contact me if you have any questions or work offers