World Bank Data with data360r

NB This is a revamp of a previous post

The World Bank is a great source of economic and social data. A couple of years ago, I used it to create a shiny app which still appears to work!

Earlier this year, the World Bank launched a new open data platform which collates more than 2000 indices from 40+ data sources many external to the World Bank

Even better, there is now an R package data360r to access the API and deliver data in a tidy format. Reg Onglao is its author and maintainer and she has written an introductory blog post on the World Bank Site. There are further examples on github. The package also provides access to Governance Global Practice indices and the World Bank Group’s Trade & Competitiveness Global Practice data

Rummage around on the World Bank site for access to thousands of reports and explanations for how the data is collected

Let’s take it for a ride. These are packages that I will use


library(tidyverse)
library(plotly)
library(leaflet)
library(stringr)
library(sf)
library(viridisLite)

library(data360r)

metadata

The get_metadata360() function downloads the requested TC/Govdata360 metadata, including:

  • ‘datasets’ == Lists metadata for all datasets.
  • ‘indicators’ == Lists metadata for all indicators. Does not return actual data.
  • ‘countries’ == Lists metadata for all countries and regions.

This will give us a good handle on what is available

Datasets

In addition to World Bank datasets, many others have been incorporated into the API. Click on any link to get to a web page of the relevant organization


df_datasets <- get_metadata360(metadata_type = 'datasets')

df_datasets  %>%
#mutate(sourceLink = paste0("<a href=', link, "\" target=\"_blank\">", source, '</a>"))
mutate(sourceLink = paste0("<a href=", link, ">", source, "</a>")) %>%
arrange(source) %>%
select(dataset_id = id, sourceLink) %>%
DT::datatable(
class = 'compact stripe hover row-border order-column',
rownames = FALSE,
escape = FALSE,
options = list(
paging = TRUE,
searching = TRUE,
info = FALSE
)
)

Indicators

## get all and 



df_indicators <- get_metadata360(metadata_type = "indicators")

glimpse(df_indicators)
## Observations: 2,515
## Variables: 16
## $ id               <int> 1541, 1841, 530, 3385, 416, 2013, 915, 1026, ...
## $ name             <chr> "Foreign direct investment, net inflows (% of...
## $ dataset          <chr> "World Development Indicators", "World Develo...
## $ valueType        <chr> "value", "value", "rank", "value", "dtf", "va...
## $ datasetId        <int> 56, 56, 53, 65, 52, 56, 56, 56, 53, 52, 56, 6...
## $ defaultViz       <chr> "line_chart", "line_chart", "line_chart", "li...
## $ doNotUseViz      <list> [<"spider_chart", "slope_graph", "chord_diag...
## $ rank             <chr> "1.001", "1.001", "1.001", "1.001", "1.001", ...
## $ definition       <chr> "Foreign direct investment are the net inflow...
## $ units            <chr> "% of GDP", "# of arrivals", "Rank", "Index (...
## $ subindicatorType <chr> "Value", "Value", "Rank", "Index (1-7)", "Dis...
## $ timeframes       <list> [<"1970", "1971", "1972", "1973", "1974", "1...
## $ byProduct        <lgl> FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FAL...
## $ byPartner        <lgl> FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FAL...
## $ periodicity      <chr> "annually", "annually", "annually", "annually...
## $ dateRange        <chr> "1970-2016", "1995-2016", "2007-2016", "2012-...

df_indicators %>%
select(id, name, valueType, dateRange) %>%
DT::datatable(
class = 'compact stripe hover row-border order-column',
rownames = FALSE,
options = list(
paging = TRUE,
searching = TRUE,
info = FALSE
)
)                                 

There are more than 2500 indicators. There are 16 variables in all including a lengthy definition. The table above is an easy way of searching For instance, entering “tourism” will show that some eg the 2469* datasets include a handy reprocessing of the same underlying data

Countries

The countries data includes some interesting basic data. One thing new to me is that the geo variable (providing lng and lat for each capital city) is a data.frame but in View() form they are split into geo.lat & geo.lng

I have used the leaflet package to map capital cities


df_countries <- get_metadata360(metadata_type = 'countries')

glimpse(df_countries)
## Observations: 217
## Variables: 13
## $ id          <chr> "AFG", "ALB", "DZA", "ASM", "AND", "AGO", "ATG", "...
## $ iso2        <chr> "AF", "AL", "DZ", "AS", "AD", "AO", "AG", "AR", "A...
## $ iso3        <chr> "AFG", "ALB", "DZA", "ASM", "AND", "AGO", "ATG", "...
## $ name        <chr> "Afghanistan", "Albania", "Algeria", "American Sam...
## $ region      <chr> "SAS", "ECS", "MEA", "EAS", "ECS", "SSF", "LCN", "...
## $ adminRegion <chr> "SAS", "ECA", "MNA", "EAP", NA, "SSA", NA, NA, "EC...
## $ incomeLevel <chr> "LIC", "UMC", "UMC", "UMC", "HIC", "UMC", "HIC", "...
## $ lendingType <chr> "IDX", "IBD", "IBD", "LNX", "LNX", "IBD", "IBD", "...
## $ capitalCity <chr> "Kabul", "Tirane", "Algiers", "Pago Pago", "Andorr...
## $ geo         <data.frame> c("34.52280", "41.33170", "36.73970", "-14....
## $ aggregate   <lgl> FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, F...
## $ sids        <lgl> FALSE, FALSE, FALSE, NA, NA, FALSE, TRUE, FALSE, F...
## $ landlocked  <lgl> TRUE, FALSE, FALSE, NA, NA, FALSE, FALSE, FALSE, T...

df_countries %>%
leaflet() %>%
addProviderTiles(provider = "CartoDB.Positron") %>%
addCircles(
lng = ~ geo$lng,
lat = ~ geo$lat,
label = ~ paste0(capitalCity, ", ", name)
)

In addition to the get_metadata360() function, there are also `get_resources360() and search_360() functions which to help explore the datasets


Example

Let’s take a look at World Governance Indicators. Clicking on the link in the table will lead you to details of methodology etc. but for purposes of illustration of the package, the image below gives a broad overview

We can now utilize the war-horse, get_data360 function with the appropriate id, 51. We can save a bit of processing by calling for the data in ‘long’, tidy format



df <- get_data360(dataset_id = 51, output_type = "long")

tail(df,10)
##     Country ISO3 Country Name dataset_id
##  1:          ZWE     Zimbabwe         51
##  2:          ZWE     Zimbabwe         51
##  3:          ZWE     Zimbabwe         51
##  4:          ZWE     Zimbabwe         51
##  5:          ZWE     Zimbabwe         51
##  6:          ZWE     Zimbabwe         51
##  7:          ZWE     Zimbabwe         51
##  8:          ZWE     Zimbabwe         51
##  9:          ZWE     Zimbabwe         51
## 10:          ZWE     Zimbabwe         51
##                                      Indicator Subindicator Type Period
##  1:            Rule of law score (-2.5 to 2.5)          Estimate   2015
##  2:            Rule of law score (-2.5 to 2.5)       Lower Bound   2015
##  3:            Rule of law score (-2.5 to 2.5)   Percentile Rank   2015
##  4:            Rule of law score (-2.5 to 2.5)    Standard Error   2015
##  5:            Rule of law score (-2.5 to 2.5)       Upper Bound   2015
##  6: Voice & accountability score (-2.5 to 2.5)          Estimate   2015
##  7: Voice & accountability score (-2.5 to 2.5)       Lower Bound   2015
##  8: Voice & accountability score (-2.5 to 2.5)   Percentile Rank   2015
##  9: Voice & accountability score (-2.5 to 2.5)    Standard Error   2015
## 10: Voice & accountability score (-2.5 to 2.5)       Upper Bound   2015
##     Observation
##  1:  -1.3490863
##  2:   2.8708134
##  3:   6.2500000
##  4:   0.1384398
##  5:  11.4832535
##  6:  -1.1926939
##  7:   9.8039217
##  8:  15.2709360
##  9:   0.1214361
## 10:  21.0784321

As can be seen, for each country/year combo there are sub-indicators which provides both the estimate (over a range -2.5 to +2.5) for each indicator, a degree of its potential error and equivalent data for its rank. So here for example, at best, Zimbabwe would fall in the lowest 11% for Rule of Law in 2015

Let’s see how this ‘Rule of Law’ estimate varies over time


df %>% 
  filter(`Country Name`=="Zimbabwe"&Indicator=="Rule of law score (-2.5 to 2.5)"&`Subindicator Type`=="Estimate") %>% 
  mutate(Year=as.integer(as.character(Period))) %>% 
  plot_ly(x=~Year,y=~Observation,
          hoverinfo="text",
          text="") %>% 
  add_lines()

Zimbabwe has never been famed for it’s governance but after increasing poverty in the 1990’s opposition both to President Mugabe’s rule and proposals to further extend his power grew. This was met by governement intimidation so that by 2009 Zimbabwe was in the lowest 1 percentile for this indicator

Overall Indicators

The indicators are likely to be closely correlated but let’s just group them together for the latest year available 2015 This requires a fair amount of data carpentry



# create a distinct data.frame as it will be used for both table and map
df_2015 <- df %>%
# Reduce to required data
filter(Period == "2015" & `Subindicator Type` == "Estimate") %>%
# Remove extraneous info
mutate(Indicator = str_sub(Indicator, 1, -21)) %>%
# amend name and limit data
select(Country = `Country Name`, Indicator, Observation, iso3 = `Country ISO3`) %>%
# transform to wide format
spread(Indicator, Observation) %>%
# Calculatea an overall score, reduce field lengths and rank countries accordingly
mutate(meanIndicator = rowMeans(.[3:8], na.rm = TRUE)) %>%
mutate_if(is.numeric, round, 2) %>%
arrange(desc(meanIndicator)) %>%
mutate(Rank = row_number()) %>%
arrange(Rank)


df_2015 %>%
select(Rank, Country, Corruption:`Rule of law`, Voice = `Voice & accountability`) %>%
DT::datatable(
class = 'compact stripe hover row-border order-column',
rownames = FALSE,
options = list(
paging = TRUE,
searching = TRUE,
info = FALSE
)
)

N.B. +2.5 is best score, -2.5 lowest

It would also be interesting to map this

We do not currently have any country shapefile information in the data but this can be provided from the sf package. We can then join it via df_countries iso3 filed to our indicator data.frame, df2015

They are colored with the viridis scale which renders well in balck and white and can be perceived by those who have most common forms of color blindness


# collect shapefiles and rename variable for easier join
world <-
st_read(system.file("shapes/world.gpkg", package = "spData")) %>%
mutate(iso2 = as.character(iso_a2))
## Reading layer `wrld.gpkg' from data source `C:\Users\Andrew\Documents\R\win-library\3.4\spData\shapes\world.gpkg' using driver `GPKG'
## Simple feature collection with 177 features and 10 fields
## geometry type:  MULTIPOLYGON
## dimension:      XY
## bbox:           xmin: -180 ymin: -90 xmax: 180 ymax: 83.64513
## epsg (SRID):    4326
## proj4string:    +proj=longlat +datum=WGS84 +no_defs
# class(world) [1] "sf"         "data.frame"



df_2015 <- df_2015 %>%
left_join(df_countries, by = c("iso3" = "iso3"))


map_df <- world %>%
left_join(df_2015)

# Use a color scale that
pal <- colorNumeric("viridis", domain = map_df$Rank, reverse = TRUE)

map_df %>%
leaflet(options = leafletOptions(minZoom = 2)) %>%
addProviderTiles(provider = "CartoDB.Positron") %>%

addPolygons(
label = ~ paste(Rank, name_long),
opacity = 0.3,
color = ~ pal(Rank)
) %>%
addLegend(
"bottomleft",
pal = pal,
values = ~ Rank,
title = "Overall Rank of Six <br> World Governance Indicators",
opacity = 0.5
)

Pan, zoom and hover for Country Ranking

It would make sense to create a shiny app for this data. It’s on the list


Feel free to contact me if you have any questions or work offers

Share Comments
comments powered by Disqus