NB This is a revamp of a previous post
The World Bank is a great source of economic and social data. A couple of years ago, I used it to create a shiny app which still appears to work!
Earlier this year, the World Bank launched a new open data platform which collates more than 2000 indices from 40+ data sources many external to the World Bank
Even better, there is now an R package data360r to access the API and deliver data in a tidy format. Reg Onglao is its author and maintainer and she has written an introductory blog post on the World Bank Site. There are further examples on github. The package also provides access to Governance Global Practice indices and the World Bank Group’s Trade & Competitiveness Global Practice data
Rummage around on the World Bank site for access to thousands of reports and explanations for how the data is collected
Let’s take it for a ride. These are packages that I will use
library(tidyverse)
library(plotly)
library(leaflet)
library(stringr)
library(sf)
library(viridisLite)
library(data360r)
metadata
The get_metadata360() function downloads the requested TC/Govdata360 metadata, including:
- ‘datasets’ == Lists metadata for all datasets.
- ‘indicators’ == Lists metadata for all indicators. Does not return actual data.
- ‘countries’ == Lists metadata for all countries and regions.
This will give us a good handle on what is available
Datasets
In addition to World Bank datasets, many others have been incorporated into the API. Click on any link to get to a web page of the relevant organization
df_datasets <- get_metadata360(metadata_type = 'datasets')
df_datasets %>%
#mutate(sourceLink = paste0("<a href=', link, "\" target=\"_blank\">", source, '</a>"))
mutate(sourceLink = paste0("<a href=", link, ">", source, "</a>")) %>%
arrange(source) %>%
select(dataset_id = id, sourceLink) %>%
DT::datatable(
class = 'compact stripe hover row-border order-column',
rownames = FALSE,
escape = FALSE,
options = list(
paging = TRUE,
searching = TRUE,
info = FALSE
)
)
Indicators
## get all and
df_indicators <- get_metadata360(metadata_type = "indicators")
glimpse(df_indicators)
## Observations: 3,738
## Variables: 16
## $ id <int> 940, 40001, 3386, 415, 1026, 1841, 915, 3385,...
## $ name <chr> "GDP growth (annual %)", "Ease of Doing Busin...
## $ dataset <chr> "World Development Indicators", "Doing Busine...
## $ valueType <chr> "value", "dtf", "rank", "rank", "value", "val...
## $ datasetId <int> 56, 52, 65, 52, 56, 56, 56, 65, 56, 53, 56, 5...
## $ defaultViz <chr> "line_chart", "line_chart", "line_chart", "li...
## $ doNotUseViz <list> [<"spider_chart", "slope_graph", "chord_diag...
## $ rank <chr> "1.001", "1.001", "1.001", "1.001", "1.001", ...
## $ definition <chr> "Annual percentage growth rate of GDP at mark...
## $ units <chr> "Annual %", "Distance to Frontier", "Rank", "...
## $ subindicatorType <chr> "Value", "Distance to Frontier", "Rank", "Ran...
## $ timeframes <list> [<"1961", "1962", "1963", "1964", "1965", "1...
## $ byProduct <lgl> FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FAL...
## $ byPartner <lgl> FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FAL...
## $ periodicity <chr> "annually", "annually", "annually", "annually...
## $ dateRange <chr> "1961-2016", "2016-2018", "2012-2016", "2015-...
df_indicators %>%
select(id, name, valueType, dateRange) %>%
DT::datatable(
class = 'compact stripe hover row-border order-column',
rownames = FALSE,
options = list(
paging = TRUE,
searching = TRUE,
info = FALSE
)
)
There are more than 2500 indicators. There are 16 variables in all including a lengthy definition. The table above is an easy way of searching For instance, entering “tourism” will show that some eg the 2469* datasets include a handy reprocessing of the same underlying data
Countries
The countries data includes some interesting basic data. One thing new to me is that the geo variable (providing lng and lat for each capital city) is a data.frame but in View() form they are split into geo.lat & geo.lng
I have used the leaflet package to map capital cities
df_countries <- get_metadata360(metadata_type = 'countries')
glimpse(df_countries)
## Observations: 217
## Variables: 13
## $ id <chr> "AFG", "ALB", "DZA", "ASM", "AND", "AGO", "ATG", "...
## $ iso2 <chr> "AF", "AL", "DZ", "AS", "AD", "AO", "AG", "AR", "A...
## $ iso3 <chr> "AFG", "ALB", "DZA", "ASM", "AND", "AGO", "ATG", "...
## $ name <chr> "Afghanistan", "Albania", "Algeria", "American Sam...
## $ region <chr> "SAS", "ECS", "MEA", "EAS", "ECS", "SSF", "LCN", "...
## $ adminRegion <chr> "SAS", "ECA", "MNA", "EAP", NA, "SSA", NA, NA, "EC...
## $ incomeLevel <chr> "LIC", "UMC", "UMC", "UMC", "HIC", "UMC", "HIC", "...
## $ lendingType <chr> "IDX", "IBD", "IBD", "LNX", "LNX", "IBD", "IBD", "...
## $ capitalCity <chr> "Kabul", "Tirane", "Algiers", "Pago Pago", "Andorr...
## $ geo <data.frame> c("34.52280", "41.33170", "36.73970", "-14....
## $ aggregate <lgl> FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, F...
## $ sids <lgl> FALSE, FALSE, FALSE, NA, NA, FALSE, TRUE, FALSE, F...
## $ landlocked <lgl> TRUE, FALSE, FALSE, NA, NA, FALSE, FALSE, FALSE, T...
df_countries %>%
leaflet() %>%
addProviderTiles(provider = "CartoDB.Positron") %>%
addCircles(
lng = ~ geo$lng,
lat = ~ geo$lat,
label = ~ paste0(capitalCity, ", ", name)
)
In addition to the get_metadata360() function, there are also `get_resources360() and search_360() functions which to help explore the datasets
Example
Let’s take a look at World Governance Indicators. Clicking on the link in the table will lead you to details of methodology etc. but for purposes of illustration of the package, the image below gives a broad overview
We can now utilize the war-horse, get_data360 function with the appropriate id, 51. We can save a bit of processing by calling for the data in ‘long’, tidy format
df <- get_data360(dataset_id = 51, output_type = "long")
tail(df,10)
## Country ISO3 Country Name dataset_id Indicator
## 1: ZWE Zimbabwe 51 Rule of Law
## 2: ZWE Zimbabwe 51 Rule of Law
## 3: ZWE Zimbabwe 51 Rule of Law
## 4: ZWE Zimbabwe 51 Rule of Law
## 5: ZWE Zimbabwe 51 Rule of Law
## 6: ZWE Zimbabwe 51 Voice and Accountability
## 7: ZWE Zimbabwe 51 Voice and Accountability
## 8: ZWE Zimbabwe 51 Voice and Accountability
## 9: ZWE Zimbabwe 51 Voice and Accountability
## 10: ZWE Zimbabwe 51 Voice and Accountability
## Subindicator Type Period Observation
## 1: Estimate 2015 -1.3490863
## 2: Lower Bound 2015 2.8708134
## 3: Percentile Rank 2015 6.2500000
## 4: Standard Error 2015 0.1384398
## 5: Upper Bound 2015 11.4832535
## 6: Estimate 2015 -1.1926939
## 7: Lower Bound 2015 9.8039217
## 8: Percentile Rank 2015 15.2709360
## 9: Standard Error 2015 0.1214361
## 10: Upper Bound 2015 21.0784321
As can be seen, for each country/year combo there are sub-indicators which provides both the estimate (over a range -2.5 to +2.5) for each indicator, a degree of its potential error and equivalent data for its rank. So here for example, at best, Zimbabwe would fall in the lowest 11% for Rule of Law in 2015
Let’s see how this ‘Rule of Law’ estimate varies over time
# previous - tTHEY HAVE CHANGED iNDICATOR NAMES
# df %>%
# filter(`Country Name`=="Zimbabwe"&Indicator=="Rule of law score (-2.5 to 2.5)"&`Subindicator Type`=="Estimate") %>%
# mutate(Year=as.integer(as.character(Period))) %>%
# plot_ly(x=~Year,y=~Observation,
# hoverinfo="text",
# text="") %>%
# add_lines()
df %>%
filter(`Country Name`=="Zimbabwe"&Indicator=="Rule of Law"&`Subindicator Type`=="Estimate") %>%
mutate(Year=as.integer(as.character(Period))) %>%
plot_ly(x=~Year,y=~Observation,
hoverinfo="text",
text="") %>%
add_lines()
Zimbabwe has never been famed for it’s governance but after increasing poverty in the 1990’s opposition both to President Mugabe’s rule and proposals to further extend his power grew. This was met by governement intimidation so that by 2009 Zimbabwe was in the lowest 1 percentile for this indicator
Overall Indicators
The indicators are likely to be closely correlated but let’s just group them together for the latest year available 2015 This requires a fair amount of data carpentry
# needs sorting
# create a distinct data.frame as it will be used for both table and map
# df_2015 <- df %>%
# # Reduce to required data
# filter(Period == "2015" & `Subindicator Type` == "Estimate") %>%
# # Remove extraneous info
# mutate(Indicator = str_sub(Indicator, 1, -21)) %>%
# # amend name and limit data
# select(Country = `Country Name`, Indicator, Observation, iso3 = `Country ISO3`) %>%
# # transform to wide format
# spread(Indicator, Observation) %>%
# # Calculatea an overall score, reduce field lengths and rank countries accordingly
# mutate(meanIndicator = rowMeans(.[3:8], na.rm = TRUE)) %>%
# mutate_if(is.numeric, round, 2) %>%
# arrange(desc(meanIndicator)) %>%
# mutate(Rank = row_number()) %>%
# arrange(Rank)
# create a distinct data.frame as it will be used for both table and map
df_2015 <- df %>%
# Reduce to required data
filter(Period == "2015" & `Subindicator Type` == "Estimate") %>%
# Remove extraneous info (no longer required)
#mutate(Indicator = str_sub(Indicator, 1, -21)) %>%
# amend name and limit data
select(Country = `Country Name`, Indicator, Observation, iso3 = `Country ISO3`) %>%
# transform to wide format
spread(Indicator, Observation) %>%
# Calculatea an overall score, reduce field lengths and rank countries accordingly
mutate(meanIndicator = rowMeans(.[3:8], na.rm = TRUE)) %>%
mutate_if(is.numeric, round, 2) %>%
arrange(desc(meanIndicator)) %>%
mutate(Rank = row_number()) %>%
arrange(Rank)
df_2015 %>%
select(Rank, Country, `Control of Corruption`:`Rule of Law`, Voice = `Voice and Accountability`) %>%
DT::datatable(
class = 'compact stripe hover row-border order-column',
rownames = FALSE,
options = list(
paging = TRUE,
searching = TRUE,
info = FALSE
)
)
N.B. +2.5 is best score, -2.5 lowest
It would also be interesting to map this
We do not currently have any country shapefile information in the data but this can be provided from the sf package. We can then join it via df_countries iso3 filed to our indicator data.frame, df2015
They are colored with the viridis scale which renders well in balck and white and can be perceived by those who have most common forms of color blindness
# collect shapefiles and rename variable for easier join
world <-
st_read(system.file("shapes/world.gpkg", package = "spData")) %>%
mutate(iso2 = as.character(iso_a2))
## Reading layer `wrld.gpkg' from data source `C:\Users\Andrew\Documents\R\win-library\3.4\spData\shapes\world.gpkg' using driver `GPKG'
## Simple feature collection with 177 features and 10 fields
## geometry type: MULTIPOLYGON
## dimension: XY
## bbox: xmin: -180 ymin: -90 xmax: 180 ymax: 83.64513
## epsg (SRID): 4326
## proj4string: +proj=longlat +datum=WGS84 +no_defs
# class(world) [1] "sf" "data.frame"
df_2015 <- df_2015 %>%
left_join(df_countries, by = c("iso3" = "iso3"))
map_df <- world %>%
left_join(df_2015)
# Use a color scale that
pal <- colorNumeric("viridis", domain = map_df$Rank, reverse = TRUE)
map_df %>%
leaflet(options = leafletOptions(minZoom = 2)) %>%
addProviderTiles(provider = "CartoDB.Positron") %>%
addPolygons(
label = ~ paste(Rank, name_long),
opacity = 0.3,
color = ~ pal(Rank)
) %>%
addLegend(
"bottomleft",
pal = pal,
values = ~ Rank,
title = "Overall Rank of Six <br> World Governance Indicators",
opacity = 0.5
)
Pan, zoom and hover for Country Ranking
It would make sense to create a shiny app for this data. It’s on the list
Feel free to contact me if you have any questions or work offers