newsflash package
Polymath Bob Rudis (aka hrbrmstr) has recently released the newsflash package, which is a “set of tools to Work with the Internet Archive and GDELT Television Explorer”
In a recent blog post, based on an article by the GDELT project’s creator, he details the coverage of Hillary Clinton’s email server woes, with the unsurprising finding that FOX News spent more time on the issue than other broadcasters
I will take a slightly different tack by, firstly, looking at some of the major news stories of March in the USA
We just need to load a few packages
library(newsflash)
library(plotly)
library(tidyverse)
Let’s just look at four issues that have impinged on President Trump recently
- His Tax Returns
- The Travel Ban
- The New Healthcare Bill
- Wire Tapping Claims
query_tv() is the function that does most of the work. In it, you enter a primary_keyword, e.g. ‘Trump’, and then context_keywords, which are words that appear within four sentences of a mention of the keyword. This is obviously a bit hit and miss, and the context_keywords string is limited to 50 characters in length; anything longer throws an error. For the tax returns I chose these words: tax, returns, leak, income, bill. Feel free to amend them when adapting the code
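Since the query fails once the keyword string gets too long, it may be worth checking the length before sending anything. The snippet below is only a sketch: check_keywords() is a hypothetical helper of mine, not part of newsflash, and it assumes a string of exactly 50 characters is still accepted

```r
# Hypothetical helper (not part of newsflash): collapse a vector of context
# words into one comma-separated string and stop early if it is too long.
# Assumption: the limit is 50 characters, inclusive.
check_keywords <- function(words, max_chars = 50) {
  keywords <- paste(words, collapse = ", ")
  if (nchar(keywords) > max_chars) {
    stop("context_keywords is ", nchar(keywords),
         " characters; the limit is ", max_chars)
  }
  keywords
}

tax_keywords <- check_keywords(c("tax", "returns", "leak", "income", "bill"))
tax_keywords
```

The returned string can then be passed as the context_keywords argument of query_tv()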
There is a maximum of 2,500 results per query. Bob’s code explains how to manoeuvre around this. However, as I am just looking at a seven day period - which also supplies results in 30 minute chunks - for just the National Networks, that does not pose an issue. We do, however, need to ensure the correct time span and can use the list_networks() function for this
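If a longer period did run into the 2,500 result cap, one option would be to break the date range into shorter windows and query each in turn. This is a rough sketch of that idea rather than the approach in Bob’s code; split_dates() is a hypothetical helper and the seven day window is just an assumption

```r
# Hypothetical helper: split a date range into consecutive windows of at
# most `days` days each, returned as one row per window
split_dates <- function(start_date, end_date, days = 7) {
  starts <- seq(as.Date(start_date), as.Date(end_date),
                by = paste(days, "days"))
  # the last window is cut short so it never runs past end_date
  ends <- pmin(starts + days - 1, as.Date(end_date))
  data.frame(start_date = starts, end_date = ends)
}

windows <- split_dates("2017-03-01", "2017-03-31")
windows
```

Each row could then be formatted as text and supplied as the start_date and end_date of a separate query_tv() call, with the resulting timelines bound together afterwards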
# list networks available and date range of information
# as_tibble() replaces the now-deprecated as.tbl()
list_networks() %>%
  as_tibble() %>%
  select(-keyword) %>%
  head(10)
## # A tibble: 54 x 3
## keyword network
## <chr> <chr>
## 1 NATIONAL All National Networks
## 2 BLOOMBERG Bloomberg
## 3 CNBC CNBC
## 4 CNN CNN
## 5 FBC FOX Business
## 6 FOXNEWSW FOX News
## 7 MSNBC MSNBC
## 8 INTERNATIONAL All International Networks
## 9 BBCNEWSSEG BBC News
## 10 NATIONALDISCONTINUED All Discontinued National Networks
## 11 ALJAZAM Aljazeera America
## 12 AFFNETALL All Affiliate Networks
## 13 AFFNET_ABC ABC Affiliate Stations
## 14 AFFNET_CBS CBS Affiliate Stations
## 15 AFFNET_FOX FOX Affiliate Stations
## 16 AFFNET_MYTV MYTV Affiliate Stations
## 17 AFFNET_NBC NBC Affiliate Stations
## 18 AFFNET_PBS PBS Affiliate Stations
## 19 AFFMARKALL All Affiliate Markets
## 20 AFFMARKET_Boston Boston Affiliate Stations
## 21 AFFMARKET_Cedar Rapids Cedar Rapids Affiliate Stations
## 22 AFFMARKET_Charlotte Charlotte Affiliate Stations
## 23 AFFMARKET_Cincinnati Cincinnati Affiliate Stations
## 24 AFFMARKET_Cleveland Cleveland Affiliate Stations
## 25 AFFMARKET_Colorado Springs Colorado Springs Affiliate Stations
## 26 AFFMARKET_Columbia Columbia Affiliate Stations
## 27 AFFMARKET_Dakota Dunes SD Dakota Dunes SD Affiliate Stations
## 28 AFFMARKET_Daytona Beach Daytona Beach Affiliate Stations
## 29 AFFMARKET_Denver Denver Affiliate Stations
## 30 AFFMARKET_Des Moines Des Moines Affiliate Stations
## 31 AFFMARKET_Durham Durham Affiliate Stations
## 32 AFFMARKET_Goldsboro Goldsboro Affiliate Stations
## 33 AFFMARKET_Greenville Greenville Affiliate Stations
## 34 AFFMARKET_Hampton Hampton Affiliate Stations
## 35 AFFMARKET_Las Vegas Las Vegas Affiliate Stations
## 36 AFFMARKET_Lynchburg Lynchburg Affiliate Stations
## 37 AFFMARKET_Miami Miami Affiliate Stations
## 38 AFFMARKET_Newport KY Newport KY Affiliate Stations
## 39 AFFMARKET_Norfolk Norfolk Affiliate Stations
## 40 AFFMARKET_Orlando Orlando Affiliate Stations
## 41 AFFMARKET_Philadelphia Philadelphia Affiliate Stations
## 42 AFFMARKET_Portsmouth Portsmouth Affiliate Stations
## 43 AFFMARKET_Pueblo Pueblo Affiliate Stations
## 44 AFFMARKET_Raleigh Raleigh Affiliate Stations
## 45 AFFMARKET_Reno Reno Affiliate Stations
## 46 AFFMARKET_Roanoke Roanoke Affiliate Stations
## 47 AFFMARKET_San Francisco San Francisco Affiliate Stations
## 48 AFFMARKET_Shaker Heights Shaker Heights Affiliate Stations
## 49 AFFMARKET_Sioux City Sioux City Affiliate Stations
## 50 AFFMARKET_St. Petersburg St. Petersburg Affiliate Stations
## 51 AFFMARKET_Tampa Tampa Affiliate Stations
## 52 AFFMARKET_Virginia Beach Virginia Beach Affiliate Stations
## 53 AFFMARKET_Washington DC Washington DC Affiliate Stations
## 54 AFFMARKET_Waterloo Waterloo Affiliate Stations
## # ... with 1 more variables: date_range <chr>
## # A tibble: 10 x 2
## network date_range
## <chr> <chr>
## 1 All National Networks (See individual networks for dates)
## 2 Bloomberg (12/5/2013 - 5/24/2017)
## 3 CNBC (7/2/2009 - 5/23/2017)
## 4 CNN (7/2/2009 - 5/24/2017)
## 5 FOX Business (8/20/2012 - 5/24/2017)
## 6 FOX News (7/16/2011 - 5/24/2017)
## 7 MSNBC (7/2/2009 - 5/24/2017)
## 8 All International Networks (See individual networks for dates)
## 9 BBC News (1/1/2017 - 5/24/2017)
## 10 All Discontinued National Networks (See individual networks for dates)
Data from most of the major outlets is available within a day or two
The query returns a list of four tibbles, and for this exercise I will be looking at the timeline
# timespan = "custom" is required if you are entering specific dates
# filter_network = "NATIONAL" is the default value and is what we need here. Each query takes a couple of seconds
tax <- query_tv("trump",context_keywords="tax, returns, leak, income, bill", timespan="custom",
start_date="2017-03-01", end_date="2017-03-31")
# create a tibble and add a value for an additional subject field
tax_df <- tax$timeline %>%
  as_tibble() %>%
  mutate(subject = "tax")
head(tax_df)
## # A tibble: 6 x 6
## date_start date_end date_resolution station value
## <dttm> <dttm> <chr> <chr> <int>
## 1 2017-03-01 2017-03-01 23:59:59 day Bloomberg 88
## 2 2017-03-01 2017-03-01 23:59:59 day CNBC 95
## 3 2017-03-01 2017-03-01 23:59:59 day CNN 98
## 4 2017-03-01 2017-03-01 23:59:59 day FOX Business 74
## 5 2017-03-01 2017-03-01 23:59:59 day FOX News 181
## 6 2017-03-01 2017-03-01 23:59:59 day MSNBC 70
## # ... with 1 more variables: subject <chr>
The result shows the number of times the keywords combo appears on a particular station over the given time spread
We can now repeat the same process for the other subjects, then combine and summarize the data
# travel ban
travel <- query_tv("trump", context_keywords = "travel, ban, Muslim, religious, ruling, supreme",
  timespan = "custom", start_date = "2017-03-01", end_date = "2017-03-31")
travel_df <- travel$timeline %>%
  as_tibble() %>%
  mutate(subject = "travel")
# healthcare bill
health <- query_tv("trump", context_keywords = "healthcare, insurance, budget, affordable",
  timespan = "custom", start_date = "2017-03-01", end_date = "2017-03-31")
health_df <- health$timeline %>%
  as_tibble() %>%
  mutate(subject = "health")
# wire tapping claims
wiretap <- query_tv("trump", context_keywords = "wiretap, tower, surveillance, tweet",
  timespan = "custom", start_date = "2017-03-01", end_date = "2017-03-31")
wiretap_df <- wiretap$timeline %>%
  as_tibble() %>%
  mutate(subject = "wiretap")
# stack the four timelines into one data frame
news <- rbind(tax_df, travel_df, health_df, wiretap_df)
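As the four queries differ only in their keywords, the repetition could be wrapped in a small loop. The following is a sketch, not tested against the live service: collect_timelines() is a hypothetical helper, and its fetch argument is assumed to be any function that takes a keyword string and returns a list containing a timeline data frame, as query_tv() does above

```r
# context keywords for each subject, as used in the queries above
subjects <- c(
  tax     = "tax, returns, leak, income, bill",
  travel  = "travel, ban, Muslim, religious, ruling, supreme",
  health  = "healthcare, insurance, budget, affordable",
  wiretap = "wiretap, tower, surveillance, tweet"
)

# Hypothetical helper: run `fetch` once per subject, tag each timeline with
# its subject name and stack the results into a single data frame
collect_timelines <- function(subjects, fetch) {
  pieces <- lapply(names(subjects), function(subject) {
    timeline <- as.data.frame(fetch(subjects[[subject]])$timeline)
    timeline$subject <- subject
    timeline
  })
  do.call(rbind, pieces)
}

# Intended usage (needs newsflash and a network connection, so left commented):
# news <- collect_timelines(subjects, function(k) {
#   query_tv("trump", context_keywords = k, timespan = "custom",
#            start_date = "2017-03-01", end_date = "2017-03-31")
# })
```

Passing the query in as a function keeps the helper independent of newsflash, which also makes it easy to try out with a stand-in that returns canned data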
Let’s first look at which of these issues proved of most importance over the month. This comes with the HUGE proviso that the context_keywords chosen are the most appropriate distinct words. There are plots for both hourly and daily data. You may want to zoom in and, in particular, click on the legend to toggle subjects in and out of the chart
# The data is supposedly in 30 minute batches, yet for some reason only one value shows up per hour
news %>%
  group_by(date_start, subject) %>%
  summarize(count = sum(value)) %>%
  ungroup() %>%
  plot_ly(x = ~date_start, y = ~count, color = ~subject) %>%
  add_lines() %>%
  config(displayModeBar = FALSE, showLink = FALSE)
The daily plot may be more appropriate
# derive a calendar date from date_start so that values can be summed by day
news %>%
  mutate(date = as.Date(date_start)) %>%
  group_by(date, subject) %>%
  summarize(count = sum(value)) %>%
  ungroup() %>%
  plot_ly(x = ~date, y = ~count, color = ~subject) %>%
  add_bars() %>%
  layout(barmode = "stack") %>%
  config(displayModeBar = FALSE, showLink = FALSE)
The daily figures suggest that fewer political shows/subjects might be covered at the weekend (e.g. the 11th/12th)
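The weekend dip could be checked more directly by comparing average daily counts for weekdays and weekends. This is a sketch only: weekend_summary() is a hypothetical helper, it expects a data frame with a date and a count column such as the daily totals behind the plot, and the numbers in the toy input are invented purely to show the shape of the result

```r
# Hypothetical helper: mean daily count for weekdays vs weekends.
# Expects a data frame with a Date column `date` and a numeric column `count`.
weekend_summary <- function(daily) {
  # "%u" is the ISO weekday number (1 = Monday ... 7 = Sunday), so the
  # result does not depend on the locale's day names
  day_number <- as.integer(format(daily$date, "%u"))
  daily$day_type <- ifelse(day_number >= 6, "weekend", "weekday")
  aggregate(count ~ day_type, data = daily, FUN = mean)
}

# Toy input with invented counts (Thursday to Sunday), not real query results
toy <- data.frame(
  date  = as.Date(c("2017-03-09", "2017-03-10", "2017-03-11", "2017-03-12")),
  count = c(100, 120, 40, 60)
)
weekend_summary(toy)
```

With the real data, the input would be the per-day totals, summed over subjects, before they are handed to plot_ly()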
President Trump’s wiretap tweet was originally sent on March 4th, but investigations into its veracity reverberated throughout this period
Likewise, the anticipated second executive order for a travel ban was issued on the 6th. I am not sure why it seemed to gain more traction on the 10th/11th, but it received a further boost on the 16th as judges in Hawaii and Maryland blocked the ban
The tax returns may have been conflated with the federal budget but have clearly been of great importance, whilst the health bill gained traction (at least as far as Trump’s association with it) towards the middle of the time period as its implications sank in
Now let’s have a look at how the different broadcasters prioritized coverage