What's News?

newsflash package

Polymath Bob Rudis (aka hrbrmstr) has recently released the newsflash package, which is a “set of tools to Work with the Internet Archive and GDELT Television Explorer”

In a recent blog post, based on an article by the GDELT project's creator, he details the coverage of Hillary Clinton’s email server woes, with the unsurprising finding that FOX News spent more time on the issue than other broadcasters


I will take a slightly different tack by, firstly, looking at some of the major news stories of March in the USA

We just need to load a few packages

library(newsflash)
library(plotly)
library(tidyverse)

Let’s just look at four issues that have impinged on President Trump recently

  • His Tax Returns
  • The Travel Ban
  • The New Healthcare Bill
  • Wire Tapping Claims

query_tv() is the function that does most of the work. In it, you enter a primary_keyword, e.g. ‘Trump’, and then context_keywords, which are words that appear within four sentences of a mention of the keyword. This is obviously a bit hit and miss, and the context_keywords string is limited to 50 characters in length before throwing an error. For the tax returns I chose these words: tax, returns, leak, income, bill. Feel free to amend them when adapting the code
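Since an over-long keyword string only fails once the query is sent, it can be worth checking the length up front. A minimal sketch in base R, assuming a hypothetical helper check_context_keywords() (it is not part of newsflash) and assuming that exactly 50 characters is still accepted:

```r
# Hypothetical helper, not part of newsflash: stop early if the
# comma-separated context keywords exceed the 50-character limit.
# Treating exactly 50 characters as valid is an assumption.
check_context_keywords <- function(keywords, max_chars = 50) {
  n <- nchar(keywords)
  if (n > max_chars) {
    stop(sprintf("context_keywords is %d characters; the limit is %d",
                 n, max_chars))
  }
  invisible(keywords)  # return the input so the call can be used inline
}

tax_keywords <- "tax, returns, leak, income, bill"
check_context_keywords(tax_keywords)
```

The checked string can then be passed as the context_keywords argument of query_tv().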

There is a maximum of 2,500 results per query. Bob’s code explains how to manoeuvre around this. However, as I am just looking at a seven day period - which also supplies results in 30 minute chunks - for just the National Networks, that does not pose an issue. We do, however, need to ensure the correct time span and can use the list_networks() function for this

# list networks available and date range of information
# as_tibble() replaces the deprecated as.tbl()
list_networks() %>% 
  as_tibble() %>% 
  select(-keyword) %>% 
  head(10)
## # A tibble: 56 x 3
##                       keyword                             network
##                         <chr>                               <chr>
##  1                   NATIONAL               All National Networks
##  2                  BLOOMBERG                           Bloomberg
##  3                       CNBC                                CNBC
##  4                        CNN                                 CNN
##  5                        FBC                        FOX Business
##  6                   FOXNEWSW                            FOX News
##  7                      MSNBC                               MSNBC
##  8              INTERNATIONAL          All International Networks
##  9                 BBCNEWSSEG                            BBC News
## 10       NATIONALDISCONTINUED  All Discontinued National Networks
## 11                    ALJAZAM                   Aljazeera America
## 12                        ALL           All Combined All Networks
## 13                        ALL                                <NA>
## 14                  AFFNETALL              All Affiliate Networks
## 15                 AFFNET_ABC              ABC Affiliate Stations
## 16                 AFFNET_CBS              CBS Affiliate Stations
## 17                 AFFNET_FOX              FOX Affiliate Stations
## 18                AFFNET_MYTV             MYTV Affiliate Stations
## 19                 AFFNET_NBC              NBC Affiliate Stations
## 20                 AFFNET_PBS              PBS Affiliate Stations
## 21                 AFFMARKALL               All Affiliate Markets
## 22           AFFMARKET_Boston           Boston Affiliate Stations
## 23     AFFMARKET_Cedar Rapids     Cedar Rapids Affiliate Stations
## 24        AFFMARKET_Charlotte        Charlotte Affiliate Stations
## 25       AFFMARKET_Cincinnati       Cincinnati Affiliate Stations
## 26        AFFMARKET_Cleveland        Cleveland Affiliate Stations
## 27 AFFMARKET_Colorado Springs Colorado Springs Affiliate Stations
## 28         AFFMARKET_Columbia         Columbia Affiliate Stations
## 29  AFFMARKET_Dakota Dunes SD  Dakota Dunes SD Affiliate Stations
## 30    AFFMARKET_Daytona Beach    Daytona Beach Affiliate Stations
## 31           AFFMARKET_Denver           Denver Affiliate Stations
## 32       AFFMARKET_Des Moines       Des Moines Affiliate Stations
## 33           AFFMARKET_Durham           Durham Affiliate Stations
## 34        AFFMARKET_Goldsboro        Goldsboro Affiliate Stations
## 35       AFFMARKET_Greenville       Greenville Affiliate Stations
## 36          AFFMARKET_Hampton          Hampton Affiliate Stations
## 37        AFFMARKET_Las Vegas        Las Vegas Affiliate Stations
## 38        AFFMARKET_Lynchburg        Lynchburg Affiliate Stations
## 39            AFFMARKET_Miami            Miami Affiliate Stations
## 40       AFFMARKET_Newport KY       Newport KY Affiliate Stations
## 41          AFFMARKET_Norfolk          Norfolk Affiliate Stations
## 42          AFFMARKET_Orlando          Orlando Affiliate Stations
## 43     AFFMARKET_Philadelphia     Philadelphia Affiliate Stations
## 44       AFFMARKET_Portsmouth       Portsmouth Affiliate Stations
## 45           AFFMARKET_Pueblo           Pueblo Affiliate Stations
## 46          AFFMARKET_Raleigh          Raleigh Affiliate Stations
## 47             AFFMARKET_Reno             Reno Affiliate Stations
## 48          AFFMARKET_Roanoke          Roanoke Affiliate Stations
## 49    AFFMARKET_San Francisco    San Francisco Affiliate Stations
## 50   AFFMARKET_Shaker Heights   Shaker Heights Affiliate Stations
## 51       AFFMARKET_Sioux City       Sioux City Affiliate Stations
## 52   AFFMARKET_St. Petersburg   St. Petersburg Affiliate Stations
## 53            AFFMARKET_Tampa            Tampa Affiliate Stations
## 54   AFFMARKET_Virginia Beach   Virginia Beach Affiliate Stations
## 55    AFFMARKET_Washington DC    Washington DC Affiliate Stations
## 56         AFFMARKET_Waterloo         Waterloo Affiliate Stations
## # ... with 1 more variables: date_range <chr>
## # A tibble: 10 x 2
##                               network                          date_range
##                                 <chr>                               <chr>
##  1              All National Networks (See individual networks for dates)
##  2                          Bloomberg             (12/5/2013 - 6/26/2017)
##  3                               CNBC              (7/2/2009 - 6/26/2017)
##  4                                CNN              (7/2/2009 - 6/26/2017)
##  5                       FOX Business             (8/20/2012 - 6/26/2017)
##  6                           FOX News             (7/16/2011 - 6/26/2017)
##  7                              MSNBC              (7/2/2009 - 6/26/2017)
##  8         All International Networks (See individual networks for dates)
##  9                           BBC News              (1/1/2017 - 6/26/2017)
## 10 All Discontinued National Networks (See individual networks for dates)

Data from most of the major outlets is available within a day or two

The query returns a list of 4 tibbles; for this exercise I will be looking at the timeline

# timespan="custom" is required if you are entering specific dates.
# filter_network = "NATIONAL" is the default and is the value required here.
# Each query takes a couple of seconds

tax <- query_tv("trump",context_keywords="tax, returns, leak, income, bill", timespan="custom", 
                start_date="2017-03-01", end_date="2017-03-31") 

# create a tibble and add a value for an additional subject field
tax_df <- tax$timeline %>% 
  as_tibble() %>% 
  mutate(subject="tax")

head(tax_df)
## # A tibble: 6 x 6
##   date_start            date_end date_resolution      station value
##       <dttm>              <dttm>           <chr>        <chr> <int>
## 1 2017-03-01 2017-03-01 23:59:59             day    Bloomberg    88
## 2 2017-03-01 2017-03-01 23:59:59             day         CNBC    95
## 3 2017-03-01 2017-03-01 23:59:59             day          CNN    98
## 4 2017-03-01 2017-03-01 23:59:59             day FOX Business    74
## 5 2017-03-01 2017-03-01 23:59:59             day     FOX News   181
## 6 2017-03-01 2017-03-01 23:59:59             day        MSNBC    70
## # ... with 1 more variables: subject <chr>

Each row shows the number of times the keyword combination appears on a particular station in a given time period

We can now repeat the same process for the other subjects, then combine and summarize the data

travel <- query_tv("trump", context_keywords="travel, ban, Muslim, religious, ruling, supreme",
                   timespan="custom", start_date="2017-03-01", end_date="2017-03-31")

travel_df <- travel$timeline %>% 
  as_tibble() %>% 
  mutate(subject="travel")

health <- query_tv("trump", context_keywords="healthcare, insurance, budget, affordable",
                   timespan="custom", start_date="2017-03-01", end_date="2017-03-31")

health_df <- health$timeline %>% 
  as_tibble() %>% 
  mutate(subject="health")

wiretap <- query_tv("trump", context_keywords="wiretap, tower, surveillance, tweet",
                    timespan="custom", start_date="2017-03-01", end_date="2017-03-31")

wiretap_df <- wiretap$timeline %>% 
  as_tibble() %>% 
  mutate(subject="wiretap")

news <- rbind(tax_df,travel_df,health_df,wiretap_df)
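
The four near-identical query blocks could also be folded into one helper. A minimal sketch, assuming two hypothetical functions, subject_timeline() and build_news() (neither is part of newsflash). The query function is passed in as an argument, so the sketch relies only on the query_tv() arguments already used in this post:

```r
# Hypothetical wrapper: run one query and tag its timeline with a subject.
# `query_fun` is expected to accept the same arguments as the query_tv()
# calls in this post and to return a list with a `timeline` element.
subject_timeline <- function(subject, keywords, query_fun,
                             start_date = "2017-03-01",
                             end_date = "2017-03-31") {
  res <- query_fun("trump", context_keywords = keywords,
                   timespan = "custom",
                   start_date = start_date, end_date = end_date)
  timeline <- as.data.frame(res$timeline)
  timeline$subject <- subject
  timeline
}

# Hypothetical wrapper: one query per subject, stacked into one data frame.
build_news <- function(subjects, query_fun) {
  parts <- lapply(names(subjects), function(s) {
    subject_timeline(s, subjects[[s]], query_fun)
  })
  do.call(rbind, parts)
}

# Same subjects and keywords as used in this post
subjects <- c(
  tax     = "tax, returns, leak, income, bill",
  travel  = "travel, ban, Muslim, religious, ruling, supreme",
  health  = "healthcare, insurance, budget, affordable",
  wiretap = "wiretap, tower, surveillance, tweet"
)

# With newsflash loaded, the call would be:
# news <- build_news(subjects, query_tv)
```

Adding a fifth subject then only means adding one entry to the subjects vector.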

Let’s first look at which of these issues proved of most importance over the month. This has the HUGE proviso that the context_keywords are the most appropriate distinct words. There are plots for both hourly and daily data. You may want to zoom in and, in particular, click on the legend to toggle subjects in and out of the chart

# For some reason, although the data is supposedly in 30 minute batches, only one value shows up per hour
news %>% 
  group_by(date_start, subject) %>% 
  summarize(count=sum(value)) %>% 
  ungroup() %>% 
  plot_ly(x=~date_start,y=~count, color=~subject) %>% 
  add_lines() %>% 
  config(displayModeBar = FALSE, showLink = FALSE)

The daily view may be more appropriate

# derive a calendar date from the start timestamp
news %>% 
  mutate(date=as.Date(date_start)) %>% 
  group_by(date, subject) %>% 
  summarize(count=sum(value)) %>% 
  ungroup() %>% 
  plot_ly(x=~date,y=~count, color=~subject) %>% 
  add_bars() %>% 
  layout(barmode = "stack") %>%
  config(displayModeBar = FALSE, showLink = FALSE)

The daily figures suggest that less political shows/subjects might be covered at the weekend (e.g. 11th/12th)
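
One rough way to check a weekend effect is to flag Saturdays and Sundays and compare average counts. A minimal base R sketch on a made-up toy data frame (the counts are invented purely for illustration and are not real results); with the real data, daily totals derived from news would take its place:

```r
# Toy stand-in for daily totals; the counts are invented for illustration.
toy <- data.frame(
  date  = as.Date(c("2017-03-10", "2017-03-11", "2017-03-12", "2017-03-13")),
  count = c(400L, 150L, 170L, 420L)
)

# POSIXlt stores the weekday as 0 (Sunday) to 6 (Saturday)
toy$weekend <- as.POSIXlt(toy$date)$wday %in% c(0, 6)

# Mean count for weekdays ("FALSE") and weekend days ("TRUE")
weekend_means <- tapply(toy$count, toy$weekend, mean)
weekend_means
```

Using the numeric weekday rather than weekdays() avoids depending on the locale's day names.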

President Trump’s wiretap tweet was originally sent on Mar 4th but investigations into its veracity reverberated throughout this period

Likewise, the second anticipated executive order for a travel ban was made on the 6th. I am not sure why it seemed to gain more traction on the 10th/11th, but it received a further boost on the 16th as judges in Hawaii and Maryland blocked the ban

The tax returns may have been conflated with the federal budget but have clearly been of great importance, whilst the health bill gained traction (at least as far as Trump’s association with it) towards the middle of the time period as its implications sank in


Now let’s have a look at how the different broadcasters prioritized coverage … TBC
