Twitter, political ideology & the 115th US Senate

In this post, we consider some fairly recent studies conducted by folks at the Washington Post and the Pew Research Center that investigate the relationship between political ideology — as estimated by voting behavior/DW-Nominate scores (Poole and Rosenthal 1985) — and social media usage among lawmakers in the US Congress.

Some findings:

  • More conservative and more liberal lawmakers tend to have more Facebook followers than moderate lawmakers (Hughes and Lam 2017).
  • Political ideology scores derived from the news sources lawmakers share via Twitter (e.g., articles from nytimes.com, foxnews.com, etc.) strongly correlate with DW-Nominate scores based on voting behavior (Eady et al. 2018).
  • Moderate members of Congress are more likely to share local (as opposed to national) news sources on Facebook than their more conservative or liberal colleagues (Van Kessel and Hughes 2018).

Here, we demonstrate an R- and Twitter-based framework for approximately replicating some of these findings (albeit with less methodological rigor), focusing on the 115th US Senate. Our results align nicely with the previous findings.

library(Rvoteview)  # devtools::install_github("voteview/Rvoteview")
library(tidyverse)
library(ggthemes)
library(ggrepel)    # devtools::install_github("slowkow/ggrepel")
library(DT)
library(ggridges)

Congressional data sources

DW-Nominate scores[1] for every lawmaker in the history of the US Congress, as well as the details of every congressional roll call, are made available by the folks at VoteView in a variety of formats, including via the R package Rvoteview (Poole and Rosenthal 1985; Boche et al. 2018). The package ships with a host of search functions; here, we use member_search to acquire Senator details and DW-Nominate scores for the 115th US Senate.

sen115 <- Rvoteview::member_search(chamber = 'Senate', congress = 115)

The plot below summarizes political ideologies in the 115th Senate as estimated by DW-Nominate D1 and D2 scores. Some of the more ideologically extreme, well-known, or moderate Senators are labeled. Focusing on D1, Elizabeth Warren votes the most progressively and Rand Paul the most conservatively.

sens <- c('Flake', 'Warren', 'Collins', 'Paul', 'Manchin',
          'Merkley', 'Harris', 'Murkowski', 'Udall',
          'Jones', 'Shelby', 'Sanders', 'Cruz', 'Rubio')

sen115 %>%
  ggplot(aes(x = nominate_dim1,
             y = nominate_dim2,
             label = bioname)) +
  # Unit circle: DW-Nominate ideal points are constrained to lie within it
  annotate("path",
           x = cos(seq(0, 2*pi, length.out = 300)),
           y = sin(seq(0, 2*pi, length.out = 300)),
           color = 'gray',
           size = .25) +
  geom_point(aes(color = as.factor(party_code)),
             size = 2,
             shape = 17) +
  # bioname uses a "LAST, First" format, so match on the surname only
  geom_text_repel(
    data = subset(sen115, sub(',.*$', '', bioname) %in% toupper(sens)),
    nudge_y = 0.025,
    segment.color = "grey50",
    direction = "y",
    hjust = 0,
    size = 2) +
  scale_color_stata() +
  theme_fivethirtyeight() +
  theme(legend.position = 'none',
        plot.title = element_text(size = 12),
        axis.title = element_text()) +
  xlab('DW-Nominate D1') + ylab('DW-Nominate D2') +
  labs(title = "DW-Nominate Plot for the 115th US Senate")
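A note on the label filter in the plot code. Voteview's bioname field appears to store names as "LAST, First". If so, comparing the full bioname against upper-cased surnames matches nothing. The toy sketch below uses hypothetical bioname values and base R only. It contrasts the naive comparison with matching on the text before the first comma.

```r
# Hypothetical bioname values in the "LAST, First" style that Voteview
# appears to use (an assumption; check your own member_search() output)
bionames <- c("WARREN, Elizabeth", "PAUL, Rand", "COLLINS, Susan Margaret",
              "SCHUMER, Charles Ellis")
sens <- c("Warren", "Paul", "Collins")

# Naive comparison: a full bioname never equals an upper-cased surname
naive <- bionames %in% toupper(sens)

# Extract the surname (everything before the first comma) and compare that
last_names <- sub(",.*$", "", bionames)
matched <- last_names %in% toupper(sens)

print(naive)    # all FALSE
print(matched)  # TRUE TRUE TRUE FALSE
```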

For additional details about the 115th Congress, we access a collection of resources made available at CivilServiceUSA, which includes information regarding age, race, and religion, as well as Twitter & Facebook handles (and a host of other variables).

library(jsonlite)
sen_url <- 'https://raw.githubusercontent.com/CivilServiceUSA/us-senate/master/us-senate/data/us-senate.json'

senate_dets <- jsonlite::fromJSON(url(sen_url)) %>%
  # Use Flake's JeffFlake account rather than the listed SenJeffFlake handle
  mutate(twitter_handle = ifelse(twitter_handle == 'SenJeffFlake', 'JeffFlake', twitter_handle)) %>%
  mutate(twitter_handle = tolower(twitter_handle)) %>%
  rename(bioguide_id = bioguide) %>%
  left_join(sen115 %>%
              filter(congress == 115) %>%
              select(bioguide_id, party_code, nominate_dim1),
            by = 'bioguide_id') %>%
  # Treat Independents (King, Sanders) as Democrats, with whom they caucus
  mutate(party = ifelse(party == 'independent', 'democrat', party))
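Handles from CivilServiceUSA and screen names returned by Twitter may differ in letter case. We therefore lower-case handles before joining, as in the tolower() step above. Below is a minimal base-R sketch with hypothetical toy tables. It uses merge() as a stand-in for dplyr's joins.

```r
# Hypothetical toy tables: handles as listed in a metadata file vs. as
# returned by an API, differing only in letter case
meta <- data.frame(twitter_handle = c("SenWarren", "JeffFlake", "SenTedCruz"),
                   party = c("democrat", "republican", "republican"),
                   stringsAsFactors = FALSE)
tweets <- data.frame(screen_name = c("senwarren", "JEFFFLAKE", "SenTedCruz"),
                     n_tweets = c(10, 20, 30),
                     stringsAsFactors = FALSE)

# A case-sensitive (inner) merge only matches the handle whose case agrees
raw_join <- merge(meta, tweets, by.x = "twitter_handle", by.y = "screen_name")

# Lower-casing both keys first makes the join case-insensitive
meta$key <- tolower(meta$twitter_handle)
tweets$key <- tolower(tweets$screen_name)
norm_join <- merge(meta, tweets, by = "key")

nrow(raw_join)   # 1
nrow(norm_join)  # 3
```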

The table below summarizes some of the details/info from this data set for a sample of Senators in the 115th Congress.

set.seed(199)
senate_dets %>% select(last_name, twitter_handle, date_of_birth, class, religion) %>%
  sample_n(5) %>%
    DT::datatable(options = list(pageLength = 5,dom = 't', scrollX = TRUE),
                  rownames = FALSE, width="100%", escape=FALSE) 

Scraping tweets via rtweet

With Twitter handles in tow, we can now gather some tweets. There are different paradigms for working with/scraping tweets using R; here, we provide a simple walk-through using the rtweet package, which has a lovely online vignette available here.

library(rtweet)

The rtweet::get_timeline function is a simple way to gather the n most recent tweets for a given user (or set of users) based on Twitter handles; below, we gather the 2,000 most recent tweets for each Senator.

senate_tweets <- rtweet::get_timeline(
  senate_dets$twitter_handle, n = 2000)

I am not exactly sure about query limits; the above query (roughly 100 Senators × 2,000 tweets) returns ~200K tweets quickly and without problems. Example output from the Twitter scrape:

set.seed(999)
senate_tweets %>%
  select(created_at, screen_name, text) %>% #followers_count, 
  sample_n(5) %>%
    DT::datatable(options = list(pageLength = 5, dom = 't', scrollX = TRUE), 
                  rownames = FALSE, width="100%", escape=FALSE) 

The plot below summarizes the number of tweets returned by our query, by date of creation. Most tweets were created in the last couple of years. Because we collect a fixed number of tweets (2,000) per Senator, the oldest tweets presumably come from less prolific Senate tweeters, whose 2,000 most recent tweets stretch further back in time.

library(scales)
senate_tweets %>%
  mutate(created_at = as.Date(gsub(' .*$', '', created_at))) %>%
  group_by(created_at) %>%
  summarize(n=n()) %>%
  ggplot(aes(x=created_at, group = 1)) +
  geom_line(aes(y=n),
            size=.5, 
            color = 'steelblue') +
  theme_fivethirtyeight()+
  theme(plot.title = element_text(size=12)) + 
  labs(title="Senator tweets by date") +
  scale_x_date(labels = scales::date_format("%m-%Y"))
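Here is a rough back-of-the-envelope sketch of why a fixed per-account cap skews the date distribution. With a cap of n tweets, an account that tweets r times per day reaches back roughly n / r days. The tweeting rates and scrape date below are hypothetical.

```r
# Hypothetical: two accounts capped at the same number of recent tweets,
# tweeting at very different (assumed) daily rates
cap <- 2000
end_date <- as.Date("2018-10-15"))          # hypothetical scrape date
rates <- c(prolific = 20, occasional = 2)  # tweets per day (assumed)

# With a fixed cap, an account's oldest returned tweet is roughly
# cap / rate days before the scrape date
days_covered <- ceiling(cap / rates)
oldest <- end_date - days_covered

days_covered  # prolific: 100, occasional: 1000
oldest
```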

Twitter followers & political ideology

First, then, we take a quick look at the relationship between political ideology scores and number of Twitter followers. The results from our call to Twitter include the number of followers for each US Senator; so, we simply need to join the Twitter data with the DW-Nominate D1 scores obtained via VoteView.

senate_summary <- senate_tweets %>%
  group_by(screen_name) %>%
  summarize(followers = mean(followers_count)) %>%
  rename(twitter_handle = screen_name) %>%
  mutate (twitter_handle = tolower(twitter_handle)) %>%
  left_join(senate_dets %>%
              select(bioguide_id, twitter_handle, party, party_code, nominate_dim1)) %>%
  filter(complete.cases(.))

A portion of our summary table is presented below:

For illustrative purposes, we treat the New England Independents who caucus with Democrats (i.e., King-ME and Sanders-VT) as Democrats in the figure below.

senate_summary %>%
  ggplot(aes(nominate_dim1, log(followers), color = as.factor(party))) +
  geom_point() +
  geom_smooth(method = "lm", se = TRUE) +
  ggthemes::scale_color_stata() +
  ggthemes::theme_fivethirtyeight() +
  theme(legend.position = "none",
        plot.title = element_text(size = 12),
        axis.title = element_text()) +
  xlab('DW-Nominate D1') + ylab('log (Twitter Followers)') +
  labs(title = "DW-Nominate scores & log (Twitter followers)")

So, as Hughes and Lam (2017) demonstrated for Facebook followers, more conservative and more liberal Senators tend to have larger Twitter followings than their more moderate colleagues. (Note that we do not control for constituency size, i.e., state population.)
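One way to put a number on this pattern is to regress log followers on ideological extremity, measured as the absolute DW-Nominate D1 score, with a party intercept. This model is not part of the original analysis. The sketch below runs on simulated data built to have a positive extremity effect, purely to show the model specification. A state-population term could be added to address the constituency-size caveat.

```r
# Simulated (not real) data: ideology in [-1, 1], with log-followers
# increasing in ideological extremity |ideology| by construction
set.seed(42)
n <- 100
ideology <- runif(n, -1, 1)
party <- ifelse(ideology < 0, "democrat", "republican")
log_followers <- 11 + 2 * abs(ideology) + rnorm(n, sd = 0.5)

# Hypothetical model: log(followers) on extremity plus a party intercept;
# a positive slope on abs(ideology) means "more extreme -> more followers"
fit <- lm(log_followers ~ abs(ideology) + party)
coef(fit)
```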

Shared tweets as ideology

Next, we investigate the relationship between political ideologies based on Senate roll calls (ie, DW-Nominate scores) and political ideologies as estimated using news media that Senators share on their Twitter feed.

A general overview of estimating political ideology via social media feeds:

  • Extract URLs of news media shared by each US Senator via Twitter,
  • Build a vector space model (VSM) to represent each Senator in terms of the domain/frequency of shared news media, and
  • Apply classical scaling to a cosine-based similarity matrix to view Twitter-based political ideologies in two-dimensional space.

The first two steps are based on the Washington Post methodology described here. The third is an alternative (non-Bayesian) approach to measuring similarity among the constituent vectors of a VSM. We walk through each of these steps next.

Retrieving shared URLs

Results from our call to rtweet::get_timeline() include a column containing shared URLs. Below, we filter our tweet data set to tweets containing shared URLs, excluding photo tweets and quote tweets.

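x <- senate_tweets %>%
  # Keep non-photo, non-quote tweets that contain at least one URL
  filter(!media_type %in% 'photo' & !is.na(urls_url) & !is_quote) %>%
  select(screen_name, urls_url, urls_t.co) %>%
  # One row per shared URL (tidyr >= 1.0 syntax)
  unnest(c(urls_url, urls_t.co)) %>%
  # Keep only the part of each URL before the first "/"
  mutate(urls = gsub('/.*$', '', urls_url))

# Proper URLs: hosts ending in one of the listed suffixes
x1 <- x %>% filter(grepl('com$|org$|gov$|net$|gop$|edu$|us$|uk$', urls))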
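The domain extraction and suffix test above can be illustrated on a few hypothetical URL strings. As in the code above, we assume urls_url values lack the http(s):// prefix.

```r
# Hypothetical values resembling the urls_url column
urls_url <- c("nytimes.com/2018/10/05/us/politics/story.html",
              "foxnews.com/politics/some-article",
              "bit.ly/2xYzAbC",
              "ow.ly/AbCd30m")

# Keep only the host: drop everything from the first "/" onward
domains <- gsub("/.*$", "", urls_url)

# Hosts ending in one of the listed suffixes count as "proper" domains;
# everything else (e.g. link shorteners) is set aside for unshortening
tld_pattern <- "com$|org$|gov$|net$|gop$|edu$|us$|uk$"
proper    <- domains[grepl(tld_pattern, domains)]
shortened <- domains[!grepl(tld_pattern, domains)]

proper     # "nytimes.com" "foxnews.com"
shortened  # "bit.ly" "ow.ly"
```

The suffix test is not a true shortener detector. A shortener on a .com domain (e.g., tinyurl.com) passes as a "proper" domain, which is presumably one reason tinyurl appears in the exclusion list further down.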

Many of these URLs have been shortened and need to be expanded ("unshortened") before we can identify the underlying domain. This can be done with the get_url function from the SocialMediaMineR package. Note that the unshortening process can be time-consuming.

library(SocialMediaMineR)
# Shortened URLs: hosts without one of the listed suffixes
y <- x %>% filter(!grepl('com$|org$|gov$|net$|gop$|edu$|us$|uk$', urls)) %>%
  # Expand via the t.co link, then strip scheme, "www." and path
  mutate(urls = SocialMediaMineR::get_url(urls_t.co)) %>%
  mutate(urls = gsub('(http)(s)?(://)(www\\.)?', '', urls)) %>%
  mutate(urls = gsub('/.*$', '', urls))
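After unshortening, the normalization regex strips the scheme and an optional "www.", then drops the path. Here is a quick check on hypothetical expanded URLs:

```r
# Hypothetical expanded URLs, as a URL-expansion step might return them
expanded <- c("https://www.washingtonpost.com/news/monkey-cage/",
              "http://thehill.com/homenews/senate/123",
              "https://www.nytimes.com/")

# Strip the scheme and a leading "www.", then drop the path
hosts <- gsub("(http)(s)?(://)(www\\.)?", "", expanded)
hosts <- gsub("/.*$", "", hosts)
hosts  # "washingtonpost.com" "thehill.com" "nytimes.com"
```

This pattern does not strip other subdomains (e.g., "m." or "www2."), so such hosts would be counted as separate domains.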

Next, we manually exclude some of the more frequent non-news sources (e.g., social media, personal, and government websites). Some less frequent non-news sources presumably remain, but we do not worry about them too much here.

senate_domains <- bind_rows(x1, y) %>%
  filter(!grepl('facebook|twitter|youtube|instagram|twimg|error|gov$|tumblr|google|Error|maggiehassan|tammybaldwin|catherinecortez|actblue|pscp|tinyurl|joniforiowa|heart|medium', urls)) %>%
  mutate(urls = tolower(urls))
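The exclusion filter above lists both 'error' and 'Error' because lower-casing happens after filtering. The sketch below uses hypothetical domains and an abbreviated version of the pattern. It shows that lower-casing first lets a single lower-case pattern catch every capitalization.

```r
# Hypothetical domain list, mixing news and non-news hosts
urls <- c("NYTimes.com", "twitter.com", "Error", "foxnews.com",
          "senate.gov", "YouTube.com")

# Abbreviated version of the exclusion pattern used above
exclude <- "facebook|twitter|youtube|instagram|error|gov$"

# Match against lower-cased values, so "Error" and "YouTube.com"
# are caught without listing each capitalization separately
kept <- urls[!grepl(exclude, tolower(urls))]
kept <- tolower(kept)
kept  # "nytimes.com" "foxnews.com"
```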

Of the ~200K tweets, then, ~30K include shared links. The figure below illustrates the top 50 shared domain names among Senators in the 115th Congress.

senate_domains %>%
  data.frame()%>%
  group_by(urls)%>%
  summarise(freq = n())%>%
  top_n(50,freq) %>%
  ggplot(aes(x=reorder(urls, freq), y = freq)) + 
  geom_point(size=2, color = 'steelblue') +
  geom_text(aes(label=urls), #
            size=3, 
            hjust = 0, nudge_y = 20) +
  coord_flip()+
  ylim(0, 2500) +
  theme_fivethirtyeight() +
  theme(axis.text.y=element_blank(),
        plot.title = element_text(size=11)) +
  labs(title="50 most tweeted web domains by US Senators") 

Vector-space representation of shared URLs

Next, we build a VSM to represent each Senator in terms of the domain/frequency of news media shared via Twitter. Based on the data structure from above, this transformation is fairly straightforward — amounting to some aggregation and casting.

sen_url_mat <- 
  senate_domains %>%
  group_by(screen_name, urls) %>%
  summarize (freq = n()) %>%
  filter(freq > 1) %>%
  spread(screen_name, freq)%>%
  replace(is.na(.), 0)  %>%
  ungroup() 
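For readers who prefer base R, the same domain-by-Senator count matrix can be sketched with table(). The share records below are hypothetical. Counts of one are zeroed, and all-zero domains dropped, to mirror the freq > 1 filter above.

```r
# Hypothetical (screen_name, domain) records: one row per shared link
shares <- data.frame(
  screen_name = c("SenA", "SenA", "SenA", "SenB", "SenB", "SenB", "SenC", "SenC"),
  urls        = c("nytimes.com", "nytimes.com", "foxnews.com",
                  "foxnews.com", "foxnews.com", "wsj.com",
                  "nytimes.com", "nytimes.com"),
  stringsAsFactors = FALSE
)

# Cross-tabulate into a domain x senator count matrix (rows = domains)
counts <- unclass(table(shares$urls, shares$screen_name))

# Mirror freq > 1: zero out single shares, then drop domains left empty
counts[counts <= 1] <- 0
counts <- counts[rowSums(counts) > 0, , drop = FALSE]
counts
```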

A portion of the matrix is presented below. There is clearly some intuitive variation in vector composition as a function of political affiliation (and of whether or not your name is Jeff Flake).

x <- c('urls', 'JeffFlake', 'SenTedCruz', 
       'SenSchumer', 'SenWhitehouse')
y <- c('nytimes.com', 'thehill.com', 
       'wsj.com', 'usatoday.com', 'foxnews.com', 
       'bloomberg.com', 'politico.com')

sen_url_mat[sen_url_mat$urls %in% y, x]%>%
    DT::datatable(options = list(pageLength = 7, dom = 't', scrollX = TRUE), 
                  rownames = FALSE, width="100%", escape=FALSE) 

A multi-dimensional model

Based on this matrix, we can measure the similarity of the news media sharing habits (and presumably political ideologies) of US Senators by building a cosine-based similarity matrix with the lsa::cosine function.

library(lsa)
# Cosine similarity between Senators (columns), after dropping the urls column
sim_mat <-
  sen_url_mat %>%
  select(-urls) %>%
  data.matrix() %>%
  lsa::cosine()
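The cosine similarity between two Senators' column vectors a and b is a·b / (‖a‖ ‖b‖). The base-R sketch below computes it for every pair of columns at once, which is the same column-wise quantity lsa::cosine returns for a matrix. The toy matrix is hypothetical.

```r
# Column-wise cosine similarity, written out so the formula is explicit
cosine_cols <- function(m) {
  norms <- sqrt(colSums(m^2))          # L2 norm of each column (senator)
  crossprod(m) / outer(norms, norms)   # pairwise dot products / norm products
}

# Hypothetical domain x senator count matrix
m <- matrix(c(2, 0, 4,
              0, 3, 0,
              1, 0, 2),
            nrow = 3, byrow = TRUE,
            dimnames = list(c("nytimes.com", "foxnews.com", "wsj.com"),
                            c("SenA", "SenB", "SenC")))
sim <- cosine_cols(m)
round(sim, 3)  # SenA vs SenC = 1 (proportional columns), SenA vs SenB = 0
```

A Senator whose column is all zeros would produce NaN. The freq > 1 filter plus spread() above should avoid this, since only Senators with at least one retained domain get a column.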

We then convert this similarity matrix into a dissimilarity matrix (1 - cosine similarity) and project it into two-dimensional Euclidean space via classical scaling with the cmdscale function.

sm_ids <-
  cmdscale(1-sim_mat, eig = TRUE, k = 2)$points %>% 
  data.frame() %>%
  mutate (twitter_handle = tolower(rownames(sim_mat))) %>%
  left_join(senate_dets)
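Here is a small sketch of the scaling step on a hypothetical 4 × 4 similarity matrix with two obvious clusters. cmdscale() treats its input as dissimilarities, hence 1 - sim.

```r
# Hypothetical symmetric cosine-similarity matrix for four senators
sim <- matrix(c(1.0, 0.9, 0.2, 0.1,
                0.9, 1.0, 0.3, 0.2,
                0.2, 0.3, 1.0, 0.8,
                0.1, 0.2, 0.8, 1.0),
              nrow = 4,
              dimnames = list(paste0("Sen", 1:4), paste0("Sen", 1:4)))

# Classical (Torgerson) MDS on the dissimilarities 1 - sim, in 2-D
fit <- cmdscale(1 - sim, k = 2, eig = TRUE)
coords <- fit$points

# Eigenvalues show how much structure each dimension captures;
# negative values signal the dissimilarities are not exactly Euclidean
round(fit$eig, 3)
coords
```

Eigenvectors are only defined up to sign, so which party ends up on the left of Tweet D1 is arbitrary. Flipping an axis (multiplying it by -1) leaves the solution otherwise unchanged.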

Results are summarized in the plot below. Per the spatial model, Senators with similar tweet-sharing habits are positioned close together in 2D space. As can be seen, the x-axis (Tweet D1) does a nice job of distinguishing party affiliation among US Senators.

sm_ids %>%
  ggplot(aes(X1,X2)) +
  geom_text(aes(label=paste0(last_name, '-', state_code),col=party), #
            size=2.5, 
            check_overlap = TRUE)+
  scale_colour_stata() + theme_fivethirtyeight() +
  theme(legend.position = "none",
        plot.title = element_text(size=12),
        axis.title = element_text())+
  xlab('Tweet D1') + ylab('Tweet D2')+ 
  xlim(-.4,.4)+ ylim(-.4,.4)+
  labs(title="US Senators from 115th Congress in tweet domain space")

What underlies variation along the y-axis (Tweet D2) is less intuitive. Senators with higher D2 scores tend to be more moderate; they also include several of the vulnerable red-state Democrats up for re-election in November 2018.

It could be that this dimension reflects a “national versus local” news sharing preference among Senators (per findings presented in Van Kessel and Hughes (2018)). See postscript for additional support for this particular interpretation.

Comparing ideology scores

For a bit of validation, we compare VoteView’s DW-Nominate D1 scores and our Twitter-based D1 scores; below we join the two data sets.

senate_summary_twids <- 
  senate_summary %>%
  left_join(sm_ids %>% 
              select(bioguide_id, 
                     last_name, 
                     state_code, 
                     X1, X2)) %>%
  drop_na(X1) 

In the figure below, Twitter-based (D1) ideology scores are plotted as a function of DW-Nominate (D1) ideology scores. As the plot attests, the Twitter-based scores align quite nicely with the roll call-based scores. Senators for whom Twitter scores and DW-Nominate scores are most disparate have been labeled.

set.seed(799)
senate_summary_twids %>%
  ggplot(aes(nominate_dim1, X1, label = paste0(last_name,'-', state_code)))+
  geom_point(aes(color = party))+
  geom_smooth(method = "loess", se = TRUE, color = 'darkgrey') +
  geom_text_repel(
    data = subset(senate_summary_twids, party == 'republican' & X1 < 0),
    nudge_y = -0.025,
    segment.color = "grey50",
    direction = "y",
    hjust = 0, 
    size = 2.5 ) +
  geom_text_repel(
    data  = subset(senate_summary_twids, party == 'democrat' & X1 > 0),
    nudge_y =  0.025,
    segment.color = "grey50",
    direction = "y",
    hjust = 0, 
    size = 2.5 ) +
  scale_colour_stata() + theme_fivethirtyeight() +
  theme(legend.position = "none",
        plot.title = element_text(size=12),
        axis.title = element_text())+
  ylab('Twitter Ideology D1') + xlab('DW-Nominate D1') +
  labs(title = "DW-Nominate scores vs. Twitter ideology scores")

So, Democrats vote and share news largely in lockstep. Jon Tester and Maggie Hassan have been labeled because they have positive Twitter D1 scores, but both are more moderate Democrats whose news-sharing habits are not especially anomalous. Republicans, on the other hand, have several Senators in their ranks who share news media quite differently than they vote.

Perhaps most notable is Jeff Flake: the Senator from Arizona is the third most conservative voter in the Senate but shares news media via Twitter like a moderate Democrat. (See Eady et al. 2018 for a similar observation.) Bob Corker, a more moderate Republican, shares tweets like a card-carrying Democrat. Neither Senator is seeking re-election in 2018, and both have been willing to publicly criticize the 45th president.

cor(senate_summary_twids$X1, senate_summary_twids$nominate_dim1)
## [1] 0.8246283
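The sign of an MDS axis is arbitrary, so the positive correlation above depends on how cmdscale happened to orient Tweet D1. The minimal sketch below uses hypothetical scores. It orients the axis to agree with DW-Nominate before computing the correlation.

```r
# Hypothetical scores: roll-call D1 and a tweet-based D1 that came out
# of MDS with its sign reversed
nominate_dim1 <- c(-0.75, -0.40, -0.10, 0.35, 0.80)
tweet_d1      <- c(0.30, 0.18, 0.02, -0.15, -0.33)

# MDS axes are identified only up to sign: flip if needed so that
# the tweet axis runs in the same direction as DW-Nominate
if (cor(tweet_d1, nominate_dim1) < 0) tweet_d1 <- -tweet_d1

r <- cor(tweet_d1, nominate_dim1)
r
```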

Summary

So, a bit of a copycat post (for R users) demonstrating some super neat methods developed by folks at Pew Research and the Washington Post. The rtweet package is quite lovely, and facilitates a very clean interaction with Twitter’s APIs. Lots of fun to be had applying social media methodologies/analyses to the investigation of political ideology. Per usual, results presented here should be taken with a grain of salt, as our data set is relatively small. See references for more methodologically thorough approaches.

Postscript: News media ideologies

Quickly: suppose we flip on its head the VSM we used to estimate the tweet-based ideology of US Senators, so that each news source is represented as a vector of share counts by Senator. We can then estimate the political ideology of the news sources included in our tweet data set, using more or less the same code as above.

The plot below summarizes a two-dimensional solution. D1 seems to intuitively capture the liberal-conservative leanings of news sources, while a national-local distinction seems to underlie variation along D2. See this Pew Research viz for a slightly different approach with roughly comparable results (at least along D1).
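The "flip" amounts to transposing the domain-by-Senator matrix, so that news sources become the columns compared in the cosine step. The sketch below uses a hypothetical toy matrix and keeps only the first dimension for brevity.

```r
# Hypothetical domain x senator count matrix (rows = news sources)
m <- matrix(c(5, 4, 0, 1,
              0, 1, 6, 5,
              3, 3, 1, 0),
            nrow = 3, byrow = TRUE,
            dimnames = list(c("nytimes.com", "foxnews.com", "wsj.com"),
                            c("SenA", "SenB", "SenC", "SenD")))

# Same column-wise cosine as for senators, applied to t(m): each news
# source (now a column) is described by its shares per senator
cosine_cols <- function(x) {
  norms <- sqrt(colSums(x^2))
  crossprod(x) / outer(norms, norms)
}
source_sim <- cosine_cols(t(m))

# Classical MDS on 1 - similarity; first dimension only
source_coords <- cmdscale(1 - source_sim, k = 1)
source_coords
```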

Resources

Boche, Adam, Jeffrey B Lewis, Aaron Rudkin, and Luke Sonnet. 2018. “The New Voteview.com: Preserving and Continuing Keith Poole’s Infrastructure for Scholars, Students and Observers of Congress.” Public Choice. Springer, 1–16.

Eady, Gregory, Jan Zilinsky, Jonathan Nagler, and Joshua Tucker. 2018. “Trying to Understand How Jeff Flake Is Leaning? We Analyzed His Twitter Feed — and Were Surprised.” Washington Post, October. https://www.washingtonpost.com/news/monkey-cage/wp/2018/10/05/trying-to-understand-how-jeff-flake-is-leaning-we-analyzed-his-twitter-feed-and-were-surprised/?utm_term=.34e5b2a28490.

Hughes, Adam, and Onyi Lam. 2017. “Highly Ideological Members of Congress Have More Facebook Followers Than Moderates Do.” Pew Research Center, August. http://www.pewresearch.org/fact-tank/2017/08/21/highly-ideological-members-of-congress-have-more-facebook-followers-than-moderates-do/.

Poole, Keith T, and Howard Rosenthal. 1985. “A Spatial Model for Legislative Roll Call Analysis.” American Journal of Political Science. JSTOR, 357–84.

Van Kessel, Patrick, and Adam Hughes. 2018. “Moderates in Congress Go Local on Facebook More Than the Most Ideological Members.” Pew Research Center, July. http://www.pewresearch.org/fact-tank/2018/07/25/moderates-in-congress-go-local-on-facebook-more-than-the-most-ideological-members/.


  1. We have discussed some of the details of this scoring procedure in a previous post.
