So, lots of linguistic variation happening in real time: coronavirus, covid19, pandemic, and more recently coronavirus pandemic. For sure, these expressions are not proper synonyms – each refers to a different “aspect” of the virus.
coronavirus ~ virus.
covid19 ~ disease.
pandemic ~ social/epi.

Here, we take a super quick look at how this variation in reference is materializing on Twitter among the 535 voting members of the United States Congress since January 2020.
Accidentally updated on 5.26.20.
First things first, we obtain Twitter handles and some relevant biographical details (here, political affiliation) for the 100 US Senators and the 435 members of the House of Representatives from the unitedstates project.
    library(tidyverse)

    leg_dets <- 'https://theunitedstates.io/congress-legislators/legislators-current.csv'

    twitters <- read.csv(url(leg_dets), stringsAsFactors = FALSE) %>%
      # filter(type == 'rep') %>%  # & twitter != ''
      rename(state_abbrev = state,
             district_code = district)
Then we scrape the last 1,000 tweets for each of the 535 members of Congress using the rtweet package. Here, we are just trying to get all tweets from 2020 – 1,000 is overkill. We exclude retweets and quote tweets. The scraping process takes roughly an hour.
    congress_tweets <- rtweet::get_timeline(
      twitters$twitter,
      n = 1000,
      check = FALSE) %>%
      # Reduce the timestamp to a calendar date
      mutate(created_at = as.Date(gsub(' .*$', '', created_at))) %>%
      # Keep 2020 tweets with some text; drop quote tweets and retweets
      filter(is_quote == 'FALSE' &
               is_retweet == 'FALSE' &
               created_at >= '2020-01-01' &
               display_text_width > 0)

    # setwd("/home/jtimm/jt_work/GitHub/data_sets")
    # saveRDS(congress_tweets, 'cong2020_tweets_tif.rds')
Then we join the two data sets and calculate total daily tweets generated by members of Congress, by party affiliation, in 2020.
    congress_tweets1 <- congress_tweets %>%
      # Upper-case handles on both sides so the join ignores case
      mutate(twitter = toupper(screen_name)) %>%
      select(status_id, created_at, twitter, text) %>%
      inner_join(twitters %>% mutate(twitter = toupper(twitter)),
                 by = 'twitter')

    # Daily tweet counts by party
    all_tweets <- congress_tweets1 %>%
      group_by(created_at, party) %>%
      summarise(ts = n()) %>%
      rename(date = created_at)
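The reason both sides are upper-cased before the join is that Twitter handles are case-insensitive, so the roster and the timelines can spell the same account differently. A toy illustration, written in base R so it runs without the tidyverse; the two small data frames and the handles in them are made up for the example:

```r
# Toy data: hypothetical handles, not real accounts
roster <- data.frame(
  twitter = c('SenExample', 'RepSample'),
  party   = c('Democrat', 'Republican'),
  stringsAsFactors = FALSE
)
tweets <- data.frame(
  screen_name = c('senexample', 'RepSample', 'RepSample'),
  text        = c('a', 'b', 'c'),
  stringsAsFactors = FALSE
)

# Naive join on the raw strings: the differently-cased handle drops out
naive <- merge(tweets, roster, by.x = 'screen_name', by.y = 'twitter')

# Normalize case on both sides, then join on the shared key
tweets$key <- toupper(tweets$screen_name)
roster$key <- toupper(roster$twitter)
joined <- merge(tweets, roster, by = 'key')

nrow(naive)   # rows that survive the naive join
nrow(joined)  # rows that survive the normalized join
```

Comparing the two row counts against the number of input tweets is a cheap way to spot handles that silently fail to match.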
The figure below summarizes total tweets by party affiliation since the first of the year. Donald Trump delivered his State of the Union address on the evening of February 4th, hence the spike in activity around that date. There seems to be a slight upward trend in total tweets – perhaps one that is more prevalent among Democrats – presumably in response to the coronavirus.
Also, Democrats do tweet more, but they also have more members at present. And it seems that members of Congress put their phones down a bit on the weekends.
    all_tweets %>%
      filter(party != 'Independent') %>% # Justin Amash & Bernie Sanders & Angus King
      ggplot() +
      geom_line(aes(x = date, y = ts, color = party), size = 1.25) +
      theme_minimal() +
      ggthemes::scale_color_stata() +
      theme(axis.text.x = element_text(angle = 90, hjust = 1)) +
      scale_x_date(date_breaks = '1 week', date_labels = "%b %d") +
      theme(legend.position = 'bottom') +
      labs(title = 'Total congressional tweets by party affiliation')
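The weekend dip can be checked directly rather than read off the plot. A small sketch on made-up daily counts, again in base R; the `daily` data frame and its numbers are hypothetical, and with the real data the `date` and `ts` columns of `all_tweets` would take their place:

```r
# Hypothetical daily totals for two full weeks, Mon 2020-03-02 to Sun 2020-03-15
daily <- data.frame(
  date = seq(as.Date('2020-03-02'), by = 'day', length.out = 14),
  ts   = c(900, 950, 980, 940, 910, 400, 350,
           920, 960, 990, 930, 900, 420, 380)
)

# as.POSIXlt()$wday is 0 for Sunday and 6 for Saturday, regardless of locale
wday <- as.POSIXlt(daily$date)$wday
daily$day_type <- ifelse(wday %in% c(0, 6), 'weekend', 'weekday')

# Mean tweets per day, by day type
day_means <- tapply(daily$ts, daily$day_type, mean)
day_means
```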
Patterns of variation over time
The table below details the first attestation of each referring expression in our 2020 Congressional Twitter corpus. coronavirus hit the scene on 1-17, followed by pandemic on 1-22, coronavirus pandemic on 2-11, and covid19 on 2-12 – the name for the disease coined by the World Health Organization on 2-11.
    covid_tweets %>%
      group_by(covid_gram) %>%
      filter(date == min(date)) %>%
      arrange(date) %>%
      select(covid_gram, date, twitter) %>%
      knitr::kable()
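The `covid_tweets` table is used here without the step that builds it. One way such a table might be derived, offered only as a sketch: lower-case each tweet, pull out every match of the four forms, and keep one row per match. The regular expression, the helper name `extract_covid_grams()`, and the toy tweets are all assumptions for illustration, not the original code.

```r
# Hypothetical helper: one row per matched form per tweet
extract_covid_grams <- function(tweets) {
  # Longest alternative first, so 'coronavirus pandemic' is not split in two
  pattern <- 'coronavirus pandemic|coronavirus|covid[- ]?19|pandemic'
  txt <- tolower(tweets$text)
  hits <- regmatches(txt, gregexpr(pattern, txt, perl = TRUE))
  n_hits <- lengths(hits)
  out <- data.frame(
    twitter    = rep(tweets$twitter, n_hits),
    date       = rep(tweets$date, n_hits),
    covid_gram = as.character(unlist(hits)),
    stringsAsFactors = FALSE
  )
  # Collapse spelling variants such as 'covid-19' to a single form
  out$covid_gram <- sub('covid[- ]?19', 'covid19', out$covid_gram)
  out
}

# Toy tweets (invented text and handles)
toy <- data.frame(
  twitter = c('A', 'B', 'C'),
  date    = as.Date(c('2020-03-01', '2020-03-02', '2020-03-02')),
  text    = c('The Coronavirus pandemic is here',
              'COVID-19 relief now; the pandemic demands it',
              'Town hall on Friday'),
  stringsAsFactors = FALSE
)

covid_toy <- extract_covid_grams(toy)
covid_toy
```

Under this scheme a tweet that uses two forms contributes two rows, and a tweet that uses none contributes nothing; whether the original analysis counted mentions this way is not stated.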
Lastly, we consider a proportional perspective on reference to 2019 NOVEL CORONAVIRUS. Instead of total tweets, the denominator here becomes overall references to 2019 NOVEL CORONAVIRUS on Twitter among members of Congress.

The figure below, then, illustrates daily probability distributions for forms used to reference 2019 NOVEL CORONAVIRUS. covid19 has slowly become the majority form on Twitter, while coronavirus has become less and less prevalent. One explanation is that the effects of the virus in the US, i.e., the disease, have become more prevalent, making covid19 the more apt referring expression. Another explanation is that covid19 is shorter orthographically and, in the character-counting world of Twitter, a more efficient way to express the notion 2019 NOVEL CORONAVIRUS. An empirical question for sure.
    x1 <- covid_tweets %>%
      filter(date > '2020-2-25') %>%
      group_by(date, covid_gram) %>% #,party,
      summarize(n = n()) %>%
      mutate(per = n/sum(n))

    x2 <- x1 %>%
      ggplot(aes(x = date, y = per, fill = covid_gram)) +
      geom_bar(alpha = 0.65, stat = 'identity', width = .9) +
      # theme_minimal() +
      theme(axis.text.x = element_text(angle = 90, hjust = 1)) +
      theme(legend.position = "none") +
      ggthemes::scale_fill_economist() +
      scale_x_date(date_breaks = '7 days', date_labels = "%b %d") +
      labs(title = 'Referring to 2019 NOVEL CORONAVIRUS',
           subtitle = 'Among US Senators & House Representatives')

    x2 +
      annotate(geom = "text",
               x = c(rep(as.Date('2020-3-22'), 4)),
               y = c(.05, .35, .6, .8),
               label = c('pandemic', 'covid19', 'coronavirus pandemic', 'coronavirus'),
               size = 4, color = 'black')
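The `per` column works because `summarize()` drops the last grouping variable (`covid_gram`) by default, leaving the counts grouped by `date` alone, so `n/sum(n)` is a within-day share. The same computation on a few made-up rows, written in base R to make the per-day normalization explicit; the toy references are invented for the example:

```r
# Toy references: one row per mention of a form on a given day
refs <- data.frame(
  date = as.Date(c('2020-03-01', '2020-03-01', '2020-03-01', '2020-03-01',
                   '2020-03-02', '2020-03-02')),
  covid_gram = c('coronavirus', 'coronavirus', 'coronavirus', 'covid19',
                 'coronavirus', 'covid19'),
  stringsAsFactors = FALSE
)

# Counts by day (rows) and form (columns)
counts <- table(as.character(refs$date), refs$covid_gram)

# margin = 1 divides each row by its own total, giving daily shares
shares <- prop.table(counts, margin = 1)
shares
rowSums(shares)  # each day's shares sum to 1
```

Because each day is normalized separately, a day with very few references carries the same visual weight in the stacked bars as a busy one, which is one of the caveats to keep in mind when reading the figure.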
So, a weekend & social distancing. Caveats galore, but for folks interested in language change & innovation & the establishment of convention in a community of speakers, something to keep an eye on.