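We are going to go through several steps of topic modeling: fitting an LDA model, choosing the number of topics, fitting a seeded LDA model, and fitting a structural topic model (STM). First, load the packages: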
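library(topicmodels) # LDA topic modeling
library(quanteda)    # preprocessing and building the dfm (document-feature matrix)
library(dplyr)       # data manipulation
library(ldatuning)   # choosing the number of topics
library(tidytext)    # tidy() for extracting beta/gamma from LDA models
library(ggplot2)     # plotting
library(seededlda)   # seeded LDA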
We will use a dataset of online media coverage of the 2020 Black Lives Matter (BLM) protests, in which each article takes one of two opposing stances (support vs. opposition). First, let's look at the data:
df <- read.csv("BLM_stm.csv") # adjust the path to wherever BLM_stm.csv is saved on your machine
colnames(df)
## [1] "X" "Num_Queries" "query" "Link"
## [5] "Type" "Support_Oppose" "text"
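df$text <- as.character(df$text) # make sure the text column is character
corpus <- corpus(df)              # build a quanteda corpus; the "text" column is used as the document text
tokens <- tokens(corpus, remove_numbers = TRUE, remove_punct = TRUE,
                 remove_url = TRUE, remove_symbols = TRUE, include_docvars = TRUE)
tokens <- tokens_tolower(tokens)  # lowercase all tokens
tokens <- tokens_wordstem(tokens) # stem tokens (e.g., "police" -> "polic")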
stopwords(language = "en")
## [1] "i" "me" "my" "myself" "we"
## [6] "our" "ours" "ourselves" "you" "your"
## [11] "yours" "yourself" "yourselves" "he" "him"
## [16] "his" "himself" "she" "her" "hers"
## [21] "herself" "it" "its" "itself" "they"
## [26] "them" "their" "theirs" "themselves" "what"
## [31] "which" "who" "whom" "this" "that"
## [36] "these" "those" "am" "is" "are"
## [41] "was" "were" "be" "been" "being"
## [46] "have" "has" "had" "having" "do"
## [51] "does" "did" "doing" "would" "should"
## [56] "could" "ought" "i'm" "you're" "he's"
## [61] "she's" "it's" "we're" "they're" "i've"
## [66] "you've" "we've" "they've" "i'd" "you'd"
## [71] "he'd" "she'd" "we'd" "they'd" "i'll"
## [76] "you'll" "he'll" "she'll" "we'll" "they'll"
## [81] "isn't" "aren't" "wasn't" "weren't" "hasn't"
## [86] "haven't" "hadn't" "doesn't" "don't" "didn't"
## [91] "won't" "wouldn't" "shan't" "shouldn't" "can't"
## [96] "cannot" "couldn't" "mustn't" "let's" "that's"
## [101] "who's" "what's" "here's" "there's" "when's"
## [106] "where's" "why's" "how's" "a" "an"
## [111] "the" "and" "but" "if" "or"
## [116] "because" "as" "until" "while" "of"
## [121] "at" "by" "for" "with" "about"
## [126] "against" "between" "into" "through" "during"
## [131] "before" "after" "above" "below" "to"
## [136] "from" "up" "down" "in" "out"
## [141] "on" "off" "over" "under" "again"
## [146] "further" "then" "once" "here" "there"
## [151] "when" "where" "why" "how" "all"
## [156] "any" "both" "each" "few" "more"
## [161] "most" "other" "some" "such" "no"
## [166] "nor" "not" "only" "own" "same"
## [171] "so" "than" "too" "very" "will"
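# the tokens are already stemmed, so stem the stopword list too; otherwise stemmed
# stopwords such as "becaus" or "veri" would not match and would stay in the data
tokens <- tokens_remove(tokens, char_wordstem(stopwords("english")))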
dfm <- dfm(tokens)
dfm #sparse
## Document-feature matrix of: 228 documents, 12,981 features (96.48% sparse) and 6 docvars.
## features
## docs view app nikki carvaj cnn vice presid mike penc declin
## text1 1 1 1 1 2 2 4 1 6 1
## text2 2 0 0 0 0 4 18 0 0 4
## text3 0 0 0 0 0 0 0 0 0 0
## text4 1 0 0 0 0 0 2 0 0 0
## text5 1 0 0 0 0 3 3 3 7 0
## text6 0 0 0 0 0 1 1 1 1 0
## [ reached max_ndoc ... 222 more documents, reached max_nfeat ... 12,971 more features ]
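A dfm is simply a documents x features matrix of counts, and "sparse" means that most of its cells are zero. The minimal base-R sketch below builds such a matrix from three hypothetical, already-tokenized documents and computes the share of zero cells. It is only an illustration: quanteda stores the dfm in a sparse matrix format rather than as a dense matrix.

```r
# three hypothetical documents, already tokenized and stemmed
docs <- list(
  d1 = c("polic", "protest", "polic"),
  d2 = c("protest", "riot"),
  d3 = c("statu", "remov")
)

# the features are all distinct tokens
vocab <- sort(unique(unlist(docs)))

# count each feature in each document: rows = documents, columns = features
counts <- t(sapply(docs, function(d) table(factor(d, levels = vocab))))

# share of cells that are zero, i.e. the "% sparse" figure quanteda prints
sparsity <- mean(counts == 0)

print(counts)
print(sparsity) # 0.6
```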
# remove very infrequent and very frequent terms: keep terms that occur at least
# 5 times in total and appear in no more than 225 of the 228 documents.
# Thresholds can also be given as proportions with docfreq_type = "prop".
dfm_trim <- dfm_trim(dfm, min_termfreq = 5, max_docfreq = 225)
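dfm_trim() filters features on two quantities: a feature's total count across all documents and the number of documents it appears in. The base-R sketch below mimics that rule on a hypothetical count matrix, keeping features with total count >= min_termfreq and document frequency <= max_docfreq. The helper trim_counts() is our own illustration, not part of quanteda.

```r
# hypothetical document-feature counts (3 documents x 4 features)
m <- rbind(
  d1 = c(the = 3, polic = 4, riot = 1, statu = 0),
  d2 = c(the = 2, polic = 1, riot = 0, statu = 2),
  d3 = c(the = 4, polic = 0, riot = 0, statu = 1)
)

# hypothetical helper mimicking the min_termfreq / max_docfreq rule
trim_counts <- function(m, min_termfreq, max_docfreq) {
  termfreq <- colSums(m)     # total count of each feature across documents
  docfreq  <- colSums(m > 0) # number of documents containing each feature
  keep <- termfreq >= min_termfreq & docfreq <= max_docfreq
  m[, keep, drop = FALSE]
}

trimmed <- trim_counts(m, min_termfreq = 2, max_docfreq = 2)
colnames(trimmed) # "polic" "statu"
```

Here "the" is dropped because it appears in every document, and "riot" is dropped because it occurs only once.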
Let's say we want to fit an LDA topic model with 11 topics using Gibbs sampling (more on Gibbs sampling: https://medium.com/@tomar.ankur287/topic-modeling-using-lda-and-gibbs-sampling-explained-49d49b3d1045).
k <- 11
control_LDA_Gibbs <- list(
  alpha = 50/k,          # Dirichlet prior on document-topic proportions; 50/k as suggested by Griffiths & Steyvers (2004)
  estimate.beta = TRUE,  # estimate the topic-term distributions
  verbose = 0,           # print no progress information while sampling
  prefix = tempfile(),   # file prefix used if intermediate results are saved
  save = 0, keep = 0,    # save no intermediate models and keep no log-likelihood values
  seed = 999,            # random seed for reproducibility
  nstart = 1,            # number of repeated runs with random initializations
  best = TRUE,           # return only the run with the highest posterior likelihood
  delta = 0.1,           # Dirichlet prior on topic-term distributions (default 0.1)
  iter = 2000,           # number of Gibbs iterations
  burnin = 100,          # discard the first 100 iterations
  thin = 2000            # keep every 2000th iteration; with thin = iter (the default), a single draw is used
)
lda <- LDA(dfm_trim, k=k, method="Gibbs", control = control_LDA_Gibbs)
lda
## A LDA_Gibbs topic model with 11 topics.
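The control settings above (alpha, delta, iter) are the knobs of a collapsed Gibbs sampler. For intuition, here is a minimal base-R sketch of that sampler on a tiny hypothetical corpus. It is not the topicmodels implementation (which runs compiled code and also handles burn-in, thinning and multiple starts), the function name lda_gibbs() is ours, and it simply returns estimates from the final state of the chain. Its beta and gamma correspond to the matrices extracted with tidy() in the next steps.

```r
# Minimal collapsed Gibbs sampler for LDA (illustration only).
# docs: list of integer word ids in 1..V; K topics; alpha, delta: Dirichlet priors
lda_gibbs <- function(docs, V, K, alpha, delta, iter, seed = 1) {
  set.seed(seed)
  D <- length(docs)
  ndk <- matrix(0, D, K) # tokens in document d assigned to topic k
  nkw <- matrix(0, K, V) # tokens of word w assigned to topic k
  nk  <- numeric(K)      # total tokens assigned to topic k
  # random initial topic for every token
  z <- lapply(docs, function(d) sample.int(K, length(d), replace = TRUE))
  for (d in seq_len(D)) {
    for (i in seq_along(docs[[d]])) {
      k <- z[[d]][i]; w <- docs[[d]][i]
      ndk[d, k] <- ndk[d, k] + 1; nkw[k, w] <- nkw[k, w] + 1; nk[k] <- nk[k] + 1
    }
  }
  for (it in seq_len(iter)) {
    for (d in seq_len(D)) {
      for (i in seq_along(docs[[d]])) {
        k <- z[[d]][i]; w <- docs[[d]][i]
        # take the current token out of the counts
        ndk[d, k] <- ndk[d, k] - 1; nkw[k, w] <- nkw[k, w] - 1; nk[k] <- nk[k] - 1
        # full conditional p(z = k | all other assignments), up to a constant
        p <- (ndk[d, ] + alpha) * (nkw[, w] + delta) / (nk + V * delta)
        k <- sample.int(K, 1, prob = p)
        # put the token back with its new topic
        z[[d]][i] <- k
        ndk[d, k] <- ndk[d, k] + 1; nkw[k, w] <- nkw[k, w] + 1; nk[k] <- nk[k] + 1
      }
    }
  }
  list(
    beta  = (nkw + delta) / rowSums(nkw + delta), # topic-word probabilities, rows sum to 1
    gamma = (ndk + alpha) / rowSums(ndk + alpha)  # document-topic proportions, rows sum to 1
  )
}

# toy corpus: words 1-3 and words 4-6 never co-occur
docs <- list(c(1, 2, 3, 1, 2), c(2, 3, 1, 3), c(4, 5, 6, 4), c(5, 6, 4, 6, 5))
fit <- lda_gibbs(docs, V = 6, K = 2, alpha = 0.5, delta = 0.1, iter = 200)
round(fit$gamma, 2)
```

A small alpha is used in the toy run on purpose: with only 2 topics, 50/k = 25 would be a very strong prior that pushes every document towards an even mixture.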
We can visualize the terms that are most common within each topic. Each word receives a beta value for each topic: the estimated probability of that word given the topic. The higher a word's beta, the more characteristic the word is of that topic, so documents that use such words heavily tend to receive a larger share of that topic.
topics <- tidy(lda, matrix = "beta") # extract beta with tidy()
top_terms <- topics %>%
  group_by(topic) %>%    # group the words by topic
  top_n(10, beta) %>%    # keep the top 10 words per topic
  ungroup() %>%          # remove the grouping
  arrange(topic, -beta)
top_terms %>%
  mutate(term = reorder_within(term, beta, topic)) %>% # order terms within each topic's panel
  ggplot(aes(term, beta, fill = factor(topic))) +
  geom_col(show.legend = FALSE) +
  facet_wrap(~ topic, scales = "free") +
  coord_flip() +
  scale_x_reordered() # strip the topic suffix that reorder_within() adds to labels
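The top_n(10, beta) step keeps the highest-beta terms of each topic. The same idea in base R, on a hypothetical beta matrix, is sketched below; top_terms_base() is our own helper. One difference worth knowing: dplyr's top_n() keeps ties, so it can return more than 10 rows for a topic when beta values are equal (slice_max() is its newer replacement).

```r
# hypothetical topic-term probabilities (beta): 2 topics x 5 terms, rows sum to 1
beta <- rbind(
  topic1 = c(polic = 0.40, protest = 0.30, riot = 0.20, statu = 0.05, remov = 0.05),
  topic2 = c(polic = 0.10, protest = 0.05, riot = 0.05, statu = 0.50, remov = 0.30)
)

# hypothetical helper: the n highest-beta terms of each topic, in decreasing order
top_terms_base <- function(beta, n) {
  apply(beta, 1, function(row) names(sort(row, decreasing = TRUE))[seq_len(n)])
}

res <- top_terms_base(beta, 2)
res # one column per topic: topic1 = polic, protest; topic2 = statu, remov
```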
We can also look at the topic distribution (gamma values) of each document. Each document has a gamma value for every topic, and a document's gamma values sum to 1, because LDA treats each document as a mixture of topics (a mixed-membership model). For example, article 1 is about 15% topic 4, 6% topic 5, 24% topic 6, and so on. Comparing gamma values shows which topics dominate a document: article 1 draws most heavily on topic 8 (28.5%) and topic 6 (24%).
topics_doc <- tidy(lda, matrix="gamma")
topics_doc$document <- gsub("text", "", topics_doc$document) # "text12" -> "12"
topics_doc$document <- as.numeric(topics_doc$document)
topics_doc_sp <- tidyr::spread(topics_doc, topic, gamma) # one row per document, one column per topic
topics_doc_sp
## # A tibble: 228 × 12
## document `1` `2` `3` `4` `5` `6` `7` `8` `9`
## <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 1 0.0258 0.0258 0.0360 0.149 0.0565 0.241 0.0496 0.285 0.0531
## 2 2 0.00971 0.0201 0.437 0.0705 0.0219 0.0184 0.00942 0.290 0.0867
## 3 3 0.0232 0.0335 0.0198 0.0558 0.0284 0.0610 0.0198 0.677 0.0438
## 4 4 0.0162 0.0673 0.365 0.0826 0.0485 0.0298 0.0315 0.225 0.0690
## 5 5 0.0292 0.0206 0.0572 0.105 0.0507 0.305 0.0421 0.243 0.0831
## 6 6 0.0210 0.0161 0.0210 0.0900 0.0137 0.418 0.0334 0.302 0.0432
## 7 7 0.0214 0.0162 0.0715 0.0704 0.0298 0.202 0.0319 0.419 0.0319
## 8 8 0.0398 0.0398 0.0260 0.174 0.0777 0.250 0.0743 0.133 0.0743
## 9 9 0.0419 0.112 0.0419 0.103 0.0770 0.121 0.173 0.130 0.0594
## 10 10 0.0255 0.0543 0.116 0.0897 0.0255 0.172 0.0322 0.362 0.0410
## # … with 218 more rows, and 2 more variables: 10 <dbl>, 11 <dbl>
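Because each row of gamma sums to 1, a common summary is each document's dominant topic and its share. A base-R sketch on hypothetical gamma values:

```r
# hypothetical document-topic proportions (gamma): rows are documents, rows sum to 1
gamma <- rbind(
  doc1 = c(topic1 = 0.10, topic2 = 0.60, topic3 = 0.30),
  doc2 = c(topic1 = 0.50, topic2 = 0.20, topic3 = 0.30),
  doc3 = c(topic1 = 0.05, topic2 = 0.15, topic3 = 0.80)
)

# each document's most prevalent topic and its share
dominant <- data.frame(
  document = rownames(gamma),
  topic    = colnames(gamma)[apply(gamma, 1, which.max)],
  share    = apply(gamma, 1, max)
)
dominant
```

With the tidy gamma table from this tutorial, the same summary can be obtained with `topics_doc %>% group_by(document) %>% slice_max(gamma, n = 1)`.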
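Choosing the number of topics: in the example above I picked 11 topics arbitrarily, but we could first run FindTopicsNumber() to narrow down several candidate values to explore.
The ldatuning package computes 4 metrics (from Griffiths 2004, Cao Juan et al. 2009, Arun et al. 2010, and Deveaud et al. 2014) to help select the number of topics for LDA. Good candidates minimize CaoJuan2009 and Arun2010 and maximize Griffiths2004 and Deveaud2014; FindTopicsNumber_plot(result) plots all four metrics against the number of topics.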
#result <- FindTopicsNumber(
# dfm_trim,
# topics = seq(from = 5, to = 20, by = 1),
# metrics = c("Griffiths2004", "CaoJuan2009", "Arun2010", "Deveaud2014"),
# method = "Gibbs",
# control = list(seed = 77),
# mc.cores = 2L,
# verbose = TRUE
#)
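The Griffiths2004 metric approximates how well a model with a given number of topics fits the data by the harmonic mean of the likelihoods sampled along the Gibbs chain (higher is better). Those likelihoods are astronomically small numbers, so the mean has to be computed on the log scale. The base-R sketch below does this with the log-sum-exp trick; log_harmonic_mean() and the log-likelihood values are hypothetical, and this is not ldatuning's own code.

```r
# log of the harmonic mean of likelihoods, computed from log-likelihoods `ll`:
# log HM = log(n) - log(sum(exp(-ll))), with log-sum-exp to avoid overflow
log_harmonic_mean <- function(ll) {
  x <- -ll
  m <- max(x)
  log(length(ll)) - (m + log(sum(exp(x - m))))
}

ll <- c(-1000, -1001, -1002)     # hypothetical log-likelihoods from a Gibbs chain
naive  <- log(1 / mean(exp(-ll))) # exp(1000) overflows to Inf, so this gives -Inf
stable <- log_harmonic_mean(ll)   # finite, slightly below -1000
```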
The seededlda package can be used if you want to predefine topics in LDA using a dictionary of "seed words". Here, I'm using the Moral Foundations Dictionary, a LIWC-format dictionary of words that indicate moral values.
tf <- tempfile() # temporary file to hold the downloaded dictionary
download.file("https://moralfoundations.org/wp-content/uploads/files/downloads/moral%20foundations%20dictionary.dic", tf)
dictliwc <- dictionary(file = tf, format = "LIWC")
print(dictliwc)
## Dictionary object with 11 key entries.
## - [HarmVirtue]:
## - amity, benefit*, care, caring, compassion*, defen*, empath*, guard*, peace*, preserve, protect*, safe*, secur*, shelter, shield, sympath*
## - [HarmVice]:
## - abandon*, abuse*, annihilate*, attack*, brutal*, cruel*, crush*, damag*, destroy, detriment*, endanger*, exploit, exploited, exploiting, exploits, fight*, harm*, hurt*, impair, kill [ ... and 15 more ]
## - [FairnessVirtue]:
## - balance*, constant, egalitar*, equable, equal*, equity, equivalent, evenness, fair, fair-*, fairly, fairmind*, fairness, fairplay, homologous, honest*, impartial*, justice, justifi*, justness [ ... and 6 more ]
## - [FairnessVice]:
## - bias*, bigot*, discriminat*, dishonest, disproportion*, dissociate, exclud*, exclusion, favoritism, inequitable, injust*, preference, prejud*, segregat*, unequal*, unfair*, unjust*, unscrupulous
## - [IngroupVirtue]:
## - ally, cadre, cliqu*, cohort, collectiv*, communal, commune*, communis*, communit*, comrad*, devot*, familial, families, family, fellow*, group, guild, homeland*, insider, joint [ ... and 9 more ]
## - [IngroupVice]:
## - abandon*, apostasy, apostate, betray*, deceiv*, deserted, deserter*, deserting, disloyal*, enem*, foreign*, immigra*, imposter, individual*, jilt*, miscreant, renegade, sequester, spy, terroris* [ ... and 3 more ]
## [ reached max_nkey ... 5 more keys ]
We do not need to specify k here: the number of topics is determined by the number of keys in the dictionary (11). Now we can fit the seeded LDA model, which matches the features (terms) in the dfm against the dictionary's seed words.
tmod_slda <- textmodel_seededlda(dfm_trim, dictionary = dictliwc)
terms(tmod_slda, 10)
## HarmVirtue HarmVice FairnessVirtue FairnessVice IngroupVirtue
## [1,] "protect" "violenc" "equal" "injustic" "communiti"
## [2,] "guard" "kill" "justifi" "disproportion" "group"
## [3,] "secur" "violent" "fair" "bias" "nation"
## [4,] "safeti" "brutal" "constant" "unjust" "member"
## [5,] "care" "war" "honest" "unfair" "nationwid"
## [6,] "safe" "damag" "black" "exclud" "homeland"
## [7,] "defend" "attack" "live" "discriminatori" "fellow"
## [8,] "defens" "harm" "matter" "prejudic" "patriot"
## [9,] "benefit" "destroy" "peopl" "bigot" "joint"
## [10,] "shield" "fight" "polic" "say" "devot"
## IngroupVice AuthorityVirtue AuthorityVice PurityVirtue PurityVice
## [1,] "terrorist" "law" "protest" "church" "sick"
## [2,] "foreign" "order" "riot" "clean" "wanton"
## [3,] "enemi" "leader" "disobedi" "pure" "dirt"
## [4,] "abandon" "control" "rioter" "decentr" "sicken"
## [5,] "feder" "legal" "rebellion" "decent" "disgust"
## [6,] "use" "respect" "obstruct" "offic" "exploit"
## [7,] "said" "class" "dissent" "polic" "ruin"
## [8,] "polic" "duti" "lawless" "said" "statu"
## [9,] "offic" "leadership" "disrespect" "video" "may"
## [10,] "right" "mother" "defianc" "charg" "getti"
## MoralityGeneral
## [1,] "good"
## [2,] "wrong"
## [3,] "moral"
## [4,] "bad"
## [5,] "correct"
## [6,] "legal"
## [7,] "worth"
## [8,] "ideal"
## [9,] "offend"
## [10,] "character"
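Dictionary entries are glob patterns, so a seed such as safe* also captures stemmed features like safeti (from "safety"), which is why safeti appears under HarmVirtue above, while protect* does not capture protest. The base-R sketch below illustrates that matching with glob2rx() and grepl(); the vocabulary is hypothetical, and this is not seededlda's own matching code.

```r
# a few hypothetical stemmed features, as they might appear in dfm_trim
vocab <- c("protect", "protest", "safeti", "safe", "secur", "shelter", "polic")
# a few HarmVirtue seed patterns from the dictionary printed above
seeds <- c("protect*", "safe*", "secur*", "shelter")

# glob2rx() turns each glob into a regular expression, e.g. "safe*" -> "^safe"
regexes <- glob2rx(seeds)
# a feature is a seed word if it matches any of the patterns
hits <- Reduce(`|`, lapply(regexes, grepl, x = vocab))
vocab[hits] # "protect" "safeti" "safe" "secur" "shelter"
```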
# assign topics from seeded LDA as a document-level variable to the dfm
dfm_trim$topic2 <- topics(tmod_slda)
# cross-table of the topic frequency
table(dfm_trim$topic2, df$Support_Oppose)
##
## Oppose Support
## HarmVirtue 30 12
## HarmVice 8 11
## FairnessVirtue 18 2
## FairnessVice 0 9
## IngroupVirtue 8 12
## IngroupVice 3 13
## AuthorityVirtue 9 8
## AuthorityVice 29 9
## PurityVirtue 3 12
## PurityVice 8 1
## MoralityGeneral 7 16
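The two stances have different numbers of articles (123 Oppose and 105 Support in the table above), so raw counts are hard to compare across columns. Converting each column to proportions shows the share of each stance's articles assigned to each topic. In the session above this is `prop.table(table(dfm_trim$topic2, df$Support_Oppose), margin = 2)`; the self-contained version below re-enters the printed counts.

```r
# counts copied from the cross-table above (rows: seeded topics; columns: stance)
tab <- matrix(
  c(30, 8, 18, 0, 8, 3, 9, 29, 3, 8, 7,     # Oppose
    12, 11, 2, 9, 12, 13, 8, 9, 12, 1, 16), # Support
  ncol = 2,
  dimnames = list(
    c("HarmVirtue", "HarmVice", "FairnessVirtue", "FairnessVice",
      "IngroupVirtue", "IngroupVice", "AuthorityVirtue", "AuthorityVice",
      "PurityVirtue", "PurityVice", "MoralityGeneral"),
    c("Oppose", "Support")
  )
)

colSums(tab)                          # Oppose 123, Support 105
shares <- prop.table(tab, margin = 2) # each column now sums to 1
round(100 * shares, 1)                # percentage of each stance's articles per topic
```

For example, HarmVirtue accounts for about 24.4% of Oppose articles but about 11.4% of Support articles.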
We can use a structural topic model (STM) to estimate whether topic prevalence differs as a function of document metadata, in this case the stance (Support or Oppose). See more: https://www.structuraltopicmodel.com/
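In STM, prevalence covariates enter through a logistic-normal prior: for each document, a vector eta is drawn from a normal distribution whose mean is a linear function of the covariates, and the topic proportions are the softmax of eta, with one topic's eta fixed at 0 as a reference. The base-R sketch below shows that generative step for two hypothetical documents with made-up coefficients. It is not how stm estimates the model, and the covariance is simplified to a diagonal.

```r
set.seed(1)
K <- 3 # number of topics (hypothetical)

# design matrix for two hypothetical documents: one Oppose, one Support
X <- cbind(intercept = 1, support = c(0, 1))

# hypothetical prevalence coefficients: one column per non-reference topic
Gamma <- rbind(intercept = c(0.5, -0.2),
               support   = c(-1.0, 0.8))

# eta_d = x_d' Gamma + noise (diagonal covariance here, for simplicity)
eta <- X %*% Gamma + matrix(rnorm(2 * (K - 1), sd = 0.1), nrow = 2)
eta <- cbind(eta, 0) # the reference topic's eta is fixed at 0

# softmax maps eta to topic proportions theta that sum to 1 in each document
theta <- exp(eta) / rowSums(exp(eta))
round(theta, 2)
```

Changing the support coefficient shifts the expected topic proportions of Support articles relative to Oppose articles, which is what the prevalence formula in stm() estimates from the data.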
Some additional terminologies are:
#install.packages("stm")
library(stm)
library(igraph)
# Build the inputs for stm(). This step was not shown in the original; the messages
# below come from stm's textProcessor(), and `out` has the structure returned by
# prepDocuments(). The original run also removed custom stopwords (list not shown).
processed <- textProcessor(df$text, metadata = df)
out <- prepDocuments(processed$documents, processed$vocab, processed$meta)
## Building corpus...
## Converting to Lower Case...
## Removing punctuation...
## Removing stopwords...
## Remove Custom Stopwords...
## Removing numbers...
## Stemming...
## Creating Output...
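Note that the top words now differ from the LDA output: STM is a different model that also estimates how the metadata (stance and query) shape topic prevalence, and its input was preprocessed separately with textProcessor() rather than with the quanteda pipeline above.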
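PrevFitQuery <- stm(documents = out$documents, vocab = out$vocab,
                    K = 11,
                    prevalence = ~ Support_Oppose + query, # topic prevalence depends on stance and query
                    max.em.its = 75,                       # cap the EM algorithm at 75 iterations
                    data = out$meta,
                    init.type = "Spectral", seed = 100)    # spectral initialization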
## Beginning Spectral Initialization
## Calculating the gram matrix...
## Finding anchor words...
## ...........
## Recovering initialization...
## ...........
## Initialization complete.
## ..................................................................................................................
## Completed E-Step (0 seconds).
## Completed M-Step.
## Completing Iteration 1 (approx. per word bound = -6.459)
## ..................................................................................................................
## Completed E-Step (0 seconds).
## Completed M-Step.
## Completing Iteration 2 (approx. per word bound = -6.353, relative change = 1.636e-02)
## ..................................................................................................................
## Completed E-Step (0 seconds).
## Completed M-Step.
## Completing Iteration 3 (approx. per word bound = -6.308, relative change = 7.078e-03)
## ..................................................................................................................
## Completed E-Step (0 seconds).
## Completed M-Step.
## Completing Iteration 4 (approx. per word bound = -6.286, relative change = 3.490e-03)
## ..................................................................................................................
## Completed E-Step (0 seconds).
## Completed M-Step.
## Completing Iteration 5 (approx. per word bound = -6.274, relative change = 1.906e-03)
## Topic 1: loot, properti, riot, violenc, polic
## Topic 2: polic, citi, budget, crime, spend
## Topic 3: post, polic, polit, social, call
## Topic 4: polic, offic, protest, feder, depart
## Topic 5: white, american, racial, like, support
## Topic 6: statu, remov, confeder, histori, american
## Topic 7: polic, racism, kill, white, system
## Topic 8: polic, protest, offic, floyd, citi
## Topic 9: protest, civil, right, law, bill
## Topic 10: protest, demonstr, violenc, feder, report
## Topic 11: polic, law, offic, communiti, enforc
## ..................................................................................................................
## Completed E-Step (0 seconds).
## Completed M-Step.
## Completing Iteration 6 (approx. per word bound = -6.267, relative change = 1.119e-03)
## ..................................................................................................................
## Completed E-Step (0 seconds).
## Completed M-Step.
## Completing Iteration 7 (approx. per word bound = -6.263, relative change = 6.967e-04)
## ..................................................................................................................
## Completed E-Step (0 seconds).
## Completed M-Step.
## Completing Iteration 8 (approx. per word bound = -6.260, relative change = 4.470e-04)
## ..................................................................................................................
## Completed E-Step (0 seconds).
## Completed M-Step.
## Completing Iteration 9 (approx. per word bound = -6.258, relative change = 2.985e-04)
## ..................................................................................................................
## Completed E-Step (0 seconds).
## Completed M-Step.
## Completing Iteration 10 (approx. per word bound = -6.257, relative change = 2.117e-04)
## Topic 1: loot, properti, riot, violenc, protest
## Topic 2: polic, citi, budget, crime, spend
## Topic 3: post, call, social, famili, left
## Topic 4: polic, offic, protest, feder, depart
## Topic 5: white, american, racial, like, among
## Topic 6: statu, remov, confeder, histori, american
## Topic 7: polic, kill, racism, system, white
## Topic 8: polic, protest, offic, citi, floyd
## Topic 9: protest, civil, right, law, public
## Topic 10: protest, demonstr, violenc, report, state
## Topic 11: polic, law, offic, communiti, enforc
## ..................................................................................................................
## Completed E-Step (0 seconds).
## Completed M-Step.
## Completing Iteration 11 (approx. per word bound = -6.256, relative change = 1.551e-04)
## ..................................................................................................................
## Completed E-Step (0 seconds).
## Completed M-Step.
## Completing Iteration 12 (approx. per word bound = -6.255, relative change = 1.161e-04)
## ..................................................................................................................
## Completed E-Step (0 seconds).
## Completed M-Step.
## Completing Iteration 13 (approx. per word bound = -6.254, relative change = 9.030e-05)
## ..................................................................................................................
## Completed E-Step (0 seconds).
## Completed M-Step.
## Completing Iteration 14 (approx. per word bound = -6.254, relative change = 7.321e-05)
## ..................................................................................................................
## Completed E-Step (0 seconds).
## Completed M-Step.
## Completing Iteration 15 (approx. per word bound = -6.253, relative change = 6.124e-05)
## Topic 1: loot, properti, riot, violenc, protest
## Topic 2: polic, citi, budget, crime, depart
## Topic 3: post, call, left, social, presid
## Topic 4: polic, offic, protest, feder, depart
## Topic 5: white, racial, american, like, among
## Topic 6: statu, remov, confeder, histori, american
## Topic 7: polic, kill, american, crime, system
## Topic 8: polic, protest, offic, citi, floyd
## Topic 9: protest, civil, right, law, public
## Topic 10: protest, demonstr, violenc, report, state
## Topic 11: polic, law, offic, communiti, enforc
## ..................................................................................................................
## Completed E-Step (0 seconds).
## Completed M-Step.
## Completing Iteration 16 (approx. per word bound = -6.253, relative change = 5.264e-05)
## ..................................................................................................................
## Completed E-Step (0 seconds).
## Completed M-Step.
## Completing Iteration 17 (approx. per word bound = -6.253, relative change = 4.605e-05)
## ..................................................................................................................
## Completed E-Step (0 seconds).
## Completed M-Step.
## Completing Iteration 18 (approx. per word bound = -6.253, relative change = 3.986e-05)
## ..................................................................................................................
## Completed E-Step (0 seconds).
## Completed M-Step.
## Completing Iteration 19 (approx. per word bound = -6.252, relative change = 3.425e-05)
## ..................................................................................................................
## Completed E-Step (0 seconds).
## Completed M-Step.
## Completing Iteration 20 (approx. per word bound = -6.252, relative change = 2.940e-05)
## Topic 1: loot, properti, riot, violenc, protest
## Topic 2: polic, citi, budget, depart, offic
## Topic 3: post, call, left, presid, social
## Topic 4: polic, offic, protest, feder, depart
## Topic 5: white, racial, american, like, among
## Topic 6: statu, remov, confeder, american, histori
## Topic 7: polic, american, kill, crime, communiti
## Topic 8: polic, protest, offic, citi, floyd
## Topic 9: protest, civil, right, law, public
## Topic 10: protest, demonstr, violenc, report, state
## Topic 11: polic, law, offic, communiti, enforc
## ..................................................................................................................
## Completed E-Step (0 seconds).
## Completed M-Step.
## Completing Iteration 21 (approx. per word bound = -6.252, relative change = 2.601e-05)
## ..................................................................................................................
## Completed E-Step (0 seconds).
## Completed M-Step.
## Completing Iteration 22 (approx. per word bound = -6.252, relative change = 2.333e-05)
## ..................................................................................................................
## Completed E-Step (0 seconds).
## Completed M-Step.
## Completing Iteration 23 (approx. per word bound = -6.252, relative change = 2.166e-05)
## ..................................................................................................................
## Completed E-Step (0 seconds).
## Completed M-Step.
## Completing Iteration 24 (approx. per word bound = -6.252, relative change = 2.006e-05)
## ..................................................................................................................
## Completed E-Step (0 seconds).
## Completed M-Step.
## Completing Iteration 25 (approx. per word bound = -6.251, relative change = 1.831e-05)
## Topic 1: loot, properti, riot, violenc, protest
## Topic 2: polic, citi, budget, depart, offic
## Topic 3: call, post, presid, left, social
## Topic 4: polic, offic, protest, feder, depart
## Topic 5: white, racial, american, like, among
## Topic 6: statu, remov, american, confeder, histori
## Topic 7: polic, american, crime, communiti, kill
## Topic 8: polic, protest, offic, citi, floyd
## Topic 9: protest, civil, right, law, public
## Topic 10: protest, demonstr, violenc, report, state
## Topic 11: polic, law, offic, communiti, enforc
## ..................................................................................................................
## Completed E-Step (0 seconds).
## Completed M-Step.
## Completing Iteration 26 (approx. per word bound = -6.251, relative change = 1.715e-05)
## ..................................................................................................................
## Completed E-Step (0 seconds).
## Completed M-Step.
## Completing Iteration 27 (approx. per word bound = -6.251, relative change = 1.644e-05)
## ..................................................................................................................
## Completed E-Step (0 seconds).
## Completed M-Step.
## Completing Iteration 28 (approx. per word bound = -6.251, relative change = 1.594e-05)
## ..................................................................................................................
## Completed E-Step (0 seconds).
## Completed M-Step.
## Completing Iteration 29 (approx. per word bound = -6.251, relative change = 1.555e-05)
## ..................................................................................................................
## Completed E-Step (0 seconds).
## Completed M-Step.
## Completing Iteration 30 (approx. per word bound = -6.251, relative change = 1.524e-05)
## Topic 1: loot, properti, riot, violenc, protest
## Topic 2: polic, citi, budget, depart, offic
## Topic 3: call, post, presid, left, white
## Topic 4: polic, offic, protest, feder, depart
## Topic 5: white, racial, american, like, among
## Topic 6: statu, remov, american, confeder, histori
## Topic 7: polic, american, crime, communiti, violenc
## Topic 8: polic, protest, offic, citi, floyd
## Topic 9: protest, civil, right, law, public
## Topic 10: protest, demonstr, violenc, report, state
## Topic 11: polic, law, organ, communiti, enforc
## ..................................................................................................................
## Completed E-Step (0 seconds).
## Completed M-Step.
## Completing Iteration 31 (approx. per word bound = -6.251, relative change = 1.566e-05)
## ..................................................................................................................
## Completed E-Step (0 seconds).
## Completed M-Step.
## Completing Iteration 32 (approx. per word bound = -6.251, relative change = 1.568e-05)
## ..................................................................................................................
## Completed E-Step (0 seconds).
## Completed M-Step.
## Completing Iteration 33 (approx. per word bound = -6.251, relative change = 1.594e-05)
## ..................................................................................................................
## Completed E-Step (0 seconds).
## Completed M-Step.
## Completing Iteration 34 (approx. per word bound = -6.251, relative change = 1.612e-05)
## ..................................................................................................................
## Completed E-Step (0 seconds).
## Completed M-Step.
## Completing Iteration 35 (approx. per word bound = -6.250, relative change = 1.649e-05)
## Topic 1: loot, properti, riot, violenc, protest
## Topic 2: polic, citi, budget, depart, offic
## Topic 3: call, presid, post, left, white
## Topic 4: polic, offic, feder, protest, depart
## Topic 5: white, racial, american, like, among
## Topic 6: statu, remov, american, confeder, histori
## Topic 7: polic, american, crime, communiti, violenc
## Topic 8: polic, protest, offic, citi, floyd
## Topic 9: protest, civil, right, law, public
## Topic 10: protest, demonstr, violenc, report, state
## Topic 11: polic, law, organ, communiti, enforc
## ..................................................................................................................
## Completed E-Step (0 seconds).
## Completed M-Step.
## Completing Iteration 36 (approx. per word bound = -6.250, relative change = 1.670e-05)
## ..................................................................................................................
## Completed E-Step (0 seconds).
## Completed M-Step.
## Completing Iteration 37 (approx. per word bound = -6.250, relative change = 1.680e-05)
## ..................................................................................................................
## Completed E-Step (0 seconds).
## Completed M-Step.
## Completing Iteration 38 (approx. per word bound = -6.250, relative change = 1.687e-05)
## ..................................................................................................................
## Completed E-Step (0 seconds).
## Completed M-Step.
## Completing Iteration 39 (approx. per word bound = -6.250, relative change = 1.590e-05)
## ..................................................................................................................
## Completed E-Step (0 seconds).
## Completed M-Step.
## Completing Iteration 40 (approx. per word bound = -6.250, relative change = 1.706e-05)
## Topic 1: loot, properti, riot, violenc, protest
## Topic 2: polic, citi, budget, depart, offic
## Topic 3: presid, call, post, white, left
## Topic 4: polic, offic, feder, protest, depart
## Topic 5: white, racial, american, like, among
## Topic 6: statu, remov, american, confeder, histori
## Topic 7: polic, american, crime, communiti, violenc
## Topic 8: polic, protest, offic, citi, floyd
## Topic 9: protest, civil, right, law, public
## Topic 10: protest, demonstr, violenc, report, state
## Topic 11: polic, organ, communiti, law, includ
## ..................................................................................................................
## Completed E-Step (0 seconds).
## Completed M-Step.
## Completing Iteration 41 (approx. per word bound = -6.250, relative change = 1.737e-05)
## ..................................................................................................................
## Completed E-Step (0 seconds).
## Completed M-Step.
## Completing Iteration 42 (approx. per word bound = -6.250, relative change = 1.681e-05)
## ..................................................................................................................
## Completed E-Step (0 seconds).
## Completed M-Step.
## Completing Iteration 43 (approx. per word bound = -6.250, relative change = 1.647e-05)
## ..................................................................................................................
## Completed E-Step (0 seconds).
## Completed M-Step.
## Completing Iteration 44 (approx. per word bound = -6.250, relative change = 1.601e-05)
## ..................................................................................................................
## Completed E-Step (0 seconds).
## Completed M-Step.
## Completing Iteration 45 (approx. per word bound = -6.249, relative change = 1.536e-05)
## Topic 1: loot, properti, riot, violenc, protest
## Topic 2: polic, citi, budget, depart, offic
## Topic 3: presid, call, white, post, left
## Topic 4: polic, offic, feder, protest, depart
## Topic 5: white, racial, american, like, among
## Topic 6: statu, american, remov, confeder, histori
## Topic 7: polic, american, crime, communiti, violenc
## Topic 8: polic, protest, offic, citi, floyd
## Topic 9: protest, civil, right, law, public
## Topic 10: protest, demonstr, violenc, report, state
## Topic 11: polic, organ, communiti, law, includ
## ..................................................................................................................
## Completed E-Step (0 seconds).
## Completed M-Step.
## Completing Iteration 46 (approx. per word bound = -6.249, relative change = 1.487e-05)
## ..................................................................................................................
## Completed E-Step (0 seconds).
## Completed M-Step.
## Completing Iteration 47 (approx. per word bound = -6.249, relative change = 1.453e-05)
## ..................................................................................................................
## Completed E-Step (0 seconds).
## Completed M-Step.
## Completing Iteration 48 (approx. per word bound = -6.249, relative change = 1.432e-05)
## ..................................................................................................................
## Completed E-Step (0 seconds).
## Completed M-Step.
## Completing Iteration 49 (approx. per word bound = -6.249, relative change = 1.365e-05)
## ..................................................................................................................
## Completed E-Step (0 seconds).
## Completed M-Step.
## Completing Iteration 50 (approx. per word bound = -6.249, relative change = 1.335e-05)
## Topic 1: loot, properti, riot, violenc, protest
## Topic 2: polic, citi, budget, depart, offic
## Topic 3: presid, call, white, post, left
## Topic 4: polic, offic, feder, protest, depart
## Topic 5: white, racial, american, like, among
## Topic 6: statu, american, remov, histori, confeder
## Topic 7: polic, american, crime, violenc, communiti
## Topic 8: polic, protest, offic, citi, floyd
## Topic 9: protest, civil, right, law, public
## Topic 10: protest, demonstr, violenc, report, state
## Topic 11: organ, polic, communiti, includ, help
## ..................................................................................................................
## Completed E-Step (0 seconds).
## Completed M-Step.
## Completing Iteration 51 (approx. per word bound = -6.249, relative change = 1.409e-05)
## ..................................................................................................................
## Completed E-Step (0 seconds).
## Completed M-Step.
## Completing Iteration 52 (approx. per word bound = -6.249, relative change = 1.217e-05)
## ..................................................................................................................
## Completed E-Step (0 seconds).
## Completed M-Step.
## Completing Iteration 53 (approx. per word bound = -6.249, relative change = 1.370e-05)
## ..................................................................................................................
## Completed E-Step (0 seconds).
## Completed M-Step.
## Completing Iteration 54 (approx. per word bound = -6.249, relative change = 1.407e-05)
## ..................................................................................................................
## Completed E-Step (0 seconds).
## Completed M-Step.
## Completing Iteration 55 (approx. per word bound = -6.249, relative change = 1.349e-05)
## Topic 1: loot, properti, riot, violenc, protest
## Topic 2: polic, citi, budget, depart, offic
## Topic 3: presid, call, white, post, left
## Topic 4: polic, offic, feder, depart, law
## Topic 5: white, racial, american, like, among
## Topic 6: statu, american, remov, histori, confeder
## Topic 7: polic, american, crime, violenc, communiti
## Topic 8: polic, protest, offic, citi, floyd
## Topic 9: protest, civil, right, law, public
## Topic 10: protest, demonstr, violenc, report, violent
## Topic 11: organ, communiti, polic, help, includ
## ..................................................................................................................
## Completed E-Step (0 seconds).
## Completed M-Step.
## Completing Iteration 56 (approx. per word bound = -6.249, relative change = 1.351e-05)
## ..................................................................................................................
## Completed E-Step (0 seconds).
## Completed M-Step.
## Completing Iteration 57 (approx. per word bound = -6.248, relative change = 1.368e-05)
## ..................................................................................................................
## Completed E-Step (0 seconds).
## Completed M-Step.
## Completing Iteration 58 (approx. per word bound = -6.248, relative change = 1.357e-05)
## ..................................................................................................................
## Completed E-Step (0 seconds).
## Completed M-Step.
## Completing Iteration 59 (approx. per word bound = -6.248, relative change = 1.414e-05)
## ..................................................................................................................
## Completed E-Step (0 seconds).
## Completed M-Step.
## Completing Iteration 60 (approx. per word bound = -6.248, relative change = 1.374e-05)
## Topic 1: loot, properti, riot, violenc, protest
## Topic 2: polic, citi, budget, depart, offic
## Topic 3: presid, call, white, post, left
## Topic 4: polic, offic, feder, law, depart
## Topic 5: white, racial, american, like, among
## Topic 6: statu, american, remov, civil, histori
## Topic 7: polic, american, crime, violenc, communiti
## Topic 8: polic, protest, offic, citi, floyd
## Topic 9: protest, civil, right, law, public
## Topic 10: protest, demonstr, violenc, report, violent
## Topic 11: organ, communiti, help, support, polic
## ..................................................................................................................
## Completed E-Step (0 seconds).
## Completed M-Step.
## Completing Iteration 61 (approx. per word bound = -6.248, relative change = 1.399e-05)
## ..................................................................................................................
## Completed E-Step (0 seconds).
## Completed M-Step.
## Completing Iteration 62 (approx. per word bound = -6.248, relative change = 1.286e-05)
## ..................................................................................................................
## Completed E-Step (0 seconds).
## Completed M-Step.
## Completing Iteration 63 (approx. per word bound = -6.248, relative change = 1.259e-05)
## ..................................................................................................................
## Completed E-Step (0 seconds).
## Completed M-Step.
## Completing Iteration 64 (approx. per word bound = -6.248, relative change = 1.192e-05)
## ..................................................................................................................
## Completed E-Step (0 seconds).
## Completed M-Step.
## Completing Iteration 65 (approx. per word bound = -6.248, relative change = 1.153e-05)
## Topic 1: loot, properti, riot, violenc, protest
## Topic 2: polic, citi, budget, depart, offic
## Topic 3: presid, call, white, say, trump
## Topic 4: polic, offic, feder, law, depart
## Topic 5: white, racial, american, like, among
## Topic 6: statu, american, remov, civil, histori
## Topic 7: polic, american, crime, violenc, kill
## Topic 8: polic, protest, offic, citi, floyd
## Topic 9: protest, civil, right, law, public
## Topic 10: protest, demonstr, violenc, report, violent
## Topic 11: organ, communiti, help, support, work
## ..................................................................................................................
## Completed E-Step (0 seconds).
## Completed M-Step.
## Completing Iteration 66 (approx. per word bound = -6.248, relative change = 1.107e-05)
## ..................................................................................................................
## Completed E-Step (0 seconds).
## Completed M-Step.
## Completing Iteration 67 (approx. per word bound = -6.248, relative change = 1.161e-05)
## ..................................................................................................................
## Completed E-Step (0 seconds).
## Completed M-Step.
## Completing Iteration 68 (approx. per word bound = -6.248, relative change = 1.136e-05)
## ..................................................................................................................
## Completed E-Step (0 seconds).
## Completed M-Step.
## Completing Iteration 69 (approx. per word bound = -6.247, relative change = 1.133e-05)
## ..................................................................................................................
## Completed E-Step (0 seconds).
## Completed M-Step.
## Completing Iteration 70 (approx. per word bound = -6.247, relative change = 1.158e-05)
## Topic 1: loot, properti, riot, violenc, protest
## Topic 2: polic, citi, budget, depart, offic
## Topic 3: presid, call, white, say, trump
## Topic 4: polic, offic, feder, law, depart
## Topic 5: white, racial, american, like, among
## Topic 6: statu, american, remov, civil, histori
## Topic 7: polic, american, crime, violenc, kill
## Topic 8: polic, protest, offic, citi, floyd
## Topic 9: protest, right, civil, law, public
## Topic 10: protest, demonstr, violenc, report, violent
## Topic 11: organ, communiti, help, support, work
## ..................................................................................................................
## Completed E-Step (0 seconds).
## Completed M-Step.
## Completing Iteration 71 (approx. per word bound = -6.247, relative change = 1.135e-05)
## ..................................................................................................................
## Completed E-Step (0 seconds).
## Completed M-Step.
## Completing Iteration 72 (approx. per word bound = -6.247, relative change = 1.084e-05)
## ..................................................................................................................
## Completed E-Step (0 seconds).
## Completed M-Step.
## Completing Iteration 73 (approx. per word bound = -6.247, relative change = 1.051e-05)
## ..................................................................................................................
## Completed E-Step (0 seconds).
## Completed M-Step.
## Model Converged
plot(PrevFitQuery, type = "summary", xlim = c(0, .4)) # summary plot: expected topic proportions, with each topic's top words
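The summary plot ranks topics by their expected proportion across the corpus. The sketch below computes those numbers directly. It assumes the expected proportion is the average share of each topic across documents, which is the column mean of the fitted model's `theta` matrix (documents × topics). The helper name `expected_topic_proportions()` is hypothetical.

```r
# Hypothetical helper: average each topic's per-document proportion.
# Assumption: this matches the "Expected Topic Proportions" shown by
# plot(model, type = "summary").
expected_topic_proportions <- function(theta) {
  # theta: documents x topics matrix; each row sums to 1
  props <- colMeans(theta)
  names(props) <- paste0("Topic ", seq_along(props))
  sort(props, decreasing = TRUE)  # largest topics first
}

# Usage with the fitted model:
# expected_topic_proportions(PrevFitQuery$theta)
```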
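Each topic in an STM has a semantic coherence score and an exclusivity score. The topicQuality() function plots the two scores against each other, labels each point with its topic number, and prints both sets of values. The first vector below holds semantic coherence, where higher (less negative) values mean the topic's top words tend to appear in the same documents. The second vector holds exclusivity, where higher values mean the top words are more specific to that topic.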
topicQuality(model = PrevFitQuery, documents = docs)
## [1] -22.28524 -43.71394 -26.06731 -28.15585 -36.24280 -43.86575 -26.18637
## [8] -19.85138 -25.60266 -21.39547 -30.20609
## [1] 9.074018 9.852134 8.883609 9.383724 9.868091 9.609326 8.635021 9.219973
## [9] 9.007736 9.589415 9.231538
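The coherence numbers are easier to read once you see how they are built. The sketch below is a minimal base-R version of the semantic coherence measure that stm's documentation attributes to Mimno et al. (2011). For a topic's top words, ordered from most to least probable, it sums log((co-document frequency + 1) / document frequency of the higher-ranked word) over every pair. The function name and the toy matrix are hypothetical. stm's own `semanticCoherence(model, documents, M = 10)` applies the same idea to the fitted model, but its smoothing constant may differ from the +1 used here, so the values will not match exactly.

```r
# Hypothetical sketch of semantic coherence for one topic.
semantic_coherence <- function(dtm, top_words, smooth = 1) {
  # dtm: documents x terms count matrix with term column names
  # top_words: the topic's top words, most probable first
  stopifnot(length(top_words) >= 2)
  present <- (dtm[, top_words, drop = FALSE] > 0) * 1  # 1 if word occurs in doc
  co_doc <- crossprod(present)  # [i, j] = docs containing both words; diagonal = doc frequency
  score <- 0
  for (m in 2:length(top_words)) {
    for (l in 1:(m - 1)) {
      # l is the higher-ranked word, so its document frequency is the denominator
      score <- score + log((co_doc[m, l] + smooth) / co_doc[l, l])
    }
  }
  score
}

# Toy document-term matrix (4 documents, 4 stemmed terms)
dtm <- matrix(c(2, 1, 0, 0,
                1, 1, 1, 0,
                0, 0, 1, 1,
                1, 0, 0, 0), nrow = 4, byrow = TRUE,
              dimnames = list(NULL, c("polic", "protest", "loot", "statu")))
semantic_coherence(dtm, c("polic", "protest"))  # words that co-occur: higher score
semantic_coherence(dtm, c("polic", "statu"))    # words that never co-occur: lower score
```

Because each term is a log ratio that is at most about zero, scores are negative and grow more negative as more top words fail to co-occur. This is consistent with the negative values in the first vector above.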
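You can also take a data-driven approach to choosing the number of topics. searchK() does not pick K for you. Instead, it fits a model for each candidate value of K and reports diagnostics such as held-out likelihood, residuals, semantic coherence, and exclusivity, which you then compare.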
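One way to compare candidates is to keep only the K values that no other candidate beats on both semantic coherence and exclusivity at once, a Pareto frontier. The sketch below assumes `kResult$results` holds one row per K with average `semcoh` and `exclus` values. In some stm versions these columns are stored as lists, hence the `unlist()` calls. The helper `pareto_k()` is hypothetical and is one heuristic among several, not what searchK() itself does. Held-out likelihood and residuals are also worth inspecting, for example with `plot(kResult)`.

```r
# Hypothetical helper: return the K values not dominated on both measures
# (higher is better for semantic coherence and for exclusivity).
pareto_k <- function(k, semcoh, exclus) {
  keep <- vapply(seq_along(k), function(i) {
    dominated_by <- semcoh >= semcoh[i] & exclus >= exclus[i] &
      (semcoh > semcoh[i] | exclus > exclus[i])
    !any(dominated_by)
  }, logical(1))
  k[keep]
}

# Usage once the searchK() call below has finished:
# res <- kResult$results
# pareto_k(unlist(res$K), unlist(res$semcoh), unlist(res$exclus))
```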
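# K = c(7, 11) compares only these two values; use e.g. K = 7:11 to test every value in between
kResult <- searchK(out$documents, out$vocab, K = c(7, 11), prevalence = ~Support_Oppose + query,
                   data = meta)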
## Beginning Spectral Initialization
## Calculating the gram matrix...
## Finding anchor words...
## .......
## Recovering initialization...
## ...........
## Initialization complete.
## ..................................................................................................................
## Completed E-Step (0 seconds).
## Completed M-Step.
## Completing Iteration 1 (approx. per word bound = -6.530)
## ..................................................................................................................
## Completed E-Step (0 seconds).
## Completed M-Step.
## Completing Iteration 2 (approx. per word bound = -6.417, relative change = 1.728e-02)
## ..................................................................................................................
## Completed E-Step (0 seconds).
## Completed M-Step.
## Completing Iteration 3 (approx. per word bound = -6.382, relative change = 5.482e-03)
## ..................................................................................................................
## Completed E-Step (0 seconds).
## Completed M-Step.
## Completing Iteration 4 (approx. per word bound = -6.365, relative change = 2.665e-03)
## ..................................................................................................................
## Completed E-Step (0 seconds).
## Completed M-Step.
## Completing Iteration 5 (approx. per word bound = -6.355, relative change = 1.583e-03)
## Topic 1: polic, protest, loot, citi, properti
## Topic 2: polic, citi, offic, law, budget
## Topic 3: polic, communiti, system, american, polit
## Topic 4: polic, offic, protest, demonstr, street
## Topic 5: statu, civil, imag, remov, confeder
## Topic 6: white, american, racial, movement, support
## Topic 7: protest, demonstr, violenc, state, report
## ..................................................................................................................
## Completed E-Step (0 seconds).
## Completed M-Step.
## Completing Iteration 6 (approx. per word bound = -6.348, relative change = 1.067e-03)
## ..................................................................................................................
## Completed E-Step (0 seconds).
## Completed M-Step.
## Completing Iteration 7 (approx. per word bound = -6.343, relative change = 7.856e-04)
## ..................................................................................................................
## Completed E-Step (0 seconds).
## Completed M-Step.
## Completing Iteration 8 (approx. per word bound = -6.339, relative change = 6.031e-04)
## ..................................................................................................................
## Completed E-Step (0 seconds).
## Completed M-Step.
## Completing Iteration 9 (approx. per word bound = -6.336, relative change = 4.705e-04)
## ..................................................................................................................
## Completed E-Step (0 seconds).
## Completed M-Step.
## Completing Iteration 10 (approx. per word bound = -6.334, relative change = 3.681e-04)
## Topic 1: loot, polic, protest, riot, properti
## Topic 2: polic, citi, offic, law, budget
## Topic 3: polic, communiti, american, system, work
## Topic 4: polic, offic, protest, floyd, demonstr
## Topic 5: civil, statu, right, remov, movement
## Topic 6: white, american, racial, like, democrat
## Topic 7: protest, demonstr, feder, violenc, state
## ..................................................................................................................
## Completed E-Step (0 seconds).
## Completed M-Step.
## Completing Iteration 11 (approx. per word bound = -6.332, relative change = 2.926e-04)
## ..................................................................................................................
## Completed E-Step (0 seconds).
## Completed M-Step.
## Completing Iteration 12 (approx. per word bound = -6.330, relative change = 2.402e-04)
## ..................................................................................................................
## Completed E-Step (0 seconds).
## Completed M-Step.
## Completing Iteration 13 (approx. per word bound = -6.329, relative change = 2.190e-04)
## ..................................................................................................................
## Completed E-Step (0 seconds).
## Completed M-Step.
## Completing Iteration 14 (approx. per word bound = -6.328, relative change = 2.108e-04)
## ..................................................................................................................
## Completed E-Step (0 seconds).
## Completed M-Step.
## Completing Iteration 15 (approx. per word bound = -6.326, relative change = 2.002e-04)
## Topic 1: loot, riot, properti, protest, polic
## Topic 2: polic, citi, offic, law, budget
## Topic 3: polic, communiti, organ, american, work
## Topic 4: polic, offic, protest, floyd, citi
## Topic 5: civil, statu, right, american, movement
## Topic 6: white, american, racial, like, democrat
## Topic 7: protest, demonstr, feder, violenc, state
## ..................................................................................................................
## Completed E-Step (0 seconds).
## Completed M-Step.
## Completing Iteration 16 (approx. per word bound = -6.325, relative change = 1.756e-04)
## ..................................................................................................................
## Completed E-Step (0 seconds).
## Completed M-Step.
## Completing Iteration 17 (approx. per word bound = -6.324, relative change = 1.442e-04)
## ..................................................................................................................
## Completed E-Step (0 seconds).
## Completed M-Step.
## Completing Iteration 18 (approx. per word bound = -6.324, relative change = 1.164e-04)
## ..................................................................................................................
## Completed E-Step (0 seconds).
## Completed M-Step.
## Completing Iteration 19 (approx. per word bound = -6.323, relative change = 9.607e-05)
## ..................................................................................................................
## Completed E-Step (0 seconds).
## Completed M-Step.
## Completing Iteration 20 (approx. per word bound = -6.323, relative change = 8.169e-05)
## Topic 1: loot, riot, properti, protest, polic
## Topic 2: polic, citi, offic, law, budget
## Topic 3: polic, communiti, organ, work, social
## Topic 4: polic, offic, protest, floyd, citi
## Topic 5: civil, right, statu, american, movement
## Topic 6: white, racial, american, like, democrat
## Topic 7: protest, demonstr, feder, violenc, state
## ..................................................................................................................
## Completed E-Step (0 seconds).
## Completed M-Step.
## Completing Iteration 21 (approx. per word bound = -6.322, relative change = 7.077e-05)
## ..................................................................................................................
## Completed E-Step (0 seconds).
## Completed M-Step.
## Completing Iteration 22 (approx. per word bound = -6.322, relative change = 6.219e-05)
## ..................................................................................................................
## Completed E-Step (0 seconds).
## Completed M-Step.
## Completing Iteration 23 (approx. per word bound = -6.321, relative change = 5.537e-05)
## ..................................................................................................................
## Completed E-Step (0 seconds).
## Completed M-Step.
## Completing Iteration 24 (approx. per word bound = -6.321, relative change = 5.001e-05)
## ..................................................................................................................
## Completed E-Step (0 seconds).
## Completed M-Step.
## Completing Iteration 25 (approx. per word bound = -6.321, relative change = 4.489e-05)
## Topic 1: loot, riot, properti, protest, violenc
## Topic 2: polic, citi, offic, law, budget
## Topic 3: polic, communiti, organ, work, social
## Topic 4: polic, offic, protest, citi, floyd
## Topic 5: civil, right, statu, american, law
## Topic 6: white, racial, american, like, democrat
## Topic 7: protest, demonstr, feder, violenc, report
## ..................................................................................................................
## Completed E-Step (0 seconds).
## Completed M-Step.
## Completing Iteration 26 (approx. per word bound = -6.320, relative change = 3.873e-05)
## ..................................................................................................................
## Completed E-Step (0 seconds).
## Completed M-Step.
## Completing Iteration 27 (approx. per word bound = -6.320, relative change = 3.285e-05)
## ..................................................................................................................
## Completed E-Step (0 seconds).
## Completed M-Step.
## Completing Iteration 28 (approx. per word bound = -6.320, relative change = 2.845e-05)
## ..................................................................................................................
## Completed E-Step (0 seconds).
## Completed M-Step.
## Completing Iteration 29 (approx. per word bound = -6.320, relative change = 2.480e-05)
## ..................................................................................................................
## Completed E-Step (0 seconds).
## Completed M-Step.
## Completing Iteration 30 (approx. per word bound = -6.320, relative change = 2.183e-05)
## Topic 1: loot, riot, properti, protest, violenc
## Topic 2: polic, citi, offic, law, budget
## Topic 3: polic, communiti, organ, work, social
## Topic 4: polic, offic, protest, citi, floyd
## Topic 5: civil, right, american, statu, law
## Topic 6: white, racial, american, like, democrat
## Topic 7: protest, demonstr, feder, violenc, report
## ..................................................................................................................
## Completed E-Step (0 seconds).
## Completed M-Step.
## Completing Iteration 31 (approx. per word bound = -6.320, relative change = 1.868e-05)
## ..................................................................................................................
## Completed E-Step (0 seconds).
## Completed M-Step.
## Completing Iteration 32 (approx. per word bound = -6.320, relative change = 1.620e-05)
## ..................................................................................................................
## Completed E-Step (0 seconds).
## Completed M-Step.
## Completing Iteration 33 (approx. per word bound = -6.320, relative change = 1.397e-05)
## ..................................................................................................................
## Completed E-Step (0 seconds).
## Completed M-Step.
## Completing Iteration 34 (approx. per word bound = -6.319, relative change = 1.245e-05)
## ..................................................................................................................
## Completed E-Step (0 seconds).
## Completed M-Step.
## Completing Iteration 35 (approx. per word bound = -6.319, relative change = 1.074e-05)
## Topic 1: loot, riot, properti, protest, violenc
## Topic 2: polic, citi, offic, law, budget
## Topic 3: polic, communiti, organ, work, social
## Topic 4: polic, offic, protest, citi, floyd
## Topic 5: civil, right, american, statu, law
## Topic 6: white, racial, american, like, democrat
## Topic 7: protest, demonstr, feder, violenc, report
## ..................................................................................................................
## Completed E-Step (0 seconds).
## Completed M-Step.
## Model Converged
## Beginning Spectral Initialization
## Calculating the gram matrix...
## Finding anchor words...
## ...........
## Recovering initialization...
## ...........
## Initialization complete.
## ..................................................................................................................
## Completed E-Step (0 seconds).
## Completed M-Step.
## Completing Iteration 1 (approx. per word bound = -6.467)
## ..................................................................................................................
## Completed E-Step (0 seconds).
## Completed M-Step.
## Completing Iteration 2 (approx. per word bound = -6.352, relative change = 1.780e-02)
## ..................................................................................................................
## Completed E-Step (0 seconds).
## Completed M-Step.
## Completing Iteration 3 (approx. per word bound = -6.306, relative change = 7.200e-03)
## ..................................................................................................................
## Completed E-Step (0 seconds).
## Completed M-Step.
## Completing Iteration 4 (approx. per word bound = -6.285, relative change = 3.337e-03)
## ..................................................................................................................
## Completed E-Step (0 seconds).
## Completed M-Step.
## Completing Iteration 5 (approx. per word bound = -6.274, relative change = 1.804e-03)
## Topic 1: loot, properti, riot, violenc, polic
## Topic 2: polic, citi, budget, offic, depart
## Topic 3: polic, polit, american, work, social
## Topic 4: polic, offic, law, feder, depart
## Topic 5: statu, remov, confeder, histori, civil
## Topic 6: white, american, racial, like, race
## Topic 7: polic, racism, white, kill, crime
## Topic 8: offic, polic, protest, street, crowd
## Topic 9: right, civil, protest, law, polic
## Topic 10: polic, protest, offic, floyd, citi
## Topic 11: protest, demonstr, feder, violenc, state
## ..................................................................................................................
## Completed E-Step (0 seconds).
## Completed M-Step.
## Completing Iteration 6 (approx. per word bound = -6.267, relative change = 1.124e-03)
## ..................................................................................................................
## Completed E-Step (0 seconds).
## Completed M-Step.
## Completing Iteration 7 (approx. per word bound = -6.262, relative change = 7.662e-04)
## ..................................................................................................................
## Completed E-Step (0 seconds).
## Completed M-Step.
## Completing Iteration 8 (approx. per word bound = -6.259, relative change = 5.535e-04)
## ..................................................................................................................
## Completed E-Step (0 seconds).
## Completed M-Step.
## Completing Iteration 9 (approx. per word bound = -6.256, relative change = 4.206e-04)
## ..................................................................................................................
## Completed E-Step (0 seconds).
## Completed M-Step.
## Completing Iteration 10 (approx. per word bound = -6.254, relative change = 3.330e-04)
## Topic 1: loot, properti, riot, violenc, protest
## Topic 2: polic, citi, budget, depart, offic
## Topic 3: work, american, polit, polic, call
## Topic 4: polic, offic, feder, law, depart
## Topic 5: statu, remov, confeder, histori, symbol
## Topic 6: white, racial, american, like, among
## Topic 7: polic, white, kill, crime, racism
## Topic 8: offic, polic, protest, street, show
## Topic 9: right, civil, protest, law, organ
## Topic 10: polic, protest, offic, floyd, citi
## Topic 11: protest, demonstr, violenc, report, state
## ..................................................................................................................
## Completed E-Step (0 seconds).
## Completed M-Step.
## Completing Iteration 11 (approx. per word bound = -6.252, relative change = 2.676e-04)
## ..................................................................................................................
## Completed E-Step (0 seconds).
## Completed M-Step.
## Completing Iteration 12 (approx. per word bound = -6.251, relative change = 2.196e-04)
## ..................................................................................................................
## Completed E-Step (0 seconds).
## Completed M-Step.
## Completing Iteration 13 (approx. per word bound = -6.250, relative change = 1.837e-04)
## ..................................................................................................................
## Completed E-Step (0 seconds).
## Completed M-Step.
## Completing Iteration 14 (approx. per word bound = -6.249, relative change = 1.581e-04)
## ..................................................................................................................
## Completed E-Step (0 seconds).
## Completed M-Step.
## Completing Iteration 15 (approx. per word bound = -6.248, relative change = 1.406e-04)
## Topic 1: loot, properti, riot, violenc, protest
## Topic 2: polic, citi, budget, depart, offic
## Topic 3: work, american, polit, presid, don’
## Topic 4: polic, offic, feder, law, depart
## Topic 5: statu, remov, confeder, histori, symbol
## Topic 6: white, racial, american, like, among
## Topic 7: polic, white, crime, kill, american
## Topic 8: offic, polic, protest, street, show
## Topic 9: civil, right, protest, organ, law
## Topic 10: polic, protest, offic, floyd, citi
## Topic 11: protest, demonstr, violenc, report, state
## ..................................................................................................................
## Completed E-Step (0 seconds).
## Completed M-Step.
## Completing Iteration 16 (approx. per word bound = -6.247, relative change = 1.306e-04)
## ..................................................................................................................
## Completed E-Step (0 seconds).
## Completed M-Step.
## Completing Iteration 17 (approx. per word bound = -6.246, relative change = 1.266e-04)
## ..................................................................................................................
## Completed E-Step (0 seconds).
## Completed M-Step.
## Completing Iteration 18 (approx. per word bound = -6.246, relative change = 1.186e-04)
## ..................................................................................................................
## Completed E-Step (0 seconds).
## Completed M-Step.
## Completing Iteration 19 (approx. per word bound = -6.245, relative change = 9.877e-05)
## ..................................................................................................................
## Completed E-Step (0 seconds).
## Completed M-Step.
## Completing Iteration 20 (approx. per word bound = -6.245, relative change = 7.713e-05)
## Topic 1: loot, properti, riot, violenc, protest
## Topic 2: polic, citi, budget, depart, offic
## Topic 3: work, american, polit, don’, presid
## Topic 4: polic, offic, feder, law, depart
## Topic 5: statu, remov, confeder, histori, symbol
## Topic 6: white, racial, american, like, among
## Topic 7: polic, white, crime, american, kill
## Topic 8: offic, polic, protest, show, street
## Topic 9: civil, right, organ, protest, justic
## Topic 10: polic, protest, offic, floyd, citi
## Topic 11: protest, demonstr, violenc, report, state
## ..................................................................................................................
## Completed E-Step (0 seconds).
## Completed M-Step.
## Completing Iteration 21 (approx. per word bound = -6.244, relative change = 6.394e-05)
## ..................................................................................................................
## Completed E-Step (0 seconds).
## Completed M-Step.
## Completing Iteration 22 (approx. per word bound = -6.244, relative change = 5.512e-05)
## ..................................................................................................................
## Completed E-Step (0 seconds).
## Completed M-Step.
## Completing Iteration 23 (approx. per word bound = -6.244, relative change = 4.940e-05)
## ..................................................................................................................
## Completed E-Step (0 seconds).
## Completed M-Step.
## Completing Iteration 24 (approx. per word bound = -6.243, relative change = 4.592e-05)
## ..................................................................................................................
## Completed E-Step (0 seconds).
## Completed M-Step.
## Completing Iteration 25 (approx. per word bound = -6.243, relative change = 4.306e-05)
## Topic 1: loot, properti, riot, violenc, protest
## Topic 2: polic, citi, budget, depart, offic
## Topic 3: work, american, don’, polit, presid
## Topic 4: polic, offic, feder, law, depart
## Topic 5: statu, remov, confeder, histori, civil
## Topic 6: white, racial, american, like, among
## Topic 7: polic, american, white, crime, kill
## Topic 8: offic, polic, protest, show, street
## Topic 9: civil, right, organ, protest, support
## Topic 10: polic, protest, offic, citi, floyd
## Topic 11: protest, demonstr, violenc, report, state
## ..................................................................................................................
## Completed E-Step (0 seconds).
## Completed M-Step.
## Completing Iteration 26 (approx. per word bound = -6.243, relative change = 4.022e-05)
## ..................................................................................................................
## Completed E-Step (0 seconds).
## Completed M-Step.
## Completing Iteration 27 (approx. per word bound = -6.242, relative change = 3.666e-05)
## ..................................................................................................................
## Completed E-Step (0 seconds).
## Completed M-Step.
## Completing Iteration 28 (approx. per word bound = -6.242, relative change = 3.288e-05)
## ..................................................................................................................
## Completed E-Step (0 seconds).
## Completed M-Step.
## Completing Iteration 29 (approx. per word bound = -6.242, relative change = 2.961e-05)
## ..................................................................................................................
## Completed E-Step (0 seconds).
## Completed M-Step.
## Completing Iteration 30 (approx. per word bound = -6.242, relative change = 2.573e-05)
## Topic 1: loot, properti, riot, violenc, protest
## Topic 2: polic, citi, budget, depart, offic
## Topic 3: work, don’, american, polit, presid
## Topic 4: polic, offic, feder, law, depart
## Topic 5: statu, remov, confeder, civil, histori
## Topic 6: white, racial, american, like, among
## Topic 7: american, white, polic, crime, kill
## Topic 8: offic, polic, protest, show, street
## Topic 9: organ, civil, right, protest, support
## Topic 10: polic, protest, offic, citi, floyd
## Topic 11: protest, demonstr, violenc, report, state
## ..................................................................................................................
## Completed E-Step (0 seconds).
## Completed M-Step.
## Completing Iteration 31 (approx. per word bound = -6.242, relative change = 2.591e-05)
## ..................................................................................................................
## Completed E-Step (0 seconds).
## Completed M-Step.
## Completing Iteration 32 (approx. per word bound = -6.242, relative change = 2.508e-05)
## ..................................................................................................................
## Completed E-Step (0 seconds).
## Completed M-Step.
## Completing Iteration 33 (approx. per word bound = -6.241, relative change = 2.379e-05)
## ..................................................................................................................
## Completed E-Step (0 seconds).
## Completed M-Step.
## Completing Iteration 34 (approx. per word bound = -6.241, relative change = 2.249e-05)
## ..................................................................................................................
## Completed E-Step (0 seconds).
## Completed M-Step.
## Completing Iteration 35 (approx. per word bound = -6.241, relative change = 2.168e-05)
## Topic 1: loot, properti, riot, violenc, protest
## Topic 2: polic, citi, budget, depart, offic
## Topic 3: work, don’, american, polit, presid
## Topic 4: polic, offic, feder, law, depart
## Topic 5: statu, civil, remov, confeder, histori
## Topic 6: white, racial, american, like, among
## Topic 7: white, american, crime, polic, kill
## Topic 8: offic, polic, protest, show, street
## Topic 9: organ, right, protest, civil, support
## Topic 10: polic, protest, offic, citi, floyd
## Topic 11: protest, demonstr, violenc, report, state
## ..................................................................................................................
## Completed E-Step (0 seconds).
## Completed M-Step.
## Completing Iteration 36 (approx. per word bound = -6.241, relative change = 2.056e-05)
## ..................................................................................................................
## Completed E-Step (0 seconds).
## Completed M-Step.
## Completing Iteration 37 (approx. per word bound = -6.241, relative change = 1.984e-05)
## ..................................................................................................................
## Completed E-Step (0 seconds).
## Completed M-Step.
## Completing Iteration 38 (approx. per word bound = -6.241, relative change = 1.980e-05)
## ..................................................................................................................
## Completed E-Step (0 seconds).
## Completed M-Step.
## Completing Iteration 39 (approx. per word bound = -6.241, relative change = 1.956e-05)
## ..................................................................................................................
## Completed E-Step (0 seconds).
## Completed M-Step.
## Completing Iteration 40 (approx. per word bound = -6.241, relative change = 1.882e-05)
## Topic 1: loot, properti, riot, violenc, protest
## Topic 2: polic, citi, budget, depart, offic
## Topic 3: don’, work, american, polit, say
## Topic 4: polic, offic, feder, law, depart
## Topic 5: statu, civil, remov, confeder, histori
## Topic 6: white, racial, american, like, among
## Topic 7: white, american, crime, polic, kill
## Topic 8: offic, polic, protest, show, post
## Topic 9: organ, protest, right, support, help
## Topic 10: polic, protest, citi, offic, floyd
## Topic 11: protest, demonstr, violenc, report, state
## ..................................................................................................................
## Completed E-Step (0 seconds).
## Completed M-Step.
## Completing Iteration 41 (approx. per word bound = -6.240, relative change = 1.868e-05)
## ..................................................................................................................
## Completed E-Step (0 seconds).
## Completed M-Step.
## Completing Iteration 42 (approx. per word bound = -6.240, relative change = 1.723e-05)
## ..................................................................................................................
## Completed E-Step (0 seconds).
## Completed M-Step.
## Completing Iteration 43 (approx. per word bound = -6.240, relative change = 1.591e-05)
## ..................................................................................................................
## Completed E-Step (0 seconds).
## Completed M-Step.
## Completing Iteration 44 (approx. per word bound = -6.240, relative change = 1.438e-05)
## ..................................................................................................................
## Completed E-Step (0 seconds).
## Completed M-Step.
## Completing Iteration 45 (approx. per word bound = -6.240, relative change = 1.323e-05)
## Topic 1: loot, properti, riot, violenc, protest
## Topic 2: polic, citi, budget, depart, offic
## Topic 3: don’, say, american, work, polit
## Topic 4: polic, offic, feder, law, depart
## Topic 5: statu, civil, remov, confeder, histori
## Topic 6: white, racial, american, like, among
## Topic 7: white, american, crime, polic, kill
## Topic 8: offic, polic, protest, show, investig
## Topic 9: organ, protest, right, support, help
## Topic 10: polic, protest, citi, offic, floyd
## Topic 11: protest, demonstr, violenc, report, state
## ..................................................................................................................
## Completed E-Step (0 seconds).
## Completed M-Step.
## Completing Iteration 46 (approx. per word bound = -6.240, relative change = 1.276e-05)
## ..................................................................................................................
## Completed E-Step (0 seconds).
## Completed M-Step.
## Completing Iteration 47 (approx. per word bound = -6.240, relative change = 1.389e-05)
## ..................................................................................................................
## Completed E-Step (0 seconds).
## Completed M-Step.
## Completing Iteration 48 (approx. per word bound = -6.240, relative change = 1.467e-05)
## ..................................................................................................................
## Completed E-Step (0 seconds).
## Completed M-Step.
## Completing Iteration 49 (approx. per word bound = -6.240, relative change = 1.463e-05)
## ..................................................................................................................
## Completed E-Step (0 seconds).
## Completed M-Step.
## Completing Iteration 50 (approx. per word bound = -6.240, relative change = 1.415e-05)
## Topic 1: loot, properti, riot, violenc, protest
## Topic 2: polic, citi, budget, depart, offic
## Topic 3: don’, say, american, polit, work
## Topic 4: polic, offic, feder, law, depart
## Topic 5: civil, statu, remov, confeder, histori
## Topic 6: white, racial, american, like, among
## Topic 7: white, american, crime, polic, kill
## Topic 8: offic, polic, protest, investig, show
## Topic 9: organ, protest, right, support, help
## Topic 10: polic, protest, citi, floyd, offic
## Topic 11: protest, demonstr, violenc, report, state
## ..................................................................................................................
## Completed E-Step (0 seconds).
## Completed M-Step.
## Completing Iteration 51 (approx. per word bound = -6.240, relative change = 1.283e-05)
## ..................................................................................................................
## Completed E-Step (0 seconds).
## Completed M-Step.
## Completing Iteration 52 (approx. per word bound = -6.239, relative change = 1.110e-05)
## ..................................................................................................................
## Completed E-Step (0 seconds).
## Completed M-Step.
## Model Converged
plot(kResult)
labelTopics(PrevFitQuery, 1:11) # top words for each of the 11 topics
## Topic 1 Top Words:
## Highest Prob: loot, properti, riot, violenc, protest, polic, destruct
## FREX: loot, riot, properti, destruct, looter, store, destroy
## Lift: weekend, looter, loot, destruct, destroy, properti, rioter
## Score: weekend, loot, looter, destruct, rioter, properti, riot
## Topic 2 Top Words:
## Highest Prob: polic, citi, budget, depart, offic, law, enforc
## FREX: budget, spend, employe, largest, citi, billion, defund
## Lift: spend, budget, billion, largest, defund, employe, council
## Score: spend, budget, largest, billion, fund, employe, citi
## Topic 3 Top Words:
## Highest Prob: presid, say, call, white, trump, left, post
## FREX: biden, left, got, town, post, analysi, talk
## Lift: analysi, biden, mention, low, town, pictur, got
## Score: analysi, biden, trump, got, town, arriv, tweet
## Topic 4 Top Words:
## Highest Prob: polic, offic, feder, law, enforc, depart, forc
## FREX: equip, agenc, militari, feder, weapon, militar, investig
## Lift: shield, equip, misconduct, militar, complaint, militari, agenc
## Score: shield, equip, portland, feder, depart, offic, investig
## Topic 5 Top Words:
## Highest Prob: white, racial, american, like, among, democrat, race
## FREX: like, equal, white, age, among, republican, racial
## Lift: age, achiev, republican, reaction, survey, equal, gap
## Score: age, white, equal, compar, gap, achiev, republican
## Topic 6 Top Words:
## Highest Prob: statu, american, remov, civil, histori, confeder, union
## FREX: statu, remov, confeder, symbol, histori, african, union
## Lift: statu, confeder, remov, symbol, robert, southern, figur
## Score: statu, confeder, remov, union, segreg, symbol, slaveri
## Topic 7 Top Words:
## Highest Prob: polic, american, crime, violenc, kill, communiti, white
## FREX: system, men, studi, job, rate, crime, phrase
## Lift: troubl, phrase, african-american, rate, poverti, statist, margin
## Score: troubl, phrase, african-american, poverti, percent, neighborhood, cop
## Topic 8 Top Words:
## Highest Prob: polic, protest, offic, citi, floyd, minneapoli, georg
## FREX: night, chauvin, fire, saturday, minneapoli, downtown, curfew
## Lift: texa, chauvin, derek, saturday, precinct, downtown, friday
## Score: texa, curfew, saturday, night, monday, chauvin, brooklyn
## Topic 9 Top Words:
## Highest Prob: protest, right, civil, law, public, peac, bill
## FREX: bill, king, mask, civil, amend, spread, wear
## Lift: bill, amend, distanc, king, mask, resist, strategi
## Score: bill, mask, king, peac, moral, luther, civil
## Topic 10 Top Words:
## Highest Prob: protest, demonstr, violenc, report, violent, state, author
## FREX: demonstr, author, juli, violent, actor, report, event
## Lift: actor, conflict, oregon, juli, summer, trend, locat
## Score: actor, demonstr, portland, violent, trump, peac, oregon
## Topic 11 Top Words:
## Highest Prob: organ, communiti, help, support, work, justic, includ
## FREX: organ, provid, foundat, program, inform, resourc, help
## Lift: reli, foundat, platform, contact, provid, ensur, inform
## Score: reli, fund, foundat, program, contact, educ, organ
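In the labelTopics() output, Highest Prob ranks words by their probability within a topic. FREX also accounts for exclusivity, meaning how concentrated a word's probability is in that topic rather than in other topics. Our reading of the stm documentation is that FREX is a weighted harmonic mean of a word's within-topic rank on frequency and its rank on exclusivity, with a default weight of 0.5. The sketch below is simplified and uses a hypothetical two-topic, four-word matrix. It leaves out any smoothing that stm may apply to the exclusivity scores.

```r
# Hypothetical topic-word probabilities: rows = topics, columns = words
beta <- matrix(c(0.40, 0.30, 0.20, 0.10,
                 0.45, 0.05, 0.15, 0.35),
               nrow = 2, byrow = TRUE,
               dimnames = list(c("topic1", "topic2"),
                               c("polic", "loot", "riot", "budget")))

frex_scores <- function(beta, w = 0.5) {
  # Exclusivity: share of each word's probability mass that falls in each topic
  excl <- sweep(beta, 2, colSums(beta), "/")
  # Within each topic, turn frequency and exclusivity into ECDF ranks in (0, 1]
  freq_rank <- t(apply(beta, 1, function(r) ecdf(r)(r)))
  excl_rank <- t(apply(excl, 1, function(r) ecdf(r)(r)))
  # Weighted harmonic mean of the two ranks
  frex <- 1 / (w / freq_rank + (1 - w) / excl_rank)
  dimnames(frex) <- dimnames(beta)
  frex
}

frex <- frex_scores(beta)
# Words ordered by FREX within each topic (one column per topic)
apply(frex, 1, function(r) names(sort(r, decreasing = TRUE)))
```

In topic1, "polic" has the highest probability. It is also common in topic2, though, so FREX ranks the more exclusive "loot" first.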
prep <- estimateEffect(1:11 ~ Support_Oppose, # regress all 11 topic proportions on stance
                       PrevFitQuery, meta = out$meta,
                       uncertainty = "Global")
summary(prep, topics = 1:2)
##
## Call:
## estimateEffect(formula = 1:11 ~ Support_Oppose, stmobj = PrevFitQuery,
## metadata = out$meta, uncertainty = "Global")
##
##
## Topic 1:
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.14534 0.01812 8.020 5.67e-14 ***
## Support_OpposeSupport -0.12992 0.02565 -5.065 8.49e-07 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
##
## Topic 2:
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.06231 0.01557 4.003 8.5e-05 ***
## Support_OpposeSupport -0.01508 0.02298 -0.656 0.512
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
plot(prep, covariate = "Support_Oppose", topics = 1:11,
     model = PrevFitQuery, method = "difference",
     cov.value1 = "Support", cov.value2 = "Oppose",
     main = "Effect of Support vs. Oppose",
     xlim = c(-0.5, 0.5), labeltype = "custom", cex = .5,
     custom.labels = c("1-Looting/Riot", "2-Police funding/budget", "3-Politics", "4-Police use of force", "5-Race equality", "6-Statue removal", "7-Crimes poverty", "8-Floyd/Minneapolis",
                       "9-Civil rights", "10-Protests/Demonstrations", "11-Donation/support"))
The plot shows that the Race equality, Civil rights, and Donation/support topics are significantly more prevalent in coverage that supports the protests. The Looting/Riot topic is more prevalent in coverage that opposes them.
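The core of estimateEffect() is a regression of the estimated topic proportions on document covariates. The sketch below reproduces only that core step. It runs on simulated data, and both the stance labels and the proportions are hypothetical. Unlike estimateEffect() with uncertainty = "Global", it does not propagate uncertainty in the topic proportions by repeatedly drawing them from the fitted model.

```r
set.seed(42)
n <- 200
# Hypothetical stance labels, coded like Support_Oppose in the tutorial data
stance <- factor(rep(c("Oppose", "Support"), each = n / 2))
# Hypothetical proportions of one topic per document, higher on average in Oppose coverage
theta1 <- ifelse(stance == "Oppose", rbeta(n, 3, 17), rbeta(n, 1, 19))
# Core step: regress the topic proportion on the covariate
fit <- lm(theta1 ~ stance)
summary(fit)$coefficients
```

The summary output above reads the same way. The intercept is the mean topic proportion for the reference level (Oppose), and the stanceSupport coefficient is the Support-minus-Oppose difference.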
topicCorr() estimates correlations between topics, i.e., how likely two topics are to appear in the same document. Plotting the resulting topic network requires the igraph R package.
mod.out.corr <- topicCorr(PrevFitQuery)
plot(mod.out.corr)
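Our understanding of the stm documentation is that topicCorr()'s default "simple" method correlates the documents' estimated topic proportions and links two topics whenever that correlation exceeds a small cutoff (0.01 by default). Below is a minimal base-R sketch of the idea. The document-topic proportions are simulated, and the structure in which half of the documents mix topics 1 and 2 while the rest mix topics 3 and 4 is hypothetical.

```r
set.seed(1)
n_docs <- 400
k <- 4
# Hypothetical unnormalized topic weights, one row per document
g <- matrix(rgamma(n_docs * k, shape = 1), ncol = k)
# Hypothetical structure: half the documents mix topics 1 and 2, the rest topics 3 and 4
type_a <- rep(c(TRUE, FALSE), each = n_docs / 2)
g[type_a, 1:2] <- g[type_a, 1:2] + 5
g[!type_a, 3:4] <- g[!type_a, 3:4] + 5
theta <- g / rowSums(g)  # each row sums to 1, like a document-topic matrix
colnames(theta) <- paste0("Topic", 1:k)

cutoff <- 0.01
cormat <- cor(theta)
adj <- (cormat > cutoff) * 1  # 1 = edge between two topics
diag(adj) <- 0                # no self-loops
round(cormat, 2)
adj
```

Topics 1 and 2 end up linked because they are high in the same documents. Topics 1 and 3 are negatively correlated, so no edge connects them.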
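In topic modeling, each word is represented by its frequencies across documents, so every word corresponds to a vector of counts. Word embedding techniques such as word2vec and GloVe learn dense word vectors instead. word2vec trains a shallow neural network to predict words from their neighbors, and GloVe fits vectors to global word co-occurrence counts. The resulting vectors support arithmetic, for example Paris - France + Germany = ?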
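An analogy like this is solved with vector arithmetic. You subtract the vector for France from the vector for Paris, add the vector for Germany, and then look for the word whose vector is closest to the result by cosine similarity. The toy sketch below uses hand-made 3-dimensional vectors whose values are hypothetical and chosen for illustration, not learned from data. As is common practice, it leaves the three query words out of the candidate list.

```r
# Hypothetical toy embeddings (dimensions loosely: "capital", "France", "Germany")
emb <- rbind(
  paris   = c(1.0, 1, 0),
  france  = c(0.0, 1, 0),
  germany = c(0.0, 0, 1),
  berlin  = c(1.0, 0, 1),
  munich  = c(0.3, 0, 1),
  lyon    = c(0.2, 1, 0)
)
# Cosine similarity between every row of m and the vector v
cosine <- function(m, v) as.vector(m %*% v) / (sqrt(rowSums(m^2)) * sqrt(sum(v^2)))

target <- emb["paris", ] - emb["france", ] + emb["germany", ]
candidates <- emb[!rownames(emb) %in% c("paris", "france", "germany"), ]
sims <- setNames(cosine(candidates, target), rownames(candidates))
sort(sims, decreasing = TRUE)
```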
Both approaches learn word vectors from the contexts in which words co-occur. See more: https://code.google.com/archive/p/word2vec/
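GloVe, for example, starts from counts of how often each pair of words appears within a fixed window of each other. The minimal base-R sketch below shows that counting step on a hypothetical toy sentence, using a window of two words on each side. GloVe also down-weights pairs that are farther apart, and this sketch skips that step.

```r
toks <- c("police", "met", "protesters", "downtown", "police", "arrested", "protesters")
window <- 2
vocab <- sort(unique(toks))
cooc <- matrix(0, length(vocab), length(vocab), dimnames = list(vocab, vocab))
for (i in seq_along(toks)) {
  lo <- max(1, i - window)
  hi <- min(length(toks), i + window)
  for (j in lo:hi) {
    if (j != i) {
      # Count the (center word, context word) pair
      cooc[toks[i], toks[j]] <- cooc[toks[i], toks[j]] + 1
    }
  }
}
cooc
```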
First, we clean the text so that it matches the input format the word2vec model expects for training.
library(word2vec)
# text cleaning for word2vec input: convert to ASCII, keep only alphanumeric characters, lowercase, and trim leading/trailing spaces
x <- txt_clean_word2vec(as.character(corpus), ascii = TRUE, alpha = TRUE, tolower = TRUE, trim = TRUE)
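If you want to see what these options do, here is a rough base-R approximation of the same steps. The helper clean_like_word2vec() is hypothetical and is not the word2vec package's implementation. The behavior of iconv()'s ASCII transliteration also differs between platforms.

```r
# Hypothetical helper approximating the cleaning steps named in the comment above
clean_like_word2vec <- function(x) {
  x <- iconv(x, from = "UTF-8", to = "ASCII//TRANSLIT", sub = "")  # approximate ASCII conversion
  x <- gsub("[^[:alnum:]]+", " ", x)  # replace runs of non-alphanumeric characters with a space
  x <- tolower(x)
  x <- gsub("\\s+", " ", x)           # collapse repeated whitespace
  trimws(x)                           # drop leading/trailing spaces
}

clean_like_word2vec("  Protesters marched, in Minneapolis!  ")
```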
Next, we train the word embedding model using the continuous bag-of-words (CBOW) architecture with 400-dimensional vectors and 20 training iterations. The trained model can return either 1) the embedding of a word, or 2) the words nearest to a given word or word vector.
set.seed(23874)
model_cbow <- word2vec(x, type = "cbow", dim = 400, iter = 20) # continuous bag-of-words (CBOW) architecture
# 10 words most similar to "protest"
nn_cbow <- predict(model_cbow, "protest", type = "nearest", top_n = 10)
nn_cbow
## $protest
## term1 term2 similarity rank
## 1 protest demonstration 0.7346968 1
## 2 protest vigil 0.6791759 2
## 3 protest assembly 0.6705053 3
## 4 protest rally 0.6678751 4
## 5 protest protests 0.6588770 5
## 6 protest demonstrations 0.6206497 6
## 7 protest marched 0.5931678 7
## 8 protest tense 0.5914532 8
## 9 protest protesters 0.5859006 9
## 10 protest protestors 0.5797508 10
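CBOW learns embeddings by predicting each word from the words around it. The vectors of the context words are averaged, and that average is scored against every word in the vocabulary. The minimal sketch below shows how training examples are formed and walks through one forward pass before any training. The toy sentence, the random vectors, and the window of 2 are all hypothetical. The word2vec package trains with shortcuts such as negative sampling or hierarchical softmax rather than the full softmax used here, and this sketch does not reproduce them.

```r
# Build (context, target) pairs: each word is predicted from up to `window` words on each side
cbow_examples <- function(toks, window = 2) {
  lapply(seq_along(toks), function(i) {
    idx <- setdiff(max(1, i - window):min(length(toks), i + window), i)
    list(context = toks[idx], target = toks[i])
  })
}
toks <- c("peaceful", "protest", "downtown", "on", "saturday")
ex <- cbow_examples(toks, window = 2)
str(ex[[3]])

# One forward pass with random (untrained) input and output vectors
set.seed(7)
vocab <- unique(toks)
dim_emb <- 3
E_in  <- matrix(rnorm(length(vocab) * dim_emb, sd = 0.1), nrow = length(vocab), dimnames = list(vocab, NULL))
E_out <- matrix(rnorm(length(vocab) * dim_emb, sd = 0.1), nrow = length(vocab), dimnames = list(vocab, NULL))
h <- colMeans(E_in[ex[[3]]$context, , drop = FALSE])  # average of the context vectors
scores <- as.vector(E_out %*% h)
probs <- setNames(exp(scores) / sum(exp(scores)), vocab)  # softmax over the vocabulary
probs
```

Training nudges E_in and E_out so that the target word ("downtown" here) gets a higher probability. The rows of E_in become the word embeddings.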
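We can also do arithmetic with the word vectors and find the terms closest to the result. Below, we take the vectors for "equality" and "rights", add their average as a third vector, and list the nearest terms for each of the three.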
emb <- as.matrix(model_cbow)
vectors <- emb[c("equality", "rights"), ]
vectors <- rbind(vectors, avg = colMeans(vectors))
predict(model_cbow, vectors, type = "nearest", top_n = 10)
## $equality
## term similarity rank
## 1 achieve 0.7702916 1
## 2 achieving 0.7347823 2
## 3 blacks 0.6887189 3
## 4 slur 0.6571456 4
## 5 inequalities 0.6321969 5
## 6 inequality 0.6284207 6
## 7 achieved 0.6210301 7
## 8 injustice 0.6182012 8
## 9 divide 0.6167789 9
## 10 resentment 0.6037710 10
##
## $rights
## term similarity rank
## 1 disobedience 0.7972412 1
## 2 liberties 0.7582856 2
## 3 litigation 0.7581983 3
## 4 beings 0.7228526 4
## 5 war 0.6755592 5
## 6 disorders 0.6654155 6
## 7 servants 0.6421698 7
## 8 precipitated 0.6366475 8
## 9 disorder 0.6294204 9
## 10 liability 0.6121472 10
##
## $avg
## term similarity rank
## 1 rights 0.7549644 1
## 2 equality 0.7549641 2
## 3 achieve 0.6358227 3
## 4 beings 0.5996270 4
## 5 disobedience 0.5985212 5
## 6 achieving 0.5980842 6
## 7 litigation 0.5778762 7
## 8 blacks 0.5664884 8
## 9 liberties 0.5587466 9
## 10 consumption 0.5477144 10
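Averaging two vectors gives a point that lies between them. The nearest-neighbour search then ranks every word by its similarity to that point. The sketch below assumes cosine similarity is the ranking measure and uses hypothetical 3-dimensional vectors. The helper nearest_terms() is also hypothetical and returns a table in the same term/similarity/rank layout as the output above.

```r
# Hypothetical helper: rank rows of `emb` by cosine similarity to vector `v`
nearest_terms <- function(emb, v, top_n = 3) {
  sims <- as.vector(emb %*% v) / (sqrt(rowSums(emb^2)) * sqrt(sum(v^2)))
  o <- order(sims, decreasing = TRUE)[seq_len(min(top_n, nrow(emb)))]
  data.frame(term = rownames(emb)[o], similarity = sims[o], rank = seq_along(o),
             stringsAsFactors = FALSE)
}

# Hypothetical toy embeddings
emb <- rbind(
  equality = c(0.9, 0.1, 0.0),
  rights   = c(0.1, 0.8, 0.0),
  liberty  = c(0.2, 0.8, 0.1),
  justice  = c(0.6, 0.6, 0.1),
  budget   = c(0.0, 0.1, 0.9)
)
avg <- colMeans(emb[c("equality", "rights"), ])
res <- nearest_terms(emb, avg, top_n = 3)
res
```

In this toy example, "justice" sits between the two input words and so ranks above both of them.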
Maier, D., Waldherr, A., Miltner, P., Wiedemann, G., Niekler, A., Keinert, A., … & Schmid-Petri, H. (2018). Applying LDA topic modeling in communication research: Toward a valid and reliable methodology. Communication Methods and Measures, 12(2-3), 93-118.
Jacobi, C., van Atteveldt, W., & Welbers, K. (2015). Quantitative analysis of large amounts of journalistic texts using topic modelling. Digital Journalism. DOI: 10.1080/21670811.2015.1093271
Griffiths, T. L., & Steyvers, M. (2004). Finding scientific topics. Proceedings of the National Academy of Sciences, 101(suppl 1), 5228-5235.
Cao, J., Xia, T., Li, J., Zhang, Y., & Tang, S. (2009). A density-based method for adaptive LDA model selection. Neurocomputing, 72(7-9), 1775-1781.
Arun, R., Suresh, V., Madhavan, C. V., & Murthy, M. N. (2010, June). On finding the natural number of topics with latent Dirichlet allocation: Some observations. In Pacific-Asia Conference on Knowledge Discovery and Data Mining (pp. 391-402). Springer, Berlin, Heidelberg.
Deveaud, R., SanJuan, E., & Bellot, P. (2014). Accurate and effective latent concept modeling for ad hoc information retrieval. Document numérique, 17(1), 61-84.
Roberts, M. E., Stewart, B. M., Tingley, D., & Benoit, K. (2017). stm: Estimation of the Structural Topic Model. https://cran.r-project.org/web/packages/stm/index.html
Pretrained GloVe: https://nlp.stanford.edu/projects/glove/