Application of LDA topic modeling

We are going to go through several steps to do topic modeling, including:

  1. Preprocessing the data
  2. Constructing a document-term matrix
  3. Setting parameters
  4. Running the topic modeling function
library(topicmodels) #topic modeling
library(quanteda) #preprocessing and building dtm/dfm
library(dplyr)
library(ldatuning) #find topic numbers
library(tidytext)
library(ggplot2)
library(seededlda)

We will use a dataset of online media coverage of the 2020 BLM protests, labeled with two opposing stances (support vs. opposition). First, let’s look at our data:

df <- read.csv("/Users/chautong/Desktop/My Research/R/Teaching_Workshop_Cornell/BLM_stm.csv")
colnames(df)
## [1] "X"              "Num_Queries"    "query"          "Link"          
## [5] "Type"           "Support_Oppose" "text"

Preprocessing

  • The documents should be divided into word units, called tokens (tokenization).
  • After tokenization, convert all capital letters to lower case to unify terms (e.g. princess and Princess are treated as one term).
  • Punctuation, special characters, and numbers (if necessary) should be removed. In some contexts, such as modeling a Twitter corpus, we may want to retain special characters (e.g. hashtags).
  • Stop words, which are usually function words such as prepositions or articles (a, the), should be removed. We remove them because they are so frequent that they likely appear in every document and add little to the thematic structure. Lowercasing and punctuation removal should be done before removing stop words.
  • Lastly, we do stemming, which reduces each word to its stem by stripping its suffixes (e.g. contaminating and contamination become contamin).
df$text <- as.character(df$text) #change to character
corpus <- corpus(df) #build "corpus"
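tokens <- tokens(corpus, remove_numbers = TRUE, remove_punct = TRUE, remove_url = TRUE, remove_symbols = TRUE, include_docvars = TRUE) #tokenize; drop numbers, punctuation, URLs, and symbols
tokens <- tokens_tolower(tokens) #lowercase
tokens <- tokens_wordstem(tokens) #stem; note this runs before the stop-word removal below, so stop words whose stem differs (e.g. "because" becomes "becaus") are not matched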
stopwords(language = "en")
##   [1] "i"          "me"         "my"         "myself"     "we"        
##   [6] "our"        "ours"       "ourselves"  "you"        "your"      
##  [11] "yours"      "yourself"   "yourselves" "he"         "him"       
##  [16] "his"        "himself"    "she"        "her"        "hers"      
##  [21] "herself"    "it"         "its"        "itself"     "they"      
##  [26] "them"       "their"      "theirs"     "themselves" "what"      
##  [31] "which"      "who"        "whom"       "this"       "that"      
##  [36] "these"      "those"      "am"         "is"         "are"       
##  [41] "was"        "were"       "be"         "been"       "being"     
##  [46] "have"       "has"        "had"        "having"     "do"        
##  [51] "does"       "did"        "doing"      "would"      "should"    
##  [56] "could"      "ought"      "i'm"        "you're"     "he's"      
##  [61] "she's"      "it's"       "we're"      "they're"    "i've"      
##  [66] "you've"     "we've"      "they've"    "i'd"        "you'd"     
##  [71] "he'd"       "she'd"      "we'd"       "they'd"     "i'll"      
##  [76] "you'll"     "he'll"      "she'll"     "we'll"      "they'll"   
##  [81] "isn't"      "aren't"     "wasn't"     "weren't"    "hasn't"    
##  [86] "haven't"    "hadn't"     "doesn't"    "don't"      "didn't"    
##  [91] "won't"      "wouldn't"   "shan't"     "shouldn't"  "can't"     
##  [96] "cannot"     "couldn't"   "mustn't"    "let's"      "that's"    
## [101] "who's"      "what's"     "here's"     "there's"    "when's"    
## [106] "where's"    "why's"      "how's"      "a"          "an"        
## [111] "the"        "and"        "but"        "if"         "or"        
## [116] "because"    "as"         "until"      "while"      "of"        
## [121] "at"         "by"         "for"        "with"       "about"     
## [126] "against"    "between"    "into"       "through"    "during"    
## [131] "before"     "after"      "above"      "below"      "to"        
## [136] "from"       "up"         "down"       "in"         "out"       
## [141] "on"         "off"        "over"       "under"      "again"     
## [146] "further"    "then"       "once"       "here"       "there"     
## [151] "when"       "where"      "why"        "how"        "all"       
## [156] "any"        "both"       "each"       "few"        "more"      
## [161] "most"       "other"      "some"       "such"       "no"        
## [166] "nor"        "not"        "only"       "own"        "same"      
## [171] "so"         "than"       "too"        "very"       "will"
tokens <- tokens_remove(tokens, stopwords("english"))
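
One thing worth checking in the pipeline above is the order of stemming and stop-word removal, since the code stems first while the stop-word list contains unstemmed words. The following is a minimal sketch on a made-up sentence (not from the BLM data) that compares the two orders; the object names are illustrative only.

```r
library(quanteda)

# Made-up sentence for illustration only
toks <- tokens("Protesters marched because the verdict was announced",
               remove_punct = TRUE)
toks <- tokens_tolower(toks)

# Order used above: stem first, then remove stop words
stem_first <- tokens_remove(tokens_wordstem(toks), stopwords("en"))

# Alternative order: remove stop words first, then stem
stop_first <- tokens_wordstem(tokens_remove(toks, stopwords("en")))

as.character(stem_first) # the stemmed form of "because" is not on the stop-word list
as.character(stop_first)
```

If such stemmed remnants matter for your corpus, swapping the two steps is one option, though it would change the outputs shown in this tutorial.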

Build a quanteda dfm (document-feature matrix)

dfm <- dfm(tokens)
dfm #sparse
## Document-feature matrix of: 228 documents, 12,981 features (96.48% sparse) and 6 docvars.
##        features
## docs    view app nikki carvaj cnn vice presid mike penc declin
##   text1    1   1     1      1   2    2      4    1    6      1
##   text2    2   0     0      0   0    4     18    0    0      4
##   text3    0   0     0      0   0    0      0    0    0      0
##   text4    1   0     0      0   0    0      2    0    0      0
##   text5    1   0     0      0   0    3      3    3    7      0
##   text6    0   0     0      0   0    1      1    1    1      0
## [ reached max_ndoc ... 222 more documents, reached max_nfeat ... 12,971 more features ]
dfm_trim <- dfm_trim(dfm, min_termfreq = 5, max_docfreq = 225) #remove terms that occur fewer than 5 times in total or in more than 225 documents, i.e. terms that are too infrequent or too frequent. You can also set the thresholds as proportions.
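
The comment above mentions trimming by proportions. As a sketch of what that could look like, dfm_trim() accepts docfreq_type = "prop", so that max_docfreq is read as a share of documents rather than a count. The toy documents and thresholds below are made up for illustration.

```r
library(quanteda)

# Four made-up documents; "police" appears in all of them
toy_dfm <- dfm(tokens(c(d1 = "police protest city",
                        d2 = "police protest budget",
                        d3 = "police statue history",
                        d4 = "police reform")))

# Keep terms that occur at least twice overall and in at most 75% of documents
trimmed <- dfm_trim(toy_dfm, min_termfreq = 2,
                    max_docfreq = 0.75, docfreq_type = "prop")

featnames(trimmed)
```

A proportion-based threshold has the advantage of staying meaningful if the number of documents changes.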

Set parameters & Run LDA topic modeling!

  • Model selection is the process of determining a model’s parameters (i.e. the number of topics, K, and the prior parameters, alpha and beta).
  • Two prior parameters need to be set: alpha and beta. Alpha controls the mixture of topics within a document. Lower values yield documents dominated by fewer topics, and higher values yield documents that mix more topics.
  • Beta controls the distribution of words per topic. Again, lower values yield topics concentrated on fewer words, and higher values yield topics spread over more words. (In topicmodels, this prior is set through the delta argument.)
  • Ideally, we want each document to contain not too many topics and each topic not too many words. Some studies experiment with different parameters (Maier et al., 2018). Alpha is reported to matter more for topic quality, so researchers who experiment typically vary alpha while fixing beta at a default value.
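
To make the effect of alpha concrete, here is a small base-R simulation, offered as an illustration rather than as part of the modeling pipeline. It draws document-topic proportions from a symmetric Dirichlet prior (via normalized gamma draws) and compares a small alpha with the 50/k starting value used later. rdirichlet_sym is a helper defined here, not a package function.

```r
set.seed(1)

# Draw n topic-proportion vectors from a symmetric Dirichlet(alpha) over k topics
rdirichlet_sym <- function(n, k, alpha) {
  g <- matrix(rgamma(n * k, shape = alpha, rate = 1), nrow = n)
  g / rowSums(g) # normalize each row so proportions sum to 1
}

k <- 11
sparse <- rdirichlet_sym(1000, k, alpha = 0.1)    # low alpha
mixed  <- rdirichlet_sym(1000, k, alpha = 50 / k) # higher alpha

# Average share of the single largest topic in a simulated document
mean(apply(sparse, 1, max))
mean(apply(mixed, 1, max))
```

The first average should come out much larger than the second: with a low alpha, one topic tends to dominate each document, while a higher alpha spreads a document across many topics.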

Let’s say we want to run topic modeling with 11 topics, using Gibbs sampling. (More on Gibbs sampling: https://medium.com/@tomar.ankur287/topic-modeling-using-lda-and-gibbs-sampling-explained-49d49b3d1045)

k <- 11

control_LDA_Gibbs <- list(alpha = 50/k, estimate.beta = TRUE, #starting value for alpha of 50/k, as suggested by Griffiths & Steyvers (2004)
                          verbose = 0, prefix = tempfile(),
                          save = 0, keep = 0,  #no information is printed during the algorithm; no intermediate results are saved
                          seed = 999, #random seed for reproducibility
                          nstart = 1, #number of repeated runs with random initializations
                          best = TRUE, #return only the best model
                          delta = 0.1, #parameter of the prior distribution of the term distribution over topics (default 0.1)
                          iter = 2000, #number of Gibbs iterations
                          burnin = 100, #the first 100 iterations are discarded
                          thin = 2000) #then every 2000th iteration is returned
lda <- LDA(dfm_trim, k=k, method="Gibbs", control = control_LDA_Gibbs)
lda
## A LDA_Gibbs topic model with 11 topics.

We can visualize the terms that are most common within each topic. tidy() returns a beta value for each word in each topic: the estimated probability of that word under that topic (not the beta prior discussed above). The higher a word's beta, the more that word matters to the topic. In other words, a document that uses that word is more likely to be associated with that topic.

topics<- tidy(lda, matrix="beta") #extract beta with tidy() function
top_terms <- topics %>% group_by(topic) %>% #group the words by topic
  top_n(10, beta) %>%   # identify the top 10 words
  ungroup() %>%  #remove the grouping variable
  arrange(topic, -beta)

top_terms %>% mutate(term = reorder(term, beta)) %>%
  ggplot(aes(term, beta, fill = factor(topic))) +
  geom_col(show.legend = FALSE) +
  facet_wrap(~ topic, scales = "free") +
  coord_flip()
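
With reorder(), each term gets a single position based on its beta across all topics, so a term that appears in several topics may look out of order within a facet. One possible refinement, sketched here on made-up values, uses reorder_within() and scale_y_reordered() from tidytext to order terms separately within each topic. The toy_terms data frame is hypothetical and stands in for top_terms.

```r
library(dplyr)
library(ggplot2)
library(tidytext)

# Made-up beta values; "polic" appears in both topics
toy_terms <- data.frame(
  topic = c(1, 1, 1, 2, 2, 2),
  term  = c("polic", "protest", "citi", "statu", "polic", "histori"),
  beta  = c(0.05, 0.03, 0.02, 0.06, 0.01, 0.04)
)

p <- toy_terms %>%
  mutate(term = reorder_within(term, beta, topic)) %>% # order within each topic
  ggplot(aes(beta, term, fill = factor(topic))) +
  geom_col(show.legend = FALSE) +
  facet_wrap(~ topic, scales = "free_y") +
  scale_y_reordered() # strip the suffix that reorder_within() appends

p
```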

We can also see the topic distributions (gamma values) for each document. Each document has a gamma score for each topic. Remember that each document is considered a mixture of topics (mixed-membership model). For example, article 1 is about 15% topic 4, 6% topic 5, 24% topic 6, and so on. Comparing a document's gamma scores shows which topics dominate its content.

topics_doc <- tidy(lda, matrix="gamma")
topics_doc$document <- gsub("text", "", topics_doc$document)
topics_doc$document <- as.numeric(topics_doc$document)
topics_doc_sp <-  tidyr::spread(topics_doc, topic,gamma)
topics_doc_sp 
## # A tibble: 228 × 12
##    document     `1`    `2`    `3`    `4`    `5`    `6`     `7`   `8`    `9`
##       <dbl>   <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>   <dbl> <dbl>  <dbl>
##  1        1 0.0258  0.0258 0.0360 0.149  0.0565 0.241  0.0496  0.285 0.0531
##  2        2 0.00971 0.0201 0.437  0.0705 0.0219 0.0184 0.00942 0.290 0.0867
##  3        3 0.0232  0.0335 0.0198 0.0558 0.0284 0.0610 0.0198  0.677 0.0438
##  4        4 0.0162  0.0673 0.365  0.0826 0.0485 0.0298 0.0315  0.225 0.0690
##  5        5 0.0292  0.0206 0.0572 0.105  0.0507 0.305  0.0421  0.243 0.0831
##  6        6 0.0210  0.0161 0.0210 0.0900 0.0137 0.418  0.0334  0.302 0.0432
##  7        7 0.0214  0.0162 0.0715 0.0704 0.0298 0.202  0.0319  0.419 0.0319
##  8        8 0.0398  0.0398 0.0260 0.174  0.0777 0.250  0.0743  0.133 0.0743
##  9        9 0.0419  0.112  0.0419 0.103  0.0770 0.121  0.173   0.130 0.0594
## 10       10 0.0255  0.0543 0.116  0.0897 0.0255 0.172  0.0322  0.362 0.0410
## # … with 218 more rows, and 2 more variables: 10 <dbl>, 11 <dbl>
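
A common follow-up is to label each document with its highest-gamma topic. The sketch below does this with dplyr on a few values copied from the first two rows of the table above; toy_gamma is a hypothetical stand-in for the long-format output of tidy(lda, matrix="gamma").

```r
library(dplyr)

# A subset of the gamma values printed above for documents 1 and 2
toy_gamma <- data.frame(
  document = c(1, 1, 1, 2, 2, 2),
  topic    = c(4, 6, 8, 3, 8, 9),
  gamma    = c(0.149, 0.241, 0.285, 0.437, 0.290, 0.0867)
)

# Keep, for each document, the row with the largest gamma
dominant <- toy_gamma %>%
  group_by(document) %>%
  slice_max(gamma, n = 1) %>%
  ungroup()

dominant
```

Keep in mind that a dominant topic can still account for a modest share of a document, so it is worth inspecting the gamma value alongside the label.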

Deciding K

Choosing the number of topics: in the example I arbitrarily chose 11, but we could first run the function below to identify several candidate values to explore.

The ldatuning package uses 4 metrics (from Griffiths 2004, Cao Juan et al 2009, Arun et al 2010, and Deveaud et al 2014) to select the number of topics for LDA topic modeling.

#result <- FindTopicsNumber(
#  dfm_trim,
#  topics = seq(from = 5, to = 20, by = 1),
#  metrics = c("Griffiths2004", "CaoJuan2009", "Arun2010", "Deveaud2014"),
#  method = "Gibbs",
#  control = list(seed = 77),
#  mc.cores = 2L,
#  verbose = TRUE
#) 
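
FindTopicsNumber() returns a data frame with a topics column and one column per metric, which FindTopicsNumber_plot(result) can plot. As a rough guide, CaoJuan2009 and Arun2010 are better when lower, while Griffiths2004 and Deveaud2014 are better when higher. The sketch below applies that rule to a hand-made data frame. The numbers are invented solely to show the selection logic and are not results from this dataset.

```r
# Invented values with the same column layout as a FindTopicsNumber() result
toy_result <- data.frame(
  topics        = c(5, 10, 15, 20),
  Griffiths2004 = c(-52000, -50500, -50100, -50300),
  CaoJuan2009   = c(0.12, 0.08, 0.06, 0.09),
  Arun2010      = c(40, 31, 27, 29),
  Deveaud2014   = c(1.1, 1.4, 1.3, 1.0)
)

# Candidate K per metric: maximize two metrics, minimize the other two
best_k <- c(
  Griffiths2004 = toy_result$topics[which.max(toy_result$Griffiths2004)],
  CaoJuan2009   = toy_result$topics[which.min(toy_result$CaoJuan2009)],
  Arun2010      = toy_result$topics[which.min(toy_result$Arun2010)],
  Deveaud2014   = toy_result$topics[which.max(toy_result$Deveaud2014)]
)

best_k
```

The metrics often disagree, as they do in this toy example, so they are best treated as a way to narrow down candidate values of K that you then inspect for interpretability.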

What if you already have a set of predefined topics?

The seededlda package can be used if you want to predefine topics in LDA using a dictionary of "seed words". Here, I’m using the Moral Foundations Dictionary, distributed in LIWC format, whose words indicate moral values.

download.file("https://moralfoundations.org/wp-content/uploads/files/downloads/moral%20foundations%20dictionary.dic", tf <- tempfile())
dictliwc <- dictionary(file = tf, format = "LIWC")
print(dictliwc)
## Dictionary object with 11 key entries.
## - [HarmVirtue]:
##   - amity, benefit*, care, caring, compassion*, defen*, empath*, guard*, peace*, preserve, protect*, safe*, secur*, shelter, shield, sympath*
## - [HarmVice]:
##   - abandon*, abuse*, annihilate*, attack*, brutal*, cruel*, crush*, damag*, destroy, detriment*, endanger*, exploit, exploited, exploiting, exploits, fight*, harm*, hurt*, impair, kill [ ... and 15 more ]
## - [FairnessVirtue]:
##   - balance*, constant, egalitar*, equable, equal*, equity, equivalent, evenness, fair, fair-*, fairly, fairmind*, fairness, fairplay, homologous, honest*, impartial*, justice, justifi*, justness [ ... and 6 more ]
## - [FairnessVice]:
##   - bias*, bigot*, discriminat*, dishonest, disproportion*, dissociate, exclud*, exclusion, favoritism, inequitable, injust*, preference, prejud*, segregat*, unequal*, unfair*, unjust*, unscrupulous
## - [IngroupVirtue]:
##   - ally, cadre, cliqu*, cohort, collectiv*, communal, commune*, communis*, communit*, comrad*, devot*, familial, families, family, fellow*, group, guild, homeland*, insider, joint [ ... and 9 more ]
## - [IngroupVice]:
##   - abandon*, apostasy, apostate, betray*, deceiv*, deserted, deserter*, deserting, disloyal*, enem*, foreign*, immigra*, imposter, individual*, jilt*, miscreant, renegade, sequester, spy, terroris* [ ... and 3 more ]
## [ reached max_nkey ... 5 more keys ]
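
The entries above are glob patterns (for example protect*), which is convenient because our tokens are stemmed. If you have your own predefined topics, a seed dictionary can also be built by hand with dictionary(). The keys and patterns below are hypothetical, and dfm_lookup() is used here only to check how many tokens each key would match in a toy dfm.

```r
library(quanteda)

# Hypothetical seed dictionary: one key per predefined topic
seed_dict <- dictionary(list(
  violence = c("riot*", "loot*"),
  policing = c("polic*", "offic*")
))

# Toy dfm of already-stemmed tokens
toy_dfm <- dfm(tokens(c(d1 = "riot loot polic",
                        d2 = "polic offic offic")))

# Count, per document, the tokens matched by each key
hits <- as.matrix(dfm_lookup(toy_dfm, dictionary = seed_dict))
hits
```

A key that matches very few features in your dfm may give the seeded model little to anchor that topic on, so a check like this can be useful before fitting.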

We do not need to specify k here: the number of topics is already determined by the number of keys in the dictionary. Now, we can fit the seeded LDA model and match the features/terms in the dfm with the dictionary.

tmod_slda <- textmodel_seededlda(dfm_trim, dictionary = dictliwc)
terms(tmod_slda, 10)
##       HarmVirtue HarmVice  FairnessVirtue FairnessVice     IngroupVirtue
##  [1,] "protect"  "violenc" "equal"        "injustic"       "communiti"  
##  [2,] "guard"    "kill"    "justifi"      "disproportion"  "group"      
##  [3,] "secur"    "violent" "fair"         "bias"           "nation"     
##  [4,] "safeti"   "brutal"  "constant"     "unjust"         "member"     
##  [5,] "care"     "war"     "honest"       "unfair"         "nationwid"  
##  [6,] "safe"     "damag"   "black"        "exclud"         "homeland"   
##  [7,] "defend"   "attack"  "live"         "discriminatori" "fellow"     
##  [8,] "defens"   "harm"    "matter"       "prejudic"       "patriot"    
##  [9,] "benefit"  "destroy" "peopl"        "bigot"          "joint"      
## [10,] "shield"   "fight"   "polic"        "say"            "devot"      
##       IngroupVice AuthorityVirtue AuthorityVice PurityVirtue PurityVice
##  [1,] "terrorist" "law"           "protest"     "church"     "sick"    
##  [2,] "foreign"   "order"         "riot"        "clean"      "wanton"  
##  [3,] "enemi"     "leader"        "disobedi"    "pure"       "dirt"    
##  [4,] "abandon"   "control"       "rioter"      "decentr"    "sicken"  
##  [5,] "feder"     "legal"         "rebellion"   "decent"     "disgust" 
##  [6,] "use"       "respect"       "obstruct"    "offic"      "exploit" 
##  [7,] "said"      "class"         "dissent"     "polic"      "ruin"    
##  [8,] "polic"     "duti"          "lawless"     "said"       "statu"   
##  [9,] "offic"     "leadership"    "disrespect"  "video"      "may"     
## [10,] "right"     "mother"        "defianc"     "charg"      "getti"   
##       MoralityGeneral
##  [1,] "good"         
##  [2,] "wrong"        
##  [3,] "moral"        
##  [4,] "bad"          
##  [5,] "correct"      
##  [6,] "legal"        
##  [7,] "worth"        
##  [8,] "ideal"        
##  [9,] "offend"       
## [10,] "character"
# assign topics from seeded LDA as a document-level variable to the dfm
dfm_trim$topic2 <- topics(tmod_slda)

# cross-table of the topic frequency
table(dfm_trim$topic2, df$Support_Oppose)
##                  
##                   Oppose Support
##   HarmVirtue          30      12
##   HarmVice             8      11
##   FairnessVirtue      18       2
##   FairnessVice         0       9
##   IngroupVirtue        8      12
##   IngroupVice          3      13
##   AuthorityVirtue      9       8
##   AuthorityVice       29       9
##   PurityVirtue         3      12
##   PurityVice           8       1
##   MoralityGeneral      7      16
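
Because the two stances have different numbers of articles, column proportions can be easier to compare than raw counts. The sketch below re-enters the counts from the table above as a matrix and uses base R's prop.table(). With the fitted objects at hand, the same call could presumably be applied directly to the table() result.

```r
# Counts copied from the cross-table above (rows: topics, columns: stance)
tab <- matrix(
  c(30, 12,  8, 11, 18,  2,  0,  9,  8, 12,  3, 13,
     9,  8, 29,  9,  3, 12,  8,  1,  7, 16),
  ncol = 2, byrow = TRUE,
  dimnames = list(
    c("HarmVirtue", "HarmVice", "FairnessVirtue", "FairnessVice",
      "IngroupVirtue", "IngroupVice", "AuthorityVirtue", "AuthorityVice",
      "PurityVirtue", "PurityVice", "MoralityGeneral"),
    c("Oppose", "Support")
  )
)

# Share of each stance's articles assigned to each topic (columns sum to 1)
round(prop.table(tab, margin = 2), 2)
```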

Structural topic modeling (as a variant of LDA)

We can use STM to determine whether the distribution of topics differs as a function of the source, i.e., in this case, the stance (Support or Oppose). See more: https://www.structuraltopicmodel.com/

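Some additional terms:

  • Metadata: information about each document (here, for example, the stance and the search query)
  • Topical prevalence: how much of a document is associated with a topic; prevalence covariates are metadata that may shift these proportions
  • Topical content: the words used within a topic; content covariates are metadata that may change which words a topic uses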
#install.packages("stm")
library(stm)
library(igraph)
## Building corpus... 
## Converting to Lower Case... 
## Removing punctuation... 
## Removing stopwords... 
## Remove Custom Stopwords...
## Removing numbers... 
## Stemming... 
## Creating Output...
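
The call that produced the log above is not shown. The messages resemble those printed by stm's textProcessor(), so the following is a guess at how an object like out could be prepared, not the author's actual code. It runs on a made-up data frame: toy_df and its values are hypothetical, and the real call would presumably use df with its text, Support_Oppose, and query columns.

```r
library(stm)

# Made-up documents and metadata for illustration only
toy_df <- data.frame(
  text = c("Police officers protest in the city",
           "Protest and police budget for the city",
           "Statue removed, history and protest",
           "Statue history and police"),
  Support_Oppose = c("Support", "Oppose", "Support", "Oppose"),
  stringsAsFactors = FALSE
)

# Lowercase, remove punctuation, stop words, and numbers, then stem
processed <- textProcessor(documents = toy_df$text, metadata = toy_df)

# Drop terms that appear in only one document and keep
# documents, vocabulary, and metadata aligned
out <- prepDocuments(processed$documents, processed$vocab, processed$meta)

names(out)
```

The resulting list carries the documents, vocab, and meta elements that the stm() call expects.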

Estimating the effect of Stance on topic prevalence:

Note that the top words now are different from the LDA output, as this time, we estimated the influence of metadata on topic distributions.

PrevFitQuery <- stm(documents = out$documents, vocab = out$vocab,
               K = 11, prevalence =~ Support_Oppose + query,
               max.em.its = 75, data = out$meta,
               init.type = "Spectral", seed = 100)
## Beginning Spectral Initialization 
##   Calculating the gram matrix...
##   Finding anchor words...
##      ...........
##   Recovering initialization...
##      ...........
## Initialization complete.
## ..................................................................................................................
## Completed E-Step (0 seconds). 
## Completed M-Step. 
## Completing Iteration 1 (approx. per word bound = -6.459) 
## ..................................................................................................................
## Completed E-Step (0 seconds). 
## Completed M-Step. 
## Completing Iteration 2 (approx. per word bound = -6.353, relative change = 1.636e-02) 
## ..................................................................................................................
## Completed E-Step (0 seconds). 
## Completed M-Step. 
## Completing Iteration 3 (approx. per word bound = -6.308, relative change = 7.078e-03) 
## ..................................................................................................................
## Completed E-Step (0 seconds). 
## Completed M-Step. 
## Completing Iteration 4 (approx. per word bound = -6.286, relative change = 3.490e-03) 
## ..................................................................................................................
## Completed E-Step (0 seconds). 
## Completed M-Step. 
## Completing Iteration 5 (approx. per word bound = -6.274, relative change = 1.906e-03) 
## Topic 1: loot, properti, riot, violenc, polic 
##  Topic 2: polic, citi, budget, crime, spend 
##  Topic 3: post, polic, polit, social, call 
##  Topic 4: polic, offic, protest, feder, depart 
##  Topic 5: white, american, racial, like, support 
##  Topic 6: statu, remov, confeder, histori, american 
##  Topic 7: polic, racism, kill, white, system 
##  Topic 8: polic, protest, offic, floyd, citi 
##  Topic 9: protest, civil, right, law, bill 
##  Topic 10: protest, demonstr, violenc, feder, report 
##  Topic 11: polic, law, offic, communiti, enforc 
## ..................................................................................................................
## Completed E-Step (0 seconds). 
## Completed M-Step. 
## Completing Iteration 6 (approx. per word bound = -6.267, relative change = 1.119e-03) 
## ..................................................................................................................
## Completed E-Step (0 seconds). 
## Completed M-Step. 
## Completing Iteration 7 (approx. per word bound = -6.263, relative change = 6.967e-04) 
## ..................................................................................................................
## Completed E-Step (0 seconds). 
## Completed M-Step. 
## Completing Iteration 8 (approx. per word bound = -6.260, relative change = 4.470e-04) 
## ..................................................................................................................
## Completed E-Step (0 seconds). 
## Completed M-Step. 
## Completing Iteration 9 (approx. per word bound = -6.258, relative change = 2.985e-04) 
## ..................................................................................................................
## Completed E-Step (0 seconds). 
## Completed M-Step. 
## Completing Iteration 10 (approx. per word bound = -6.257, relative change = 2.117e-04) 
## Topic 1: loot, properti, riot, violenc, protest 
##  Topic 2: polic, citi, budget, crime, spend 
##  Topic 3: post, call, social, famili, left 
##  Topic 4: polic, offic, protest, feder, depart 
##  Topic 5: white, american, racial, like, among 
##  Topic 6: statu, remov, confeder, histori, american 
##  Topic 7: polic, kill, racism, system, white 
##  Topic 8: polic, protest, offic, citi, floyd 
##  Topic 9: protest, civil, right, law, public 
##  Topic 10: protest, demonstr, violenc, report, state 
##  Topic 11: polic, law, offic, communiti, enforc 
## ..................................................................................................................
## Completed E-Step (0 seconds). 
## Completed M-Step. 
## Completing Iteration 11 (approx. per word bound = -6.256, relative change = 1.551e-04) 
## ..................................................................................................................
## Completed E-Step (0 seconds). 
## Completed M-Step. 
## Completing Iteration 12 (approx. per word bound = -6.255, relative change = 1.161e-04) 
## ..................................................................................................................
## Completed E-Step (0 seconds). 
## Completed M-Step. 
## Completing Iteration 13 (approx. per word bound = -6.254, relative change = 9.030e-05) 
## ..................................................................................................................
## Completed E-Step (0 seconds). 
## Completed M-Step. 
## Completing Iteration 14 (approx. per word bound = -6.254, relative change = 7.321e-05) 
## ..................................................................................................................
## Completed E-Step (0 seconds). 
## Completed M-Step. 
## Completing Iteration 15 (approx. per word bound = -6.253, relative change = 6.124e-05) 
## Topic 1: loot, properti, riot, violenc, protest 
##  Topic 2: polic, citi, budget, crime, depart 
##  Topic 3: post, call, left, social, presid 
##  Topic 4: polic, offic, protest, feder, depart 
##  Topic 5: white, racial, american, like, among 
##  Topic 6: statu, remov, confeder, histori, american 
##  Topic 7: polic, kill, american, crime, system 
##  Topic 8: polic, protest, offic, citi, floyd 
##  Topic 9: protest, civil, right, law, public 
##  Topic 10: protest, demonstr, violenc, report, state 
##  Topic 11: polic, law, offic, communiti, enforc 
## ..................................................................................................................
## Completed E-Step (0 seconds). 
## Completed M-Step. 
## Completing Iteration 16 (approx. per word bound = -6.253, relative change = 5.264e-05) 
## ..................................................................................................................
## Completed E-Step (0 seconds). 
## Completed M-Step. 
## Completing Iteration 17 (approx. per word bound = -6.253, relative change = 4.605e-05) 
## ..................................................................................................................
## Completed E-Step (0 seconds). 
## Completed M-Step. 
## Completing Iteration 18 (approx. per word bound = -6.253, relative change = 3.986e-05) 
## ..................................................................................................................
## Completed E-Step (0 seconds). 
## Completed M-Step. 
## Completing Iteration 19 (approx. per word bound = -6.252, relative change = 3.425e-05) 
## ..................................................................................................................
## Completed E-Step (0 seconds). 
## Completed M-Step. 
## Completing Iteration 20 (approx. per word bound = -6.252, relative change = 2.940e-05) 
## Topic 1: loot, properti, riot, violenc, protest 
##  Topic 2: polic, citi, budget, depart, offic 
##  Topic 3: post, call, left, presid, social 
##  Topic 4: polic, offic, protest, feder, depart 
##  Topic 5: white, racial, american, like, among 
##  Topic 6: statu, remov, confeder, american, histori 
##  Topic 7: polic, american, kill, crime, communiti 
##  Topic 8: polic, protest, offic, citi, floyd 
##  Topic 9: protest, civil, right, law, public 
##  Topic 10: protest, demonstr, violenc, report, state 
##  Topic 11: polic, law, offic, communiti, enforc 
## ..................................................................................................................
## Completed E-Step (0 seconds). 
## Completed M-Step. 
## Completing Iteration 21 (approx. per word bound = -6.252, relative change = 2.601e-05) 
## ..................................................................................................................
## Completed E-Step (0 seconds). 
## Completed M-Step. 
## Completing Iteration 22 (approx. per word bound = -6.252, relative change = 2.333e-05) 
## ..................................................................................................................
## Completed E-Step (0 seconds). 
## Completed M-Step. 
## Completing Iteration 23 (approx. per word bound = -6.252, relative change = 2.166e-05) 
## ..................................................................................................................
## Completed E-Step (0 seconds). 
## Completed M-Step. 
## Completing Iteration 24 (approx. per word bound = -6.252, relative change = 2.006e-05) 
## ..................................................................................................................
## Completed E-Step (0 seconds). 
## Completed M-Step. 
## Completing Iteration 25 (approx. per word bound = -6.251, relative change = 1.831e-05) 
## Topic 1: loot, properti, riot, violenc, protest 
##  Topic 2: polic, citi, budget, depart, offic 
##  Topic 3: call, post, presid, left, social 
##  Topic 4: polic, offic, protest, feder, depart 
##  Topic 5: white, racial, american, like, among 
##  Topic 6: statu, remov, american, confeder, histori 
##  Topic 7: polic, american, crime, communiti, kill 
##  Topic 8: polic, protest, offic, citi, floyd 
##  Topic 9: protest, civil, right, law, public 
##  Topic 10: protest, demonstr, violenc, report, state 
##  Topic 11: polic, law, offic, communiti, enforc 
## ..................................................................................................................
## Completed E-Step (0 seconds). 
## Completed M-Step. 
## Completing Iteration 26 (approx. per word bound = -6.251, relative change = 1.715e-05) 
## ..................................................................................................................
## Completed E-Step (0 seconds). 
## Completed M-Step. 
## Completing Iteration 27 (approx. per word bound = -6.251, relative change = 1.644e-05) 
## ..................................................................................................................
## Completed E-Step (0 seconds). 
## Completed M-Step. 
## Completing Iteration 28 (approx. per word bound = -6.251, relative change = 1.594e-05) 
## ..................................................................................................................
## Completed E-Step (0 seconds). 
## Completed M-Step. 
## Completing Iteration 29 (approx. per word bound = -6.251, relative change = 1.555e-05) 
## ..................................................................................................................
## Completed E-Step (0 seconds). 
## Completed M-Step. 
## Completing Iteration 30 (approx. per word bound = -6.251, relative change = 1.524e-05) 
## Topic 1: loot, properti, riot, violenc, protest 
##  Topic 2: polic, citi, budget, depart, offic 
##  Topic 3: call, post, presid, left, white 
##  Topic 4: polic, offic, protest, feder, depart 
##  Topic 5: white, racial, american, like, among 
##  Topic 6: statu, remov, american, confeder, histori 
##  Topic 7: polic, american, crime, communiti, violenc 
##  Topic 8: polic, protest, offic, citi, floyd 
##  Topic 9: protest, civil, right, law, public 
##  Topic 10: protest, demonstr, violenc, report, state 
##  Topic 11: polic, law, organ, communiti, enforc 
## ..................................................................................................................
## Completed E-Step (0 seconds). 
## Completed M-Step. 
## Completing Iteration 31 (approx. per word bound = -6.251, relative change = 1.566e-05) 
## ..................................................................................................................
## Completed E-Step (0 seconds). 
## Completed M-Step. 
## Completing Iteration 32 (approx. per word bound = -6.251, relative change = 1.568e-05) 
## ..................................................................................................................
## Completed E-Step (0 seconds). 
## Completed M-Step. 
## Completing Iteration 33 (approx. per word bound = -6.251, relative change = 1.594e-05) 
## ..................................................................................................................
## Completed E-Step (0 seconds). 
## Completed M-Step. 
## Completing Iteration 34 (approx. per word bound = -6.251, relative change = 1.612e-05) 
## ..................................................................................................................
## Completed E-Step (0 seconds). 
## Completed M-Step. 
## Completing Iteration 35 (approx. per word bound = -6.250, relative change = 1.649e-05) 
## Topic 1: loot, properti, riot, violenc, protest 
##  Topic 2: polic, citi, budget, depart, offic 
##  Topic 3: call, presid, post, left, white 
##  Topic 4: polic, offic, feder, protest, depart 
##  Topic 5: white, racial, american, like, among 
##  Topic 6: statu, remov, american, confeder, histori 
##  Topic 7: polic, american, crime, communiti, violenc 
##  Topic 8: polic, protest, offic, citi, floyd 
##  Topic 9: protest, civil, right, law, public 
##  Topic 10: protest, demonstr, violenc, report, state 
##  Topic 11: polic, law, organ, communiti, enforc 
## ..................................................................................................................
## Completed E-Step (0 seconds). 
## Completed M-Step. 
## Completing Iteration 36 (approx. per word bound = -6.250, relative change = 1.670e-05) 
## ..................................................................................................................
## Completed E-Step (0 seconds). 
## Completed M-Step. 
## Completing Iteration 37 (approx. per word bound = -6.250, relative change = 1.680e-05) 
## ..................................................................................................................
## Completed E-Step (0 seconds). 
## Completed M-Step. 
## Completing Iteration 38 (approx. per word bound = -6.250, relative change = 1.687e-05) 
## ..................................................................................................................
## Completed E-Step (0 seconds). 
## Completed M-Step. 
## Completing Iteration 39 (approx. per word bound = -6.250, relative change = 1.590e-05) 
## ..................................................................................................................
## Completed E-Step (0 seconds). 
## Completed M-Step. 
## Completing Iteration 40 (approx. per word bound = -6.250, relative change = 1.706e-05) 
## Topic 1: loot, properti, riot, violenc, protest 
##  Topic 2: polic, citi, budget, depart, offic 
##  Topic 3: presid, call, post, white, left 
##  Topic 4: polic, offic, feder, protest, depart 
##  Topic 5: white, racial, american, like, among 
##  Topic 6: statu, remov, american, confeder, histori 
##  Topic 7: polic, american, crime, communiti, violenc 
##  Topic 8: polic, protest, offic, citi, floyd 
##  Topic 9: protest, civil, right, law, public 
##  Topic 10: protest, demonstr, violenc, report, state 
##  Topic 11: polic, organ, communiti, law, includ 
## ..................................................................................................................
## Completed E-Step (0 seconds). 
## Completed M-Step. 
## Completing Iteration 41 (approx. per word bound = -6.250, relative change = 1.737e-05) 
## ..................................................................................................................
## Completed E-Step (0 seconds). 
## Completed M-Step. 
## Completing Iteration 42 (approx. per word bound = -6.250, relative change = 1.681e-05) 
## ..................................................................................................................
## Completed E-Step (0 seconds). 
## Completed M-Step. 
## Completing Iteration 43 (approx. per word bound = -6.250, relative change = 1.647e-05) 
## ..................................................................................................................
## Completed E-Step (0 seconds). 
## Completed M-Step. 
## Completing Iteration 44 (approx. per word bound = -6.250, relative change = 1.601e-05) 
## ..................................................................................................................
## Completed E-Step (0 seconds). 
## Completed M-Step. 
## Completing Iteration 45 (approx. per word bound = -6.249, relative change = 1.536e-05) 
## Topic 1: loot, properti, riot, violenc, protest 
##  Topic 2: polic, citi, budget, depart, offic 
##  Topic 3: presid, call, white, post, left 
##  Topic 4: polic, offic, feder, protest, depart 
##  Topic 5: white, racial, american, like, among 
##  Topic 6: statu, american, remov, confeder, histori 
##  Topic 7: polic, american, crime, communiti, violenc 
##  Topic 8: polic, protest, offic, citi, floyd 
##  Topic 9: protest, civil, right, law, public 
##  Topic 10: protest, demonstr, violenc, report, state 
##  Topic 11: polic, organ, communiti, law, includ 
## ..................................................................................................................
## Completed E-Step (0 seconds). 
## Completed M-Step. 
## Completing Iteration 46 (approx. per word bound = -6.249, relative change = 1.487e-05) 
## ..................................................................................................................
## Completed E-Step (0 seconds). 
## Completed M-Step. 
## Completing Iteration 47 (approx. per word bound = -6.249, relative change = 1.453e-05) 
## ..................................................................................................................
## Completed E-Step (0 seconds). 
## Completed M-Step. 
## Completing Iteration 48 (approx. per word bound = -6.249, relative change = 1.432e-05) 
## ..................................................................................................................
## Completed E-Step (0 seconds). 
## Completed M-Step. 
## Completing Iteration 49 (approx. per word bound = -6.249, relative change = 1.365e-05) 
## ..................................................................................................................
## Completed E-Step (0 seconds). 
## Completed M-Step. 
## Completing Iteration 50 (approx. per word bound = -6.249, relative change = 1.335e-05) 
## Topic 1: loot, properti, riot, violenc, protest 
##  Topic 2: polic, citi, budget, depart, offic 
##  Topic 3: presid, call, white, post, left 
##  Topic 4: polic, offic, feder, protest, depart 
##  Topic 5: white, racial, american, like, among 
##  Topic 6: statu, american, remov, histori, confeder 
##  Topic 7: polic, american, crime, violenc, communiti 
##  Topic 8: polic, protest, offic, citi, floyd 
##  Topic 9: protest, civil, right, law, public 
##  Topic 10: protest, demonstr, violenc, report, state 
##  Topic 11: organ, polic, communiti, includ, help 
## ..................................................................................................................
## Completed E-Step (0 seconds). 
## Completed M-Step. 
## Completing Iteration 51 (approx. per word bound = -6.249, relative change = 1.409e-05) 
## ..................................................................................................................
## Completed E-Step (0 seconds). 
## Completed M-Step. 
## Completing Iteration 52 (approx. per word bound = -6.249, relative change = 1.217e-05) 
## ..................................................................................................................
## Completed E-Step (0 seconds). 
## Completed M-Step. 
## Completing Iteration 53 (approx. per word bound = -6.249, relative change = 1.370e-05) 
## ..................................................................................................................
## Completed E-Step (0 seconds). 
## Completed M-Step. 
## Completing Iteration 54 (approx. per word bound = -6.249, relative change = 1.407e-05) 
## ..................................................................................................................
## Completed E-Step (0 seconds). 
## Completed M-Step. 
## Completing Iteration 55 (approx. per word bound = -6.249, relative change = 1.349e-05) 
## Topic 1: loot, properti, riot, violenc, protest 
##  Topic 2: polic, citi, budget, depart, offic 
##  Topic 3: presid, call, white, post, left 
##  Topic 4: polic, offic, feder, depart, law 
##  Topic 5: white, racial, american, like, among 
##  Topic 6: statu, american, remov, histori, confeder 
##  Topic 7: polic, american, crime, violenc, communiti 
##  Topic 8: polic, protest, offic, citi, floyd 
##  Topic 9: protest, civil, right, law, public 
##  Topic 10: protest, demonstr, violenc, report, violent 
##  Topic 11: organ, communiti, polic, help, includ 
## ..................................................................................................................
## Completed E-Step (0 seconds). 
## Completed M-Step. 
## Completing Iteration 56 (approx. per word bound = -6.249, relative change = 1.351e-05) 
## ..................................................................................................................
## Completed E-Step (0 seconds). 
## Completed M-Step. 
## Completing Iteration 57 (approx. per word bound = -6.248, relative change = 1.368e-05) 
## ..................................................................................................................
## Completed E-Step (0 seconds). 
## Completed M-Step. 
## Completing Iteration 58 (approx. per word bound = -6.248, relative change = 1.357e-05) 
## ..................................................................................................................
## Completed E-Step (0 seconds). 
## Completed M-Step. 
## Completing Iteration 59 (approx. per word bound = -6.248, relative change = 1.414e-05) 
## ..................................................................................................................
## Completed E-Step (0 seconds). 
## Completed M-Step. 
## Completing Iteration 60 (approx. per word bound = -6.248, relative change = 1.374e-05) 
## Topic 1: loot, properti, riot, violenc, protest 
##  Topic 2: polic, citi, budget, depart, offic 
##  Topic 3: presid, call, white, post, left 
##  Topic 4: polic, offic, feder, law, depart 
##  Topic 5: white, racial, american, like, among 
##  Topic 6: statu, american, remov, civil, histori 
##  Topic 7: polic, american, crime, violenc, communiti 
##  Topic 8: polic, protest, offic, citi, floyd 
##  Topic 9: protest, civil, right, law, public 
##  Topic 10: protest, demonstr, violenc, report, violent 
##  Topic 11: organ, communiti, help, support, polic 
## ..................................................................................................................
## Completed E-Step (0 seconds). 
## Completed M-Step. 
## Completing Iteration 61 (approx. per word bound = -6.248, relative change = 1.399e-05) 
## ..................................................................................................................
## Completed E-Step (0 seconds). 
## Completed M-Step. 
## Completing Iteration 62 (approx. per word bound = -6.248, relative change = 1.286e-05) 
## ..................................................................................................................
## Completed E-Step (0 seconds). 
## Completed M-Step. 
## Completing Iteration 63 (approx. per word bound = -6.248, relative change = 1.259e-05) 
## ..................................................................................................................
## Completed E-Step (0 seconds). 
## Completed M-Step. 
## Completing Iteration 64 (approx. per word bound = -6.248, relative change = 1.192e-05) 
## ..................................................................................................................
## Completed E-Step (0 seconds). 
## Completed M-Step. 
## Completing Iteration 65 (approx. per word bound = -6.248, relative change = 1.153e-05) 
## Topic 1: loot, properti, riot, violenc, protest 
##  Topic 2: polic, citi, budget, depart, offic 
##  Topic 3: presid, call, white, say, trump 
##  Topic 4: polic, offic, feder, law, depart 
##  Topic 5: white, racial, american, like, among 
##  Topic 6: statu, american, remov, civil, histori 
##  Topic 7: polic, american, crime, violenc, kill 
##  Topic 8: polic, protest, offic, citi, floyd 
##  Topic 9: protest, civil, right, law, public 
##  Topic 10: protest, demonstr, violenc, report, violent 
##  Topic 11: organ, communiti, help, support, work 
## ..................................................................................................................
## Completed E-Step (0 seconds). 
## Completed M-Step. 
## Completing Iteration 66 (approx. per word bound = -6.248, relative change = 1.107e-05) 
## ..................................................................................................................
## Completed E-Step (0 seconds). 
## Completed M-Step. 
## Completing Iteration 67 (approx. per word bound = -6.248, relative change = 1.161e-05) 
## ..................................................................................................................
## Completed E-Step (0 seconds). 
## Completed M-Step. 
## Completing Iteration 68 (approx. per word bound = -6.248, relative change = 1.136e-05) 
## ..................................................................................................................
## Completed E-Step (0 seconds). 
## Completed M-Step. 
## Completing Iteration 69 (approx. per word bound = -6.247, relative change = 1.133e-05) 
## ..................................................................................................................
## Completed E-Step (0 seconds). 
## Completed M-Step. 
## Completing Iteration 70 (approx. per word bound = -6.247, relative change = 1.158e-05) 
## Topic 1: loot, properti, riot, violenc, protest 
##  Topic 2: polic, citi, budget, depart, offic 
##  Topic 3: presid, call, white, say, trump 
##  Topic 4: polic, offic, feder, law, depart 
##  Topic 5: white, racial, american, like, among 
##  Topic 6: statu, american, remov, civil, histori 
##  Topic 7: polic, american, crime, violenc, kill 
##  Topic 8: polic, protest, offic, citi, floyd 
##  Topic 9: protest, right, civil, law, public 
##  Topic 10: protest, demonstr, violenc, report, violent 
##  Topic 11: organ, communiti, help, support, work 
## ..................................................................................................................
## Completed E-Step (0 seconds). 
## Completed M-Step. 
## Completing Iteration 71 (approx. per word bound = -6.247, relative change = 1.135e-05) 
## ..................................................................................................................
## Completed E-Step (0 seconds). 
## Completed M-Step. 
## Completing Iteration 72 (approx. per word bound = -6.247, relative change = 1.084e-05) 
## ..................................................................................................................
## Completed E-Step (0 seconds). 
## Completed M-Step. 
## Completing Iteration 73 (approx. per word bound = -6.247, relative change = 1.051e-05) 
## ..................................................................................................................
## Completed E-Step (0 seconds). 
## Completed M-Step. 
## Model Converged
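
The "relative change" printed at each iteration is what decides when the log ends with "Model Converged". The sketch below illustrates that stopping rule in base R. It assumes the rule is "stop once the relative improvement in the bound falls below a tolerance", with 1e-5 as the tolerance (believed to be the stm default, `emtol`). The bound values are hypothetical, because the log rounds the bound to three decimals.

```r
# Hypothetical per-word bound values, one per EM iteration.
# These are illustrative only; they are not copied from the log above.
bound <- c(-6.5300, -6.4170, -6.3820, -6.38195)

# Assumed convergence tolerance (believed to match stm's default emtol).
emtol <- 1e-5

# Relative change between consecutive iterations: (new - old) / |old|.
rel_change <- diff(bound) / abs(head(bound, -1))

# First iteration whose relative change drops below the tolerance.
# rel_change[i] compares iteration i + 1 with iteration i, hence the + 1.
converged_at <- which(rel_change < emtol)[1] + 1

print(signif(rel_change, 4))
print(converged_at)
```

If a model stops at the iteration limit before the change gets this small, raising `max.em.its` in the stm() call is the usual remedy.
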
plot(PrevFitQuery, type = "summary", xlim = c(0, .4)) # plot the expected proportion of each topic, with its top words

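Check topic quality with the topicQuality() function

Every topic in a fitted STM has a semantic coherence value and an exclusivity value. The topicQuality() function plots the two measures against each other and labels each point with its topic number. It also prints both vectors: in the output below, the first (negative values) is semantic coherence and the second is exclusivity. Topics that score high on both measures are usually the easiest to interpret.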

topicQuality(model = PrevFitQuery, documents = docs)
##  [1] -22.28524 -43.71394 -26.06731 -28.15585 -36.24280 -43.86575 -26.18637
##  [8] -19.85138 -25.60266 -21.39547 -30.20609
##  [1] 9.074018 9.852134 8.883609 9.383724 9.868091 9.609326 8.635021 9.219973
##  [9] 9.007736 9.589415 9.231538
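
The two printed vectors are easier to read side by side. The sketch below copies the values from the output above into a data frame and sorts the topics by coherence. It assumes that the first vector is semantic coherence and the second is exclusivity; the object name `quality` is introduced here for illustration only.

```r
# Values copied from the topicQuality() output above.
# Assumption: first vector = semantic coherence, second = exclusivity.
coherence <- c(-22.28524, -43.71394, -26.06731, -28.15585, -36.24280,
               -43.86575, -26.18637, -19.85138, -25.60266, -21.39547,
               -30.20609)
exclusivity <- c(9.074018, 9.852134, 8.883609, 9.383724, 9.868091,
                 9.609326, 8.635021, 9.219973, 9.007736, 9.589415,
                 9.231538)

# One row per topic, so each topic's two scores can be compared directly.
quality <- data.frame(topic = seq_along(coherence), coherence, exclusivity)

# Most coherent topics first (values closer to zero are more coherent).
quality <- quality[order(-quality$coherence), ]
print(quality)
```

Read this way, Topic 8 is the most coherent, while Topics 2 and 6 are the least coherent even though both are relatively exclusive. That trade-off is why the two measures are usually inspected together.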

searchK() function

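You can also take a data-driven approach to choosing the number of topics. searchK() fits one model for each candidate value of K and reports diagnostics for each, including held-out likelihood, residuals, semantic coherence, and exclusivity. It does not pick K for you; you compare the diagnostics across candidates. Note that K=c(7,11) evaluates only K = 7 and K = 11. To evaluate every value in between, use K=7:11.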

kResult <- searchK(out$documents, out$vocab, K=c(7,11), prevalence=~Support_Oppose + query,
                   data=meta)
## Beginning Spectral Initialization 
##   Calculating the gram matrix...
##   Finding anchor words...
##      .......
##   Recovering initialization...
##      ...........
## Initialization complete.
## ..................................................................................................................
## Completed E-Step (0 seconds). 
## Completed M-Step. 
## Completing Iteration 1 (approx. per word bound = -6.530) 
## ..................................................................................................................
## Completed E-Step (0 seconds). 
## Completed M-Step. 
## Completing Iteration 2 (approx. per word bound = -6.417, relative change = 1.728e-02) 
## ..................................................................................................................
## Completed E-Step (0 seconds). 
## Completed M-Step. 
## Completing Iteration 3 (approx. per word bound = -6.382, relative change = 5.482e-03) 
## ..................................................................................................................
## Completed E-Step (0 seconds). 
## Completed M-Step. 
## Completing Iteration 4 (approx. per word bound = -6.365, relative change = 2.665e-03) 
## ..................................................................................................................
## Completed E-Step (0 seconds). 
## Completed M-Step. 
## Completing Iteration 5 (approx. per word bound = -6.355, relative change = 1.583e-03) 
## Topic 1: polic, protest, loot, citi, properti 
##  Topic 2: polic, citi, offic, law, budget 
##  Topic 3: polic, communiti, system, american, polit 
##  Topic 4: polic, offic, protest, demonstr, street 
##  Topic 5: statu, civil, imag, remov, confeder 
##  Topic 6: white, american, racial, movement, support 
##  Topic 7: protest, demonstr, violenc, state, report 
## ..................................................................................................................
## Completed E-Step (0 seconds). 
## Completed M-Step. 
## Completing Iteration 6 (approx. per word bound = -6.348, relative change = 1.067e-03) 
## ..................................................................................................................
## Completed E-Step (0 seconds). 
## Completed M-Step. 
## Completing Iteration 7 (approx. per word bound = -6.343, relative change = 7.856e-04) 
## ..................................................................................................................
## Completed E-Step (0 seconds). 
## Completed M-Step. 
## Completing Iteration 8 (approx. per word bound = -6.339, relative change = 6.031e-04) 
## ..................................................................................................................
## Completed E-Step (0 seconds). 
## Completed M-Step. 
## Completing Iteration 9 (approx. per word bound = -6.336, relative change = 4.705e-04) 
## ..................................................................................................................
## Completed E-Step (0 seconds). 
## Completed M-Step. 
## Completing Iteration 10 (approx. per word bound = -6.334, relative change = 3.681e-04) 
## Topic 1: loot, polic, protest, riot, properti 
##  Topic 2: polic, citi, offic, law, budget 
##  Topic 3: polic, communiti, american, system, work 
##  Topic 4: polic, offic, protest, floyd, demonstr 
##  Topic 5: civil, statu, right, remov, movement 
##  Topic 6: white, american, racial, like, democrat 
##  Topic 7: protest, demonstr, feder, violenc, state 
## ..................................................................................................................
## Completed E-Step (0 seconds). 
## Completed M-Step. 
## Completing Iteration 11 (approx. per word bound = -6.332, relative change = 2.926e-04) 
## ..................................................................................................................
## Completed E-Step (0 seconds). 
## Completed M-Step. 
## Completing Iteration 12 (approx. per word bound = -6.330, relative change = 2.402e-04) 
## ..................................................................................................................
## Completed E-Step (0 seconds). 
## Completed M-Step. 
## Completing Iteration 13 (approx. per word bound = -6.329, relative change = 2.190e-04) 
## ..................................................................................................................
## Completed E-Step (0 seconds). 
## Completed M-Step. 
## Completing Iteration 14 (approx. per word bound = -6.328, relative change = 2.108e-04) 
## ..................................................................................................................
## Completed E-Step (0 seconds). 
## Completed M-Step. 
## Completing Iteration 15 (approx. per word bound = -6.326, relative change = 2.002e-04) 
## Topic 1: loot, riot, properti, protest, polic 
##  Topic 2: polic, citi, offic, law, budget 
##  Topic 3: polic, communiti, organ, american, work 
##  Topic 4: polic, offic, protest, floyd, citi 
##  Topic 5: civil, statu, right, american, movement 
##  Topic 6: white, american, racial, like, democrat 
##  Topic 7: protest, demonstr, feder, violenc, state 
## ..................................................................................................................
## Completed E-Step (0 seconds). 
## Completed M-Step. 
## Completing Iteration 16 (approx. per word bound = -6.325, relative change = 1.756e-04) 
## ..................................................................................................................
## Completed E-Step (0 seconds). 
## Completed M-Step. 
## Completing Iteration 17 (approx. per word bound = -6.324, relative change = 1.442e-04) 
## ..................................................................................................................
## Completed E-Step (0 seconds). 
## Completed M-Step. 
## Completing Iteration 18 (approx. per word bound = -6.324, relative change = 1.164e-04) 
## ..................................................................................................................
## Completed E-Step (0 seconds). 
## Completed M-Step. 
## Completing Iteration 19 (approx. per word bound = -6.323, relative change = 9.607e-05) 
## ..................................................................................................................
## Completed E-Step (0 seconds). 
## Completed M-Step. 
## Completing Iteration 20 (approx. per word bound = -6.323, relative change = 8.169e-05) 
## Topic 1: loot, riot, properti, protest, polic 
##  Topic 2: polic, citi, offic, law, budget 
##  Topic 3: polic, communiti, organ, work, social 
##  Topic 4: polic, offic, protest, floyd, citi 
##  Topic 5: civil, right, statu, american, movement 
##  Topic 6: white, racial, american, like, democrat 
##  Topic 7: protest, demonstr, feder, violenc, state 
## ..................................................................................................................
## Completed E-Step (0 seconds). 
## Completed M-Step. 
## Completing Iteration 21 (approx. per word bound = -6.322, relative change = 7.077e-05) 
## ..................................................................................................................
## Completed E-Step (0 seconds). 
## Completed M-Step. 
## Completing Iteration 22 (approx. per word bound = -6.322, relative change = 6.219e-05) 
## ..................................................................................................................
## Completed E-Step (0 seconds). 
## Completed M-Step. 
## Completing Iteration 23 (approx. per word bound = -6.321, relative change = 5.537e-05) 
## ..................................................................................................................
## Completed E-Step (0 seconds). 
## Completed M-Step. 
## Completing Iteration 24 (approx. per word bound = -6.321, relative change = 5.001e-05) 
## ..................................................................................................................
## Completed E-Step (0 seconds). 
## Completed M-Step. 
## Completing Iteration 25 (approx. per word bound = -6.321, relative change = 4.489e-05) 
## Topic 1: loot, riot, properti, protest, violenc 
##  Topic 2: polic, citi, offic, law, budget 
##  Topic 3: polic, communiti, organ, work, social 
##  Topic 4: polic, offic, protest, citi, floyd 
##  Topic 5: civil, right, statu, american, law 
##  Topic 6: white, racial, american, like, democrat 
##  Topic 7: protest, demonstr, feder, violenc, report 
## ..................................................................................................................
## Completed E-Step (0 seconds). 
## Completed M-Step. 
## Completing Iteration 26 (approx. per word bound = -6.320, relative change = 3.873e-05) 
## ..................................................................................................................
## Completed E-Step (0 seconds). 
## Completed M-Step. 
## Completing Iteration 27 (approx. per word bound = -6.320, relative change = 3.285e-05) 
## ..................................................................................................................
## Completed E-Step (0 seconds). 
## Completed M-Step. 
## Completing Iteration 28 (approx. per word bound = -6.320, relative change = 2.845e-05) 
## ..................................................................................................................
## Completed E-Step (0 seconds). 
## Completed M-Step. 
## Completing Iteration 29 (approx. per word bound = -6.320, relative change = 2.480e-05) 
## ..................................................................................................................
## Completed E-Step (0 seconds). 
## Completed M-Step. 
## Completing Iteration 30 (approx. per word bound = -6.320, relative change = 2.183e-05) 
## Topic 1: loot, riot, properti, protest, violenc 
##  Topic 2: polic, citi, offic, law, budget 
##  Topic 3: polic, communiti, organ, work, social 
##  Topic 4: polic, offic, protest, citi, floyd 
##  Topic 5: civil, right, american, statu, law 
##  Topic 6: white, racial, american, like, democrat 
##  Topic 7: protest, demonstr, feder, violenc, report 
## ..................................................................................................................
## Completed E-Step (0 seconds). 
## Completed M-Step. 
## Completing Iteration 31 (approx. per word bound = -6.320, relative change = 1.868e-05) 
## ..................................................................................................................
## Completed E-Step (0 seconds). 
## Completed M-Step. 
## Completing Iteration 32 (approx. per word bound = -6.320, relative change = 1.620e-05) 
## ..................................................................................................................
## Completed E-Step (0 seconds). 
## Completed M-Step. 
## Completing Iteration 33 (approx. per word bound = -6.320, relative change = 1.397e-05) 
## ..................................................................................................................
## Completed E-Step (0 seconds). 
## Completed M-Step. 
## Completing Iteration 34 (approx. per word bound = -6.319, relative change = 1.245e-05) 
## ..................................................................................................................
## Completed E-Step (0 seconds). 
## Completed M-Step. 
## Completing Iteration 35 (approx. per word bound = -6.319, relative change = 1.074e-05) 
## Topic 1: loot, riot, properti, protest, violenc 
##  Topic 2: polic, citi, offic, law, budget 
##  Topic 3: polic, communiti, organ, work, social 
##  Topic 4: polic, offic, protest, citi, floyd 
##  Topic 5: civil, right, american, statu, law 
##  Topic 6: white, racial, american, like, democrat 
##  Topic 7: protest, demonstr, feder, violenc, report 
## ..................................................................................................................
## Completed E-Step (0 seconds). 
## Completed M-Step. 
## Model Converged 
## Beginning Spectral Initialization 
##   Calculating the gram matrix...
##   Finding anchor words...
##      ...........
##   Recovering initialization...
##      ...........
## Initialization complete.
## ..................................................................................................................
## Completed E-Step (0 seconds). 
## Completed M-Step. 
## Completing Iteration 1 (approx. per word bound = -6.467) 
## ..................................................................................................................
## Completed E-Step (0 seconds). 
## Completed M-Step. 
## Completing Iteration 2 (approx. per word bound = -6.352, relative change = 1.780e-02) 
## ..................................................................................................................
## Completed E-Step (0 seconds). 
## Completed M-Step. 
## Completing Iteration 3 (approx. per word bound = -6.306, relative change = 7.200e-03) 
## ..................................................................................................................
## Completed E-Step (0 seconds). 
## Completed M-Step. 
## Completing Iteration 4 (approx. per word bound = -6.285, relative change = 3.337e-03) 
## ..................................................................................................................
## Completed E-Step (0 seconds). 
## Completed M-Step. 
## Completing Iteration 5 (approx. per word bound = -6.274, relative change = 1.804e-03) 
## Topic 1: loot, properti, riot, violenc, polic 
##  Topic 2: polic, citi, budget, offic, depart 
##  Topic 3: polic, polit, american, work, social 
##  Topic 4: polic, offic, law, feder, depart 
##  Topic 5: statu, remov, confeder, histori, civil 
##  Topic 6: white, american, racial, like, race 
##  Topic 7: polic, racism, white, kill, crime 
##  Topic 8: offic, polic, protest, street, crowd 
##  Topic 9: right, civil, protest, law, polic 
##  Topic 10: polic, protest, offic, floyd, citi 
##  Topic 11: protest, demonstr, feder, violenc, state 
## ..................................................................................................................
## Completed E-Step (0 seconds). 
## Completed M-Step. 
## Completing Iteration 6 (approx. per word bound = -6.267, relative change = 1.124e-03) 
## ..................................................................................................................
## Completed E-Step (0 seconds). 
## Completed M-Step. 
## Completing Iteration 7 (approx. per word bound = -6.262, relative change = 7.662e-04) 
## ..................................................................................................................
## Completed E-Step (0 seconds). 
## Completed M-Step. 
## Completing Iteration 8 (approx. per word bound = -6.259, relative change = 5.535e-04) 
## ..................................................................................................................
## Completed E-Step (0 seconds). 
## Completed M-Step. 
## Completing Iteration 9 (approx. per word bound = -6.256, relative change = 4.206e-04) 
## ..................................................................................................................
## Completed E-Step (0 seconds). 
## Completed M-Step. 
## Completing Iteration 10 (approx. per word bound = -6.254, relative change = 3.330e-04) 
## Topic 1: loot, properti, riot, violenc, protest 
##  Topic 2: polic, citi, budget, depart, offic 
##  Topic 3: work, american, polit, polic, call 
##  Topic 4: polic, offic, feder, law, depart 
##  Topic 5: statu, remov, confeder, histori, symbol 
##  Topic 6: white, racial, american, like, among 
##  Topic 7: polic, white, kill, crime, racism 
##  Topic 8: offic, polic, protest, street, show 
##  Topic 9: right, civil, protest, law, organ 
##  Topic 10: polic, protest, offic, floyd, citi 
##  Topic 11: protest, demonstr, violenc, report, state 
## ..................................................................................................................
## Completed E-Step (0 seconds). 
## Completed M-Step. 
## Completing Iteration 11 (approx. per word bound = -6.252, relative change = 2.676e-04) 
## ..................................................................................................................
## Completed E-Step (0 seconds). 
## Completed M-Step. 
## Completing Iteration 12 (approx. per word bound = -6.251, relative change = 2.196e-04) 
## ..................................................................................................................
## Completed E-Step (0 seconds). 
## Completed M-Step. 
## Completing Iteration 13 (approx. per word bound = -6.250, relative change = 1.837e-04) 
## ..................................................................................................................
## Completed E-Step (0 seconds). 
## Completed M-Step. 
## Completing Iteration 14 (approx. per word bound = -6.249, relative change = 1.581e-04) 
## ..................................................................................................................
## Completed E-Step (0 seconds). 
## Completed M-Step. 
## Completing Iteration 15 (approx. per word bound = -6.248, relative change = 1.406e-04) 
## Topic 1: loot, properti, riot, violenc, protest 
##  Topic 2: polic, citi, budget, depart, offic 
##  Topic 3: work, american, polit, presid, don’ 
##  Topic 4: polic, offic, feder, law, depart 
##  Topic 5: statu, remov, confeder, histori, symbol 
##  Topic 6: white, racial, american, like, among 
##  Topic 7: polic, white, crime, kill, american 
##  Topic 8: offic, polic, protest, street, show 
##  Topic 9: civil, right, protest, organ, law 
##  Topic 10: polic, protest, offic, floyd, citi 
##  Topic 11: protest, demonstr, violenc, report, state 
## ..................................................................................................................
## Completed E-Step (0 seconds). 
## Completed M-Step. 
## Completing Iteration 16 (approx. per word bound = -6.247, relative change = 1.306e-04) 
## ..................................................................................................................
## Completed E-Step (0 seconds). 
## Completed M-Step. 
## Completing Iteration 17 (approx. per word bound = -6.246, relative change = 1.266e-04) 
## ..................................................................................................................
## Completed E-Step (0 seconds). 
## Completed M-Step. 
## Completing Iteration 18 (approx. per word bound = -6.246, relative change = 1.186e-04) 
## ..................................................................................................................
## Completed E-Step (0 seconds). 
## Completed M-Step. 
## Completing Iteration 19 (approx. per word bound = -6.245, relative change = 9.877e-05) 
## ..................................................................................................................
## Completed E-Step (0 seconds). 
## Completed M-Step. 
## Completing Iteration 20 (approx. per word bound = -6.245, relative change = 7.713e-05) 
## Topic 1: loot, properti, riot, violenc, protest 
##  Topic 2: polic, citi, budget, depart, offic 
##  Topic 3: work, american, polit, don’, presid 
##  Topic 4: polic, offic, feder, law, depart 
##  Topic 5: statu, remov, confeder, histori, symbol 
##  Topic 6: white, racial, american, like, among 
##  Topic 7: polic, white, crime, american, kill 
##  Topic 8: offic, polic, protest, show, street 
##  Topic 9: civil, right, organ, protest, justic 
##  Topic 10: polic, protest, offic, floyd, citi 
##  Topic 11: protest, demonstr, violenc, report, state 
## ..................................................................................................................
## Completed E-Step (0 seconds). 
## Completed M-Step. 
## Completing Iteration 21 (approx. per word bound = -6.244, relative change = 6.394e-05) 
## ..................................................................................................................
## Completed E-Step (0 seconds). 
## Completed M-Step. 
## Completing Iteration 22 (approx. per word bound = -6.244, relative change = 5.512e-05) 
## ..................................................................................................................
## Completed E-Step (0 seconds). 
## Completed M-Step. 
## Completing Iteration 23 (approx. per word bound = -6.244, relative change = 4.940e-05) 
## ..................................................................................................................
## Completed E-Step (0 seconds). 
## Completed M-Step. 
## Completing Iteration 24 (approx. per word bound = -6.243, relative change = 4.592e-05) 
## ..................................................................................................................
## Completed E-Step (0 seconds). 
## Completed M-Step. 
## Completing Iteration 25 (approx. per word bound = -6.243, relative change = 4.306e-05) 
## Topic 1: loot, properti, riot, violenc, protest 
##  Topic 2: polic, citi, budget, depart, offic 
##  Topic 3: work, american, don’, polit, presid 
##  Topic 4: polic, offic, feder, law, depart 
##  Topic 5: statu, remov, confeder, histori, civil 
##  Topic 6: white, racial, american, like, among 
##  Topic 7: polic, american, white, crime, kill 
##  Topic 8: offic, polic, protest, show, street 
##  Topic 9: civil, right, organ, protest, support 
##  Topic 10: polic, protest, offic, citi, floyd 
##  Topic 11: protest, demonstr, violenc, report, state 
## ..................................................................................................................
## Completed E-Step (0 seconds). 
## Completed M-Step. 
## Completing Iteration 26 (approx. per word bound = -6.243, relative change = 4.022e-05) 
## ..................................................................................................................
## Completed E-Step (0 seconds). 
## Completed M-Step. 
## Completing Iteration 27 (approx. per word bound = -6.242, relative change = 3.666e-05) 
## ..................................................................................................................
## Completed E-Step (0 seconds). 
## Completed M-Step. 
## Completing Iteration 28 (approx. per word bound = -6.242, relative change = 3.288e-05) 
## ..................................................................................................................
## Completed E-Step (0 seconds). 
## Completed M-Step. 
## Completing Iteration 29 (approx. per word bound = -6.242, relative change = 2.961e-05) 
## ..................................................................................................................
## Completed E-Step (0 seconds). 
## Completed M-Step. 
## Completing Iteration 30 (approx. per word bound = -6.242, relative change = 2.573e-05) 
## Topic 1: loot, properti, riot, violenc, protest 
##  Topic 2: polic, citi, budget, depart, offic 
##  Topic 3: work, don’, american, polit, presid 
##  Topic 4: polic, offic, feder, law, depart 
##  Topic 5: statu, remov, confeder, civil, histori 
##  Topic 6: white, racial, american, like, among 
##  Topic 7: american, white, polic, crime, kill 
##  Topic 8: offic, polic, protest, show, street 
##  Topic 9: organ, civil, right, protest, support 
##  Topic 10: polic, protest, offic, citi, floyd 
##  Topic 11: protest, demonstr, violenc, report, state 
## ..................................................................................................................
## Completed E-Step (0 seconds). 
## Completed M-Step. 
## Completing Iteration 31 (approx. per word bound = -6.242, relative change = 2.591e-05) 
## ..................................................................................................................
## Completed E-Step (0 seconds). 
## Completed M-Step. 
## Completing Iteration 32 (approx. per word bound = -6.242, relative change = 2.508e-05) 
## ..................................................................................................................
## Completed E-Step (0 seconds). 
## Completed M-Step. 
## Completing Iteration 33 (approx. per word bound = -6.241, relative change = 2.379e-05) 
## ..................................................................................................................
## Completed E-Step (0 seconds). 
## Completed M-Step. 
## Completing Iteration 34 (approx. per word bound = -6.241, relative change = 2.249e-05) 
## ..................................................................................................................
## Completed E-Step (0 seconds). 
## Completed M-Step. 
## Completing Iteration 35 (approx. per word bound = -6.241, relative change = 2.168e-05) 
## Topic 1: loot, properti, riot, violenc, protest 
##  Topic 2: polic, citi, budget, depart, offic 
##  Topic 3: work, don’, american, polit, presid 
##  Topic 4: polic, offic, feder, law, depart 
##  Topic 5: statu, civil, remov, confeder, histori 
##  Topic 6: white, racial, american, like, among 
##  Topic 7: white, american, crime, polic, kill 
##  Topic 8: offic, polic, protest, show, street 
##  Topic 9: organ, right, protest, civil, support 
##  Topic 10: polic, protest, offic, citi, floyd 
##  Topic 11: protest, demonstr, violenc, report, state 
## ..................................................................................................................
## Completed E-Step (0 seconds). 
## Completed M-Step. 
## Completing Iteration 36 (approx. per word bound = -6.241, relative change = 2.056e-05) 
## ..................................................................................................................
## Completed E-Step (0 seconds). 
## Completed M-Step. 
## Completing Iteration 37 (approx. per word bound = -6.241, relative change = 1.984e-05) 
## ..................................................................................................................
## Completed E-Step (0 seconds). 
## Completed M-Step. 
## Completing Iteration 38 (approx. per word bound = -6.241, relative change = 1.980e-05) 
## ..................................................................................................................
## Completed E-Step (0 seconds). 
## Completed M-Step. 
## Completing Iteration 39 (approx. per word bound = -6.241, relative change = 1.956e-05) 
## ..................................................................................................................
## Completed E-Step (0 seconds). 
## Completed M-Step. 
## Completing Iteration 40 (approx. per word bound = -6.241, relative change = 1.882e-05) 
## Topic 1: loot, properti, riot, violenc, protest 
##  Topic 2: polic, citi, budget, depart, offic 
##  Topic 3: don’, work, american, polit, say 
##  Topic 4: polic, offic, feder, law, depart 
##  Topic 5: statu, civil, remov, confeder, histori 
##  Topic 6: white, racial, american, like, among 
##  Topic 7: white, american, crime, polic, kill 
##  Topic 8: offic, polic, protest, show, post 
##  Topic 9: organ, protest, right, support, help 
##  Topic 10: polic, protest, citi, offic, floyd 
##  Topic 11: protest, demonstr, violenc, report, state 
## ..................................................................................................................
## Completed E-Step (0 seconds). 
## Completed M-Step. 
## Completing Iteration 41 (approx. per word bound = -6.240, relative change = 1.868e-05) 
## ..................................................................................................................
## Completed E-Step (0 seconds). 
## Completed M-Step. 
## Completing Iteration 42 (approx. per word bound = -6.240, relative change = 1.723e-05) 
## ..................................................................................................................
## Completed E-Step (0 seconds). 
## Completed M-Step. 
## Completing Iteration 43 (approx. per word bound = -6.240, relative change = 1.591e-05) 
## ..................................................................................................................
## Completed E-Step (0 seconds). 
## Completed M-Step. 
## Completing Iteration 44 (approx. per word bound = -6.240, relative change = 1.438e-05) 
## ..................................................................................................................
## Completed E-Step (0 seconds). 
## Completed M-Step. 
## Completing Iteration 45 (approx. per word bound = -6.240, relative change = 1.323e-05) 
## Topic 1: loot, properti, riot, violenc, protest 
##  Topic 2: polic, citi, budget, depart, offic 
##  Topic 3: don’, say, american, work, polit 
##  Topic 4: polic, offic, feder, law, depart 
##  Topic 5: statu, civil, remov, confeder, histori 
##  Topic 6: white, racial, american, like, among 
##  Topic 7: white, american, crime, polic, kill 
##  Topic 8: offic, polic, protest, show, investig 
##  Topic 9: organ, protest, right, support, help 
##  Topic 10: polic, protest, citi, offic, floyd 
##  Topic 11: protest, demonstr, violenc, report, state 
## ..................................................................................................................
## Completed E-Step (0 seconds). 
## Completed M-Step. 
## Completing Iteration 46 (approx. per word bound = -6.240, relative change = 1.276e-05) 
## ..................................................................................................................
## Completed E-Step (0 seconds). 
## Completed M-Step. 
## Completing Iteration 47 (approx. per word bound = -6.240, relative change = 1.389e-05) 
## ..................................................................................................................
## Completed E-Step (0 seconds). 
## Completed M-Step. 
## Completing Iteration 48 (approx. per word bound = -6.240, relative change = 1.467e-05) 
## ..................................................................................................................
## Completed E-Step (0 seconds). 
## Completed M-Step. 
## Completing Iteration 49 (approx. per word bound = -6.240, relative change = 1.463e-05) 
## ..................................................................................................................
## Completed E-Step (0 seconds). 
## Completed M-Step. 
## Completing Iteration 50 (approx. per word bound = -6.240, relative change = 1.415e-05) 
## Topic 1: loot, properti, riot, violenc, protest 
##  Topic 2: polic, citi, budget, depart, offic 
##  Topic 3: don’, say, american, polit, work 
##  Topic 4: polic, offic, feder, law, depart 
##  Topic 5: civil, statu, remov, confeder, histori 
##  Topic 6: white, racial, american, like, among 
##  Topic 7: white, american, crime, polic, kill 
##  Topic 8: offic, polic, protest, investig, show 
##  Topic 9: organ, protest, right, support, help 
##  Topic 10: polic, protest, citi, floyd, offic 
##  Topic 11: protest, demonstr, violenc, report, state 
## ..................................................................................................................
## Completed E-Step (0 seconds). 
## Completed M-Step. 
## Completing Iteration 51 (approx. per word bound = -6.240, relative change = 1.283e-05) 
## ..................................................................................................................
## Completed E-Step (0 seconds). 
## Completed M-Step. 
## Completing Iteration 52 (approx. per word bound = -6.239, relative change = 1.110e-05) 
## ..................................................................................................................
## Completed E-Step (0 seconds). 
## Completed M-Step. 
## Model Converged
plot(kResult)
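
The log above reports, for each EM iteration, an approximate per-word bound and its relative change, and fitting stops once that change becomes small enough. A minimal sketch of such a stopping rule follows. It assumes the relative change is computed as (new - old) / |old| and uses a tolerance of 1e-5; both are assumptions about the fitting routine, not taken from its source. The bound values are the rounded ones printed above, so the result only approximates the logged figure.

```r
# Relative change in the bound between two consecutive iterations.
# Assumption: the change is measured relative to the absolute old bound.
relative_change <- function(old_bound, new_bound) {
  (new_bound - old_bound) / abs(old_bound)
}

# Rounded per-word bounds printed for iterations 12 and 13 above.
change <- relative_change(-6.251, -6.250)

tolerance <- 1e-5            # hypothetical stopping tolerance
converged <- change < tolerance

print(change)     # roughly 1.6e-04, same order as the logged 1.837e-04
print(converged)  # FALSE: the change is still above the tolerance
```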

Note how the top terms differ once STM includes metadata as covariates:
labelTopics(PrevFitQuery, c(1:11))  # example of the top words in these topics
## Topic 1 Top Words:
##       Highest Prob: loot, properti, riot, violenc, protest, polic, destruct 
##       FREX: loot, riot, properti, destruct, looter, store, destroy 
##       Lift: weekend, looter, loot, destruct, destroy, properti, rioter 
##       Score: weekend, loot, looter, destruct, rioter, properti, riot 
## Topic 2 Top Words:
##       Highest Prob: polic, citi, budget, depart, offic, law, enforc 
##       FREX: budget, spend, employe, largest, citi, billion, defund 
##       Lift: spend, budget, billion, largest, defund, employe, council 
##       Score: spend, budget, largest, billion, fund, employe, citi 
## Topic 3 Top Words:
##       Highest Prob: presid, say, call, white, trump, left, post 
##       FREX: biden, left, got, town, post, analysi, talk 
##       Lift: analysi, biden, mention, low, town, pictur, got 
##       Score: analysi, biden, trump, got, town, arriv, tweet 
## Topic 4 Top Words:
##       Highest Prob: polic, offic, feder, law, enforc, depart, forc 
##       FREX: equip, agenc, militari, feder, weapon, militar, investig 
##       Lift: shield, equip, misconduct, militar, complaint, militari, agenc 
##       Score: shield, equip, portland, feder, depart, offic, investig 
## Topic 5 Top Words:
##       Highest Prob: white, racial, american, like, among, democrat, race 
##       FREX: like, equal, white, age, among, republican, racial 
##       Lift: age, achiev, republican, reaction, survey, equal, gap 
##       Score: age, white, equal, compar, gap, achiev, republican 
## Topic 6 Top Words:
##       Highest Prob: statu, american, remov, civil, histori, confeder, union 
##       FREX: statu, remov, confeder, symbol, histori, african, union 
##       Lift: statu, confeder, remov, symbol, robert, southern, figur 
##       Score: statu, confeder, remov, union, segreg, symbol, slaveri 
## Topic 7 Top Words:
##       Highest Prob: polic, american, crime, violenc, kill, communiti, white 
##       FREX: system, men, studi, job, rate, crime, phrase 
##       Lift: troubl, phrase, african-american, rate, poverti, statist, margin 
##       Score: troubl, phrase, african-american, poverti, percent, neighborhood, cop 
## Topic 8 Top Words:
##       Highest Prob: polic, protest, offic, citi, floyd, minneapoli, georg 
##       FREX: night, chauvin, fire, saturday, minneapoli, downtown, curfew 
##       Lift: texa, chauvin, derek, saturday, precinct, downtown, friday 
##       Score: texa, curfew, saturday, night, monday, chauvin, brooklyn 
## Topic 9 Top Words:
##       Highest Prob: protest, right, civil, law, public, peac, bill 
##       FREX: bill, king, mask, civil, amend, spread, wear 
##       Lift: bill, amend, distanc, king, mask, resist, strategi 
##       Score: bill, mask, king, peac, moral, luther, civil 
## Topic 10 Top Words:
##       Highest Prob: protest, demonstr, violenc, report, violent, state, author 
##       FREX: demonstr, author, juli, violent, actor, report, event 
##       Lift: actor, conflict, oregon, juli, summer, trend, locat 
##       Score: actor, demonstr, portland, violent, trump, peac, oregon 
## Topic 11 Top Words:
##       Highest Prob: organ, communiti, help, support, work, justic, includ 
##       FREX: organ, provid, foundat, program, inform, resourc, help 
##       Lift: reli, foundat, platform, contact, provid, ensur, inform 
##       Score: reli, fund, foundat, program, contact, educ, organ
How do topic distributions differ between the Support and Oppose groups?
prep <- estimateEffect(1: 11 ~ Support_Oppose, 
                       PrevFitQuery, meta = out$meta, 
                       uncertainty = "Global")
summary(prep, topics = 1:2)
## 
## Call:
## estimateEffect(formula = 1:11 ~ Support_Oppose, stmobj = PrevFitQuery, 
##     metadata = out$meta, uncertainty = "Global")
## 
## 
## Topic 1:
## 
## Coefficients:
##                       Estimate Std. Error t value Pr(>|t|)    
## (Intercept)            0.14534    0.01812   8.020 5.67e-14 ***
## Support_OpposeSupport -0.12992    0.02565  -5.065 8.49e-07 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## 
## Topic 2:
## 
## Coefficients:
##                       Estimate Std. Error t value Pr(>|t|)    
## (Intercept)            0.06231    0.01557   4.003  8.5e-05 ***
## Support_OpposeSupport -0.01508    0.02298  -0.656    0.512    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
plot(prep, covariate = "Support_Oppose", topics = c(1: 11), 
     model = PrevFitQuery, method = "difference", 
     cov.value1 = "Support", cov.value2 = "Oppose", 
     main = "Effect of Support vs. Oppose", 
     xlim = c(-0.5, 0.5), labeltype = "custom", cex = .5,
     custom.labels = c("1-Looting/Riot", "2-Police funding/budget", "3-Politics", "4-Police use of force", "5-Race equality", "6-Statue removal", "7-Crimes poverty", "8-Floyd/Minneapolis",
                       "9-Civil rights", "10-Protests/Demonstrati", "11-Donation/support"))

The plot shows that the topics Race equality, Civil rights, and Donation/support are significantly more prevalent in Support results, while the topic Looting/Riot is more prevalent in Oppose results.
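
To make the difference estimate concrete: with a single two-level covariate, the regression for each topic reduces to comparing mean topic proportions between the two groups. The sketch below uses simulated proportions (not the BLM data) and plain `lm()` rather than `estimateEffect()`, which additionally accounts for the uncertainty in the estimated topic proportions. The names `theta_topic1` and `stance` are hypothetical.

```r
set.seed(1)

# Simulated stand-ins: stance of 200 documents and their proportion of one topic.
stance <- factor(rep(c("Oppose", "Support"), each = 100))
theta_topic1 <- c(rbeta(100, 2, 10),   # Oppose documents: higher proportion
                  rbeta(100, 1, 30))   # Support documents: lower proportion

# Regress topic proportion on stance; "Oppose" is the reference level.
fit <- lm(theta_topic1 ~ stance)
coef_support <- unname(coef(fit)["stanceSupport"])

# The coefficient equals the difference in group means (Support minus Oppose).
group_means <- tapply(theta_topic1, stance, mean)
mean_diff <- unname(group_means["Support"] - group_means["Oppose"])

print(coef_support)
print(mean_diff)
```

A negative coefficient, as for Topic 1 in the summary above, means the topic is less prevalent in Support documents than in Oppose documents.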

Topic correlation

topicCorr() estimates correlations between topics, that is, how closely related topics are to one another (i.e., how likely they are to appear in the same document). Plotting the resulting network requires the igraph R package.

mod.out.corr <- topicCorr(PrevFitQuery)
plot(mod.out.corr)
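
As a rough illustration of what such a correlation graph encodes, the sketch below correlates the columns of a document-topic proportion matrix and keeps only correlations above a cutoff as edges. It uses a simulated matrix `theta` and a hypothetical cutoff of 0.01. It is a simplified stand-in for the idea, not the `topicCorr()` implementation.

```r
set.seed(42)

# Simulated document-topic proportions: 300 documents, 4 topics, rows sum to 1.
raw <- matrix(rgamma(300 * 4, shape = 0.5), nrow = 300)
theta <- raw / rowSums(raw)
colnames(theta) <- paste0("Topic", 1:4)

# Pairwise correlations between topics across documents.
topic_cor <- cor(theta)

# Keep an edge only where the correlation exceeds the cutoff (hypothetical value).
cutoff <- 0.01
adjacency <- (topic_cor > cutoff) * 1
diag(adjacency) <- 0   # no self-loops

print(round(topic_cor, 2))
print(adjacency)
```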

Word embeddings:

In topic modeling, words are represented by their frequencies across documents, so each word has a vector of numeric values. Newer techniques such as word2vec and GloVe instead learn dense word vectors, in word2vec's case with a shallow neural network. These vectors support arithmetic on meanings, for example: Paris - France + Germany = ? (ideally a vector close to Berlin).

Here, word vectors are created based on co-occurrences. See more: https://code.google.com/archive/p/word2vec/
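
The Paris/France/Germany analogy can be worked through with cosine similarity. The vectors below are hand-made, three-dimensional toy values chosen for illustration (real embeddings have hundreds of dimensions and are learned from text), and the helper `cosine()` is defined here rather than taken from a package.

```r
# Toy embeddings: dimensions loosely stand for (capital-ness, France-ness, Germany-ness).
emb_toy <- rbind(
  paris   = c(1, 1, 0),
  france  = c(0, 1, 0),
  germany = c(0, 0, 1),
  berlin  = c(1, 0, 1),
  protest = c(0.2, 0.1, 0.1)
)

# Cosine similarity between two numeric vectors.
cosine <- function(a, b) sum(a * b) / (sqrt(sum(a^2)) * sqrt(sum(b^2)))

# Paris - France + Germany
query <- emb_toy["paris", ] - emb_toy["france", ] + emb_toy["germany", ]

# Rank candidate words (excluding the three used in the query) by similarity.
candidates <- setdiff(rownames(emb_toy), c("paris", "france", "germany"))
sims <- sapply(candidates, function(w) cosine(query, emb_toy[w, ]))
nearest <- names(which.max(sims))

print(sort(sims, decreasing = TRUE))
print(nearest)  # "berlin" for these toy vectors
```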

First, we build a corpus with specific cleaning tasks as input to train a word2vec model.

library(word2vec)

## text cleaning specific to word2vec input: convert text to ASCII, keep only alphanumeric characters, lower-case, and remove leading/trailing spaces
x <- txt_clean_word2vec(corpus, ascii = TRUE, alpha = TRUE, tolower = TRUE, trim = TRUE)
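
For readers who want to see what these cleaning steps amount to, here is a base R approximation. It is a sketch of the steps named in the comment above, not the source of `txt_clean_word2vec()`. The function name `clean_for_w2v` is hypothetical, and exact behavior (for example around accented characters) may differ from the package.

```r
clean_for_w2v <- function(txt) {
  txt <- iconv(txt, to = "ASCII//TRANSLIT")   # convert to ASCII where possible
  txt <- gsub("[^[:alnum:]]+", " ", txt)      # keep alphanumeric characters only
  txt <- tolower(txt)                         # lower-case
  trimws(txt)                                 # drop leading/trailing spaces
}

cleaned <- clean_for_w2v("  Protesters MARCHED downtown, chanting: 'No justice!'  ")
print(cleaned)  # "protesters marched downtown chanting no justice"
```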

Then, we train the word embeddings model with a set of parameters. With the model, we can begin to get either 1) the embedding of words, or 2) the nearest words which are similar to either a word or a word vector.

set.seed(23874)
model_cbow <- word2vec(x, dim=400,  iter=20) #continuous bag of words algorithm
#check similarity
nn_cbow <-  predict(model_cbow, "protest",  type = "nearest", top_n = 10)
nn_cbow
## $protest
##      term1          term2 similarity rank
## 1  protest  demonstration  0.7346968    1
## 2  protest          vigil  0.6791759    2
## 3  protest       assembly  0.6705053    3
## 4  protest          rally  0.6678751    4
## 5  protest       protests  0.6588770    5
## 6  protest demonstrations  0.6206497    6
## 7  protest        marched  0.5931678    7
## 8  protest          tense  0.5914532    8
## 9  protest     protesters  0.5859006    9
## 10 protest     protestors  0.5797508   10

We can also do arithmetic on the vectors and find the terms nearest to the result:

emb <- as.matrix(model_cbow)
vectors <- emb[c("equality", "rights"), ]
vectors <- rbind(vectors, avg = colMeans(vectors))
predict(model_cbow, vectors, type = "nearest", top_n = 10)
## $equality
##            term similarity rank
## 1       achieve  0.7702916    1
## 2     achieving  0.7347823    2
## 3        blacks  0.6887189    3
## 4          slur  0.6571456    4
## 5  inequalities  0.6321969    5
## 6    inequality  0.6284207    6
## 7      achieved  0.6210301    7
## 8     injustice  0.6182012    8
## 9        divide  0.6167789    9
## 10   resentment  0.6037710   10
## 
## $rights
##            term similarity rank
## 1  disobedience  0.7972412    1
## 2     liberties  0.7582856    2
## 3    litigation  0.7581983    3
## 4        beings  0.7228526    4
## 5           war  0.6755592    5
## 6     disorders  0.6654155    6
## 7      servants  0.6421698    7
## 8  precipitated  0.6366475    8
## 9      disorder  0.6294204    9
## 10    liability  0.6121472   10
## 
## $avg
##            term similarity rank
## 1        rights  0.7549644    1
## 2      equality  0.7549641    2
## 3       achieve  0.6358227    3
## 4        beings  0.5996270    4
## 5  disobedience  0.5985212    5
## 6     achieving  0.5980842    6
## 7    litigation  0.5778762    7
## 8        blacks  0.5664884    8
## 9     liberties  0.5587466    9
## 10  consumption  0.5477144   10
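
One detail worth noticing in the `$avg` result is that `rights` and `equality` are almost exactly equally similar to the averaged vector. If the similarity is cosine similarity and the two word vectors have the same length (both are assumptions here, not checked against the package), this is expected: the mean of two equal-length vectors lies at the same angle from each of them. A small check with made-up vectors:

```r
cosine <- function(a, b) sum(a * b) / (sqrt(sum(a^2)) * sqrt(sum(b^2)))

# Two made-up vectors scaled to unit length (stand-ins for two word embeddings).
unit <- function(v) v / sqrt(sum(v^2))
a <- unit(c(0.9, 0.1, 0.4, -0.2))
b <- unit(c(0.2, 0.8, -0.1, 0.5))

avg <- (a + b) / 2

sim_a <- cosine(avg, a)
sim_b <- cosine(avg, b)
print(c(sim_a, sim_b))  # the two values coincide up to floating-point error
```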

Further reading/References:

Maier, D., Waldherr, A., Miltner, P., Wiedemann, G., Niekler, A., Keinert, A., … & Schmid-Petri, H. (2018). Applying LDA topic modeling in communication research: Toward a valid and reliable methodology. Communication Methods and Measures, 12(2-3), 93-118.

Jacobi, C., van Atteveldt, W., & Welbers, K. (2015). Quantitative analysis of large amounts of journalistic texts using topic modelling. Digital Journalism. DOI: 10.1080/21670811.2015.1093271

Griffiths, T. L., & Steyvers, M. (2004). Finding scientific topics. Proceedings of the National Academy of Sciences, 101(suppl 1), 5228-5235.

Cao, J., Xia, T., Li, J., Zhang, Y., & Tang, S. (2009). A density-based method for adaptive LDA model selection. Neurocomputing, 72(7-9), 1775-1781.

Arun, R., Suresh, V., Madhavan, C. V., & Murthy, M. N. (2010, June). On finding the natural number of topics with latent Dirichlet allocation: Some observations. In Pacific-Asia conference on knowledge discovery and data mining (pp. 391-402). Springer, Berlin, Heidelberg.

Deveaud, R., SanJuan, E., & Bellot, P. (2014). Accurate and effective latent concept modeling for ad hoc information retrieval. Document numérique, 17(1), 61-84.

Roberts, M. E., Stewart, B. M., Tingley, D., & Benoit, K. (2017). stm: Estimation of the Structural Topic Model. https://cran.r-project.org/web/packages/stm/index.html

Pretrained GloVe vectors: https://nlp.stanford.edu/projects/glove/