--- title: "Validation: parity with stm, and fit quality" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{Validation: parity with stm, and fit quality} %\VignetteEngine{knitr::knitr} %\VignetteEncoding{UTF-8} --- ```{r setup, include=FALSE} # Runs only when the stm package is available (it is a Suggests dependency); the # whole point of this article is to compare against live stm output. knitr::opts_chunk$set(collapse = TRUE, comment = "#>", fig.width = 7, message = FALSE, warning = FALSE, eval = requireNamespace("stm", quietly = TRUE)) set.seed(2138) ``` faSTM is a reimplementation of the Structural Topic Model, not a wrapper around `stm`. It has its own optimizer (a Rust variational EM), so two questions decide whether you can trust it: 1. **Given the same fitted model, do faSTM's post-fit numbers match `stm`'s?** 2. **Do faSTM's own fits reach the same quality as `stm`'s**, even though the topic decomposition differs? This article answers both by running `stm` live and comparing. Every check is guarded with `stopifnot()`, so this page fails to build if parity ever breaks. ```{r data} library(faSTM) library(stm) data(poliblog) out <- list(documents = poliblog$documents, vocab = poliblog$vocab, meta = poliblog$meta) ``` ## Same model, same numbers The fitted object is `stm`-shaped, so `stm`'s own readers run on a faSTM fit. We fit once with faSTM, then compute every inspection metric **both ways** (with faSTM's functions and with `stm`'s) on that single shared model. Identical inputs should give identical outputs. ```{r fit} fit <- faSTM::stm(out$documents, out$vocab, K = 20, prevalence = ~ rating + s(day), data = out$meta, init.type = "Spectral", seed = 2138, verbose = FALSE) ``` ### Topic labels (probability, FREX, lift, score) ```{r labels} fa <- faSTM::label_topics(fit, n = 7) sl <- stm::labelTopics(fit, n = 7) same <- vapply(c("prob", "frex", "lift", "score"), function(m) identical(unname(fa[[m]]), unname(sl[[m]])), logical(1)) same stopifnot(all(same)) ``` All four rankings return the identical words, topic by topic. ### Semantic coherence and exclusivity ```{r quality} coh_diff <- max(abs(faSTM::semantic_coherence(fit, M = 10) - stm::semanticCoherence(fit, documents = out$documents, M = 10))) excl_diff <- max(abs(faSTM::exclusivity(fit, M = 10) - stm::exclusivity(fit, M = 10))) c(max_coherence_diff = coh_diff, max_exclusivity_diff = excl_diff) stopifnot(coh_diff < 1e-8, excl_diff < 1e-8) ``` Both agree to floating-point precision: faSTM's `inspect.R` ports `stm`'s `semCoh1beta` / `js.estimate` / FREX formulas directly. ## Different fit, comparable quality faSTM's optimizer is not `stm`'s, and the STM objective is non-convex, so an independent run settles into its own optimum with its own topic numbering. The question is whether that optimum is as *good*. The fair, engine-neutral test is held-out predictive likelihood by document completion: hold out half the tokens in each document, fit on the rest, and score the held-out tokens. We run the same held-out set through both packages. ```{r heldout} ho <- stm::make.heldout(out$documents, out$vocab, seed = 2138) ff <- faSTM::stm(ho$documents, out$vocab, K = 20, prevalence = ~ rating + s(day), data = out$meta, init.type = "Spectral", seed = 2138, verbose = FALSE) sf <- stm::stm(ho$documents, out$vocab, K = 20, prevalence = ~ rating + s(day), data = out$meta, init.type = "Spectral", seed = 2138, verbose = FALSE) ll_faSTM <- mean(stm::eval.heldout(ff, ho$missing)$expected.heldout) ll_stm <- mean(stm::eval.heldout(sf, ho$missing)$expected.heldout) data.frame( engine = c("faSTM", "stm"), heldout_LL = round(c(ll_faSTM, ll_stm), 4), iterations = c(ff$convergence$its, length(sf$convergence$bound))) ``` ```{r heldout-check} rel_gap <- abs(ll_faSTM - ll_stm) / abs(ll_stm) round(100 * rel_gap, 3) # percent difference in held-out likelihood stopifnot(rel_gap < 0.02) # within 2% ``` The two fits land within a fraction of a percent on held-out likelihood, so the optima are of comparable quality. faSTM reaches it in more iterations, but each iteration is cheaper, so it still converges faster in wall-clock time (see the [Get started](faSTM.html) article). ## What this means for your analysis - **Post-fit numbers are safe to compare.** Labels, coherence, and exclusivity computed by faSTM equal `stm`'s for any given model. - **Fits are not identical, by design.** faSTM and `stm` find different (equally good) topic decompositions. For a result that must survive replication, fit your final, reported model in whichever package your readers will rerun, and report the package and version. This page is the evidence that either choice is sound.