gsdmm

gsdmm implements short text clustering via the Dirichlet Multinomial Mixture model proposed by Yin and Wang (2014). It provides a fast C++ implementation of, and an R interface to, the collapsed Gibbs sampler described in the paper. Specifically, gsdmm implements the version of the likelihood function that allows for multiple occurrences of the same word in a given text (Equation 4 in the paper).
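To make the role of Equation 4 more concrete, here is a minimal pure-R sketch of the per-cluster score it defines for a single text. It is an illustration of the formula as given in the paper, not the package's C++ code; the function name `log_cluster_scores` and all argument names are hypothetical and are not part of the gsdmm API.

```r
# Unnormalised log-probability of assigning one text to each cluster,
# following Equation 4 of Yin and Wang (2014). Illustrative names only.
#   doc_counts : named integer vector, word -> count in the text (N_d^w)
#   m_z        : texts per cluster, excluding the current text
#   n_z        : total words per cluster, excluding the current text
#   n_zw       : clusters x vocabulary matrix of word counts (with colnames)
#   n_docs     : total number of texts (D)
#   alpha,beta : Dirichlet hyperparameters
log_cluster_scores <- function(doc_counts, m_z, n_z, n_zw, n_docs, alpha, beta) {
  K <- length(m_z)
  V <- ncol(n_zw)
  N_d <- sum(doc_counts)
  vapply(seq_len(K), function(z) {
    # cluster popularity term: (m_z + alpha) / (D - 1 + K * alpha)
    score <- log(m_z[z] + alpha) - log(n_docs - 1 + K * alpha)
    # numerator: product over words w and j = 1..N_d^w of (n_z^w + beta + j - 1)
    for (w in names(doc_counts)) {
      j <- seq_len(doc_counts[[w]])
      score <- score + sum(log(n_zw[z, w] + beta + j - 1))
    }
    # denominator: product over i = 1..N_d of (n_z + V * beta + i - 1)
    i <- seq_len(N_d)
    score - sum(log(n_z[z] + V * beta + i - 1))
  }, numeric(1))
}

# Toy state: cluster 1 holds only "cat" tokens, cluster 2 only "rocket" tokens.
n_zw <- matrix(c(3, 0, 0, 3), nrow = 2, byrow = TRUE,
               dimnames = list(NULL, c("cat", "rocket")))
scores <- log_cluster_scores(c(cat = 2L), m_z = c(2, 2), n_z = c(3, 3),
                             n_zw = n_zw, n_docs = 5, alpha = 0.1, beta = 0.2)
# Normalise in log space to get assignment probabilities.
probs <- exp(scores - max(scores)) / sum(exp(scores - max(scores)))
```

In this toy state a text containing "cat" twice is assigned to the first cluster with high probability. A Gibbs sampler would draw the new cluster label from `probs` and then update the counts.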

Benefits:

very space and time efficient
unlike LDA, it requires only an upper bound on the number of clusters rather than the exact number

Development:

I am planning to add a tuning function for the alpha and beta parameters of the Gibbs sampler.

Installation

You can install the development version of gsdmm from GitHub with:

# install.packages("devtools")
devtools::install_github("till-tietz/gsdmm")

Usage

Here is a minimal working example.

# requires R >= 4.1 (native pipe) and the textstem, text2vec, and stopwords packages
# we lemmatize and tokenize, creating a list of character vectors, one per text
text <- c(
  "Rockets are amazing.",
  "Witnessing a rocket in flight is a marvel of engineering.",
  "We should take a rocket to Mars.",
  "Rocket",
  "Have you ever seen a cat?",
  "Cats are fun.",
  "Your cat seems sweet.",
  "Cat"
) |>
  tolower()  |>
  gsub(pattern = '[[:punct:] ]+', replacement = ' ') |>
  textstem::lemmatize_strings() |>
  text2vec::word_tokenizer() |>
  lapply(function(i) i[!i %in% stopwords::stopwords()])


gsdmm::gsdmm(texts = text, n_iter = 100, n_clust = 20, alpha = 0.1, beta = 0.2)
#> [1]  4  4  6  6 18 18  2 18
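
The result can then be inspected with base R. The sketch below assumes, as the printed output suggests, that `gsdmm::gsdmm()` returns one cluster label per text; the labels are copied from the example output rather than recomputed, and they may differ between runs because the sampler is stochastic. The variable names are illustrative.

```r
# cluster labels copied from the example output above
clusters <- c(4, 4, 6, 6, 18, 18, 2, 18)

# the original, unprocessed texts in the same order
raw_text <- c(
  "Rockets are amazing.",
  "Witnessing a rocket in flight is a marvel of engineering.",
  "We should take a rocket to Mars.",
  "Rocket",
  "Have you ever seen a cat?",
  "Cats are fun.",
  "Your cat seems sweet.",
  "Cat"
)

# number of clusters actually used (n_clust = 20 was only an upper bound)
n_used <- length(unique(clusters))

# group the texts by their assigned cluster label
by_cluster <- split(raw_text, clusters)
```

Here only a handful of the 20 available clusters end up non-empty, which is the sense in which `n_clust` acts as an upper bound.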
