# Loom [![Build Status](https://travis-ci.org/posterior/loom.svg?branch=master)](https://travis-ci.org/posterior/loom) [![DOI](https://zenodo.org/badge/29212705.svg)](https://zenodo.org/badge/latestdoi/29212705)

Loom is a streaming inference and query engine for the
Cross-Categorization model
[mansinghka2009cross, shafto2011probabilistic](/doc/references.bib).

### Data Types

Loom learns models of sparse heterogeneous tabular data
with hundreds of features and millions of rows.
Loom currently supports the following feature types and models:

* boolean fields as Beta-Bernoulli
* categorical fields with up to 256 values as Dirichlet-Discrete
* unbounded categorical fields as Dirichlet-Process-Discrete
* count fields as Gamma-Poisson
* real fields as Normal-Inverse-Chi-Squared-Normal
* sparse real fields as mixture of degenerate and dense real
* text and keyword fields as booleans for word absence/presence
* date fields as a combination of absolute, relative, and cyclic parts
* optional fields as a boolean plus one of the above feature models

See [input format docs](/doc/using.md#format) for details.

### Data Scale

Loom targets tabular datasets with 100-1000 columns and 10^3-10^9 rows.
To handle large datasets, Loom implements subsample annealing
[obermeyer2014scaling](/doc/references.bib)
with an accelerating annealing schedule, and it adaptively turns off ineffective inference strategies.
Loom's annealing schedule is tuned to learn
10^8 cell datasets in under an hour and
10^10 cell datasets in under a day
(depending on feature type and sparsity).
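
As a rough back-of-the-envelope check, the stated targets can be turned into average throughput figures. This is plain arithmetic, not a benchmark and not part of Loom's API. The helper names `cell_count` and `cells_per_second` are hypothetical, and the sketch assumes a dense table where cells = rows × columns (sparse data would contain fewer observed cells).

```python
# Average throughput implied by the stated targets.
# Plain arithmetic under a dense-table assumption; not a Loom benchmark.
SECONDS_PER_HOUR = 60 * 60
SECONDS_PER_DAY = 24 * SECONDS_PER_HOUR


def cell_count(rows: int, columns: int) -> int:
    """Number of cells in a dense table (hypothetical helper)."""
    return rows * columns


def cells_per_second(cells: float, seconds: float) -> float:
    """Average rate needed to process `cells` within `seconds`."""
    return cells / seconds


hour_rate = cells_per_second(1e8, SECONDS_PER_HOUR)
day_rate = cells_per_second(1e10, SECONDS_PER_DAY)

print(f"10^8 cells in an hour needs at least {hour_rate:,.0f} cells/sec on average")
print(f"10^10 cells in a day needs at least {day_rate:,.0f} cells/sec on average")
```

For example, a dense table of 10^6 rows by 100 columns has `cell_count(10**6, 100) == 10**8` cells, which falls in the "under an hour" regime.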

    Full Inference:       Partial Inference:    Greedy Inference:
    structure
    hyperparameters       hyperparameters
    mixtures              mixtures              mixtures
    |-------------------> ------------------>   ------------------>
    1     many-passes     ~10^4  accelerate     10^9  single-pass  10^4
    row                   rows                  rows               row/sec

## Documentation

* [Installing](/doc/installing.md)
* [Quick Start](/doc/quickstart.md)
* [Using Loom](/doc/using.md)
* [Adapting Loom](/doc/adapting.md)
* [Examples](/examples)

## Authors

* Fritz Obermeyer