Loom is a streaming inference and query engine for the Cross-Categorization model [mansinghka2009cross, shafto2011probabilistic].
Loom learns models of tabular data, where hundreds of features are partially observed over millions of rows. Loom currently supports the following feature types and models:
- booleans as Beta-Bernoulli
- categoricals with up to 256 values as Dirichlet-Discrete
- unbounded categoricals as Dirichlet-Process-Discrete
- counts as Gamma-Poisson
- reals as Normal-Inverse-Chi-Squared-Normal
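Each model in the list above pairs a likelihood with a conjugate prior, so a column can be summarized by a few sufficient statistics that are updated one row at a time. The following is a minimal sketch of that idea for the Beta-Bernoulli case only. The `BetaBernoulli` class, its `observe` and `predictive` methods, and `fit_column` are hypothetical names chosen for illustration. They are not Loom's API or the distributions library's API, and the uniform Beta(1, 1) prior is an assumption.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class BetaBernoulli:
    """Hypothetical helper (not Loom's API): Beta(alpha, beta) prior on a Bernoulli rate."""

    alpha: float = 1.0  # assumed prior pseudo-count for True
    beta: float = 1.0   # assumed prior pseudo-count for False
    heads: int = 0      # number of observed True cells
    tails: int = 0      # number of observed False cells

    def observe(self, value: bool) -> None:
        """Fold one observed cell into the sufficient statistics."""
        if value:
            self.heads += 1
        else:
            self.tails += 1

    def predictive(self, value: bool) -> float:
        """Posterior predictive probability that the next cell equals `value`."""
        total = self.alpha + self.beta + self.heads + self.tails
        count = self.alpha + self.heads if value else self.beta + self.tails
        return count / total


def fit_column(cells: Iterable[Optional[bool]]) -> BetaBernoulli:
    """Stream over one boolean column, skipping unobserved (None) cells."""
    model = BetaBernoulli()
    for cell in cells:
        if cell is not None:
            model.observe(cell)
    return model


# A partially observed column: two True cells, one missing cell, one False cell.
model = fit_column([True, True, None, False])
print(model.predictive(True))  # (1 + 2) / (2 + 3) = 0.6
```

Because the statistics are plain counts, a missing cell costs nothing: it is simply never folded in, which is one way partially observed rows can be handled in a single streaming pass.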
Loom targets tabular datasets with 100-1000 columns and 10^3-10^9 rows. To handle large datasets, Loom implements subsample annealing [obermeyer2014scaling] with an accelerating annealing schedule, and it adaptively turns off inference strategies that have become ineffective. Loom's annealing schedule is tuned to learn 10^6-row datasets in under an hour and 10^9-row datasets in under a day.
    Full Inference:       Partial Inference:  Greedy Inference:
    structure
    hyperparameters       hyperparameters
    mixtures              mixtures            mixtures
    |-------------------> ------------------> ------------------>
    1      many-passes    ~10^4  accelerate   10^9   single-pass  10^4
    row                   rows                rows                row/sec
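The accelerating schedule described above can be illustrated with a small sketch: rows are added to the working subsample one at a time, and after each addition some rows are removed and re-added, with fewer of these extra passes as the subsample grows. This is an assumption-laden illustration, not Loom's implementation. The functions `extra_passes` and `annealing_actions`, the parameters `max_passes` and `small`, and the specific decay rule are all hypothetical.

```python
from typing import Iterator


def extra_passes(num_rows: int, max_passes: float, small: float) -> float:
    """Hypothetical decay rule: many extra passes on small subsamples, few on large ones."""
    return max_passes * small / (small + num_rows)


def annealing_actions(
    total_rows: int, max_passes: float = 2.0, small: float = 10.0
) -> Iterator[str]:
    """Yield "add" / "remove" actions until every row is in the subsample.

    Which row is removed (oldest, random, ...) is left to the caller.
    """
    size = 0      # rows currently in the subsample
    credit = 0.0  # fractional extra passes owed so far
    while size < total_rows:
        yield "add"
        size += 1
        credit += extra_passes(size, max_passes, small)
        # Spend whole units of credit on remove-then-re-add cycles,
        # which revisit earlier assignments without changing `size`.
        while credit >= 1.0:
            yield "remove"
            yield "add"
            credit -= 1.0


actions = list(annealing_actions(total_rows=50))
print(actions.count("add"), actions.count("remove"))
```

With `max_passes=0` the generator degenerates to a single greedy pass of `"add"` actions, which corresponds to the right-hand end of the diagram. Larger values spend more time revisiting rows while the subsample is still small.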
Authors:
- Fritz Obermeyer (https://github.com/fritzo)
- Jonathan Glidden (https://twitter.com/jhglidden)
Loom is a streaming rewrite of the TARDIS engine developed by Eric Jonas (https://twitter.com/stochastician) at Prior Knowledge, Inc.
Loom relies heavily on Salesforce.com's distributions library.
Copyright (c) 2014 Salesforce.com, Inc. All rights reserved.
Licensed under the Revised BSD License. See LICENSE.txt for details.
The PreQL query interface is covered by US patents pending:
- Application No. 14/014,204
- Application No. 14/014,221
- Application No. 14/014,225
- Application No. 14/014,236
- Application No. 14/014,241
- Application No. 14/014,250
- Application No. 14/014,258
Loom depends on the following libraries, listed with their licenses:
- numpy - BSD
- scipy - BSD
- simplejson - MIT
- google protobuf - Apache 2.0
- google perftools - New BSD
- parsable - MIT
- distributions - Revised BSD
- nose - LGPL