Pyro: PyTorch-Based Deep Universal Probabilistic Programming

ineedasername · on Nov 4, 2017

"Deep" probabilistic? A quick search & it seems the term "Deep Probabilistic" was coined by the package author. Which, hey, nice work and all, but "deep" in this context looks like pure marketing fluff.

Maybe it's not, I'm no longer cutting edge on this stuff-- my grad school days were a decade ago and the day job ::sigh:: doesn't require the more interesting stuff.

But I'm gonna get a bit "get off my lawn" on this one and say that, in my day (woohoo!), neural nets could be deep; they had hidden depths (& layers). Belief networks could be deep, and they were adding depth to learning too; Much of the "deep" stuff today seems to use that word the same way Tide & OxyClean have "deep" cleaning technology in their laundry detergent.

All of which is to say, this is a question from someone in the early stages of cruftiness, meant in good humor, to ask "What makes them there probablistics 'deep' ?" :)

mendeza · on Nov 4, 2017

I think the goal is to incorporate modeling uncertainty and utilizing powerful bayesian inference techniques with deep learning. I understand the terminology sounds gimmicky, but the techniques are very useful and used in current research. An example of research of utilizing probabilistic inference with deep learning is trying model the uncertainty of CNN by approximating the uncertainty distribution of weights in dropout layers. If interested, google work done by Alex Kendall and Yarin Gal.

fritzo · on Nov 4, 2017

Deep Probabilistic Programming

Dustin Tran, Matthew D. Hoffman, Rif A. Saurous, Eugene Brevdo, Kevin Murphy, David M. Blei

https://arxiv.org/abs/1701.03757

ineedasername · on Nov 5, 2017

That's very recent, and authored by d.tran, package maintainer for this project on github. Before that paper I see no other reference to it. Makes me think itwas coined in the paper and this package as part for trendy "deep" buzzword value. An impression bolstered by pyro's "about" section, which gives a vague generality and not much else.

This isn't a criticism of the work done which, on looking at the details, is useful and interesting. This is a comment on the difficulty of finding out what the actual interesting bits are.

I looked further: Uber's more detailed writeup has a statement for "why deep probablistic?", but it too is vague, about learning generative knowledge from data. And that a claim about generative knowledge is extremely ambitious for any learning method to make without still more explanation.

EDIT: I'll be honest, it was really this project's connection to Uber, and Uber's corporate culture, that prodded me to think a bit more about the project claims & terminology use. Maybe I'm being unfair.

fritzo · on Nov 5, 2017

Thanks for being honest :)

> ...difficulty of finding out what the actual interesting bits are. > Uber's more detailed writeup ... is vague

We indeed struggled to balance accessibility and depth that blog post. One place to go deeper is the Pyro tutorials http://pyro.ai/examples For more technical framing, take a look at Noah's paper [1]. Broadly I think the "interesting bits" are the integration of newer deep neural net methods with older Bayesian methods of flexible probabilistic modeling.

[1] Deep Amortized Inference for Probabilistic Programs Daniel Ritchie, Paul Horsfall, Noah D. Goodman https://arxiv.org/abs/1610.05735

ineedasername · on Nov 5, 2017

The paper clarifies a lot, and makes me think you meant something like "generative model" instead of "generative intelligence"

The problem with using the later is that it only rarely (maybe a couple dozen over the last few years) appears in conjunction with ML in rigorous research publications. When it does, it is used with extremely precise and well-defined parameters (see GPRS from M.Golumbic) or used in reference to the traditional "home" of generative intelligence at the cross roads of cognitive science and pedagogical theory & methods. There, it is a close cousin and necessary component of "general intelligence."

That's why its use here set off alarm bells of "Alert! Marketing speak at work! Gross overstatement of capabilities have been detected!" Because it doesn't fall much short of claiming "general intelligence" as a practical result of & use for Pyro.

juxtaposicion · on Nov 3, 2017

How does this compare to Edward, PyMC, Stan, et al? Is the primary distinction due to PyTorch’s imperative, dynamic programming?

fritzo · on Nov 3, 2017

Edward: like Edward, Pyro is a deep probabilistic programming language that focuses on variational inference but supports composable inference algorithms. Pyro aims to be more dynamic (by using PyTorch) and universal (allowing recursion).

PyMC, Stan: Pyro embraces deep neural nets and currently focuses on variational inference. Pyro doesn't do MCMC yet. Whereas Stan models are written in the Stan language, Pyro models are just python programs with pyro.sample() statements.

One unique feature of Pyro is the probabilistic effect library that we use to build inference algorithms: http://docs.pyro.ai/advanced.html

psinger · on Nov 5, 2017

This still doesn't really explain the dfifference to PyMC. What is the advantage of using Pyro over PyMC which supports a multitude of inference algorithms as well as mini-batch advi.

f00_ · on Nov 5, 2017

0.1.0 definitely not feature full, but Pyro seems promising.

PyMC3 is fine, but it uses Theano on the backend. Theano will stop being actively maintained in 1 year, and no future features in the mean time. That was announced about a month ago, it seems like a good opportunity to get out something that filled a niche: Probablistic Programming language in python backed by PyTorch. They are taking cues from edward and webppl, which from a casual glance seem to be the best libraries for python and javascript respectively http://edwardlib.org/ http://webppl.org/

But Edward is backed by TensorFlow

That announcement by Theano’s main developer Pascal Lamblin and Yoshua Bengio: https://syncedreview.com/2017/09/29/rip-theano/ https://groups.google.com/forum/#!topic/theano-users/7Poq8BZ...

"Dear users and developers,

After almost ten years of development, we have the regret to announce that we will put an end to our Theano development after the 1.0 release, which is due in the next few weeks. We will continue minimal maintenance to keep it working for one year, but we will stop actively implementing new features. Theano will continue to be available afterwards, as per our engagement towards open source software, but MILA does not commit to spend time on maintenance or support after that time frame. "

https://www.wired.com/2016/12/uber-buys-mysterious-startup-m...

Uber acquired Geometric Intelligence and renamed it Uber AI. From this article:

"But it hasn't published research or offered a product. What is has done is assemble a team of fifteen researchers who can be very useful to Uber, including Stanford professor Noah Goodman, who specializes in cognitive science and a field called probabilistic programming, and University of Wyoming's Jeff Clune, an expert in deep neural networks who has also explored robots that can "heal" themselves."

yodon · on Nov 3, 2017

Can someone eli5 probabilistic programming?

zitterbewegung · on Nov 3, 2017

You got this new kind of variable. It’s special because instead of holding an object or a number it holds the dice. When you evaluate this variable it gives you a dice roll.

To use it to solve real systems you create a model where some of the variables are dice and you shake the system until you get an answer that satisfies you.

obastani · on Nov 3, 2017

Consider the following (traditional) program:

  x = raw_input()
  x *= 2
  assert x >= 10

Now we can ask the question: "Assuming the assertion passed, what can we say about x?" In this simple example, we know that x >= 5, but in general the possible values of x may be much more complicated.

This is the kind of question probablistic programming is designed to answer, except instead of being an arbitrary, unknown value, x is specified as some distribution (say Gaussian). Then, the question becomes, "What is the posterior distribution of x, assuming all of the assertions pass?" In other words, figure out how likely different choices of x were, taking into account both the prior (i.e., that x is Gaussian) and the new information from the assertions.

For a simple example of why this is useful, suppose we have a program that generates random images (this is our prior). We also have some real photographs. We can assert that the random image generator should have a pretty good change of generating the real photographs. Then, the probablistic program "execution" will try and compute a new generator that creates more realistic images.

nightski · on Nov 3, 2017

I honestly don't think it would be possible for a 5 year old to learn or even grasp the concept of probabilistic programming. So maybe wait until high school or if very gifted middle school where you have a more solid mathematical foundation. It's the process of inferring the parameters of a Bayesian probabilistic model from data and then making predictions with that model.

A probabilistic programming language lets you express the model in code, and then it performs inference automatically using various methods. Note doing it manually may produce better results but it is a very involved and time consuming process.

geoelectric · on Nov 3, 2017

eli5 usually just means simplest terms that make sense for the answer.

unclesaamm · on Nov 3, 2017

My impressions after familiarizing myself with Stan, BUGS, JAGS, and a few Python libraries for Bayesian inference:

Probabilistic programming languages are designed to specify and evaluate statistical models. They usually sit somewhere between declarative and imperative languages. They're basically DSLs in the sense that they come with built-in primitives for various statistical distributions and linkage functions, and the model is sometimes but not always lazy evaluated, but they also often have procedural components like variable assignment and loops, and in some software like Stan, the model has to be specified in a certain order since it is evaluated procedurally.

The languages are also de facto coupled with the engines that run them -- I don't know of any probabilistic languages that have been formalized without a corresponding sampling engine, though there have been cases where a language is "forked", and separate engines built for the same language (like WinBUGS and OpenBUGS).

marmaduke · on Nov 3, 2017

I think Stan internally decouples the model from the algorithm. In other words, a model is just an implementation of an interface, providing log probability values (and gradient) for a given proposal. To wit it supports optimization in addition to sampling.

pinouchon · on Nov 3, 2017

This is not a simple explanation, but one I quite like (by Dan Roy) https://www.youtube.com/watch?v=TFXcVlKqPlM

alex_hirner · on Nov 4, 2017

It maps observations where you have a clue about how they were created (priors) to a stochastic function. So quite literally, you end up with a probabilistic program.

There are many solutions for each prior. Priors can for example be expressed as computational graphs with random variables.

tbenst · on Nov 3, 2017

How flexible is this compared to Church/Venture, Webppl, Anglican, etc? Does it support recursively-defined generative processes?

Edit: nvm, Noah Goodman is behind this, who created Webppl. This looks super flexible and awesome, congrats all!

nl · on Nov 3, 2017

The combination of probalistic programming and deep learning is pretty interesting to me because that's what I have going on in two of my work projects.

What we do is have features built using deep learning models, then use that extract simple linear or categorical features which we condition our probalistic model on.

We've found it quite hard to use very high numbers of variables in the probalistic model.

Has anyone found a better way of doing this?

_flbt · on Nov 4, 2017

Do you have a recommended resource on how to feature engineer with DNNs for use as inputs in other models?

nl · on Nov 4, 2017

I don't really understand the question.

For classification we just use the softmax over the classes. We had a long discussion about if this was close enough to probability to use and I think the conclusion was that it is.

orbifold · on Nov 3, 2017

It was bound to happen, the dynamic control flow of pytorch makes this really interesting compared to Edward.

catchmeifyoucan · on Nov 3, 2017

Blocked on our corporate network at CapitalOne as "suspicious"

Diederich · on Nov 3, 2017

Pyro is a universal probabilistic programming language (PPL) written in Python and supported by PyTorch on the backend. Pyro enables flexible and expressive deep probabilistic modeling, unifying the best of modern deep learning and Bayesian modeling. It was designed with these key principles:

Universal: Pyro can represent any computable probability distribution. Scalable: Pyro scales to large data sets with little overhead. Minimal: Pyro is implemented with a small core of powerful, composable abstractions. Flexible: Pyro aims for automation when you want it, control when you need it.

Check out the blog post for more background or dive into the tutorials.

https://eng.uber.com/pyro/

yorwba · on Nov 3, 2017

Likely due to the Anguilla top-level domain. I think it's quite silly to block websites based on the country of their registrar, but unfortunately that kind of thinking is quite common.

traverseda · on Nov 3, 2017

How does it compare to Pyro, the python-remote-objects library?

almstimplmntd · on Nov 3, 2017

This post is about Pyro (probabilistic programming framework, with a clever logo "pi rho"), not to be confused with https://github.com/irmen/Pyro4

Completely independent projects, just a name collision.

pinouchon · on Nov 3, 2017

They are similar like java and javascript are similar

traverseda · on Nov 3, 2017

I mean, clearly they're a bit more similar than that.

l5870uoo9y · on Nov 3, 2017

Anyone have a qualified opinion on how Pytorch compares to Tensorflow?

lowpro · on Nov 3, 2017

Siraj Raval covers PyTorch very well in a 5 min video (https://youtu.be/nbJ-2G2GXL0).

It comes down to design decisions, of which I'm not qualified to go into. This article made the front page last week, about the downsides to Tensorflow which people rarely talk about: http://nicodjimenez.github.io/2017/10/08/tensorflow.html

And this interview with a Tensorflow engineer (10 mins) explains a little bit about those design decisions (https://youtu.be/axRHotkkTVI).

amelius · on Nov 3, 2017

I guess whatever qualities both platforms have does not really matter compared to how much traction they get. If the world converges to one platform, then you should probably go with that platform, no matter how good the other platform is. The reason is that things are just moving too fast, and you don't want to spend your time porting new research from one platform to the other.

That said, I'm curious why PyTorch (or specifically Autograd) couldn't have been built on top of TF.

newlyretired · on Nov 3, 2017

http://www.fast.ai/2017/09/08/introducing-pytorch-for-fastai...

steev · on Nov 3, 2017

I would roughly classify them based on use cases:

* If you are working on research such as optimization or other improvements on the algorithms of training neural networks, PyTorch is a better option as it is (in my experience) much more understandable and easily modified.

* If you are experimenting with network architectures and aren't going to be mucking around with the internals (e.g., developing a new optimization algorithm), Tensorflow is a better option.

polskibus · on Nov 3, 2017

Is there a tensorflow based equivalent?

julien_c · on Nov 3, 2017

Maybe http://edwardlib.org/

bj0 · on Nov 3, 2017

There's already a really cool python project called Pyro (python remote objects): https://pyro4.readthedocs.io/en/stable/intro.html

I haven't used it since I was in undergrad (>10 years) where I used it to communicate between nodes on a small cluster, but it made RPC really easy.

zero_iq · on Nov 3, 2017

Indeed. It was (is?) pretty well known, but I've not heard it mentioned for a long time. With all the fashionable modern RPC and serialisations around nowadays, perhaps the original Pyro is now obscure enough that the name can be reused? Ideally, though, it would be nice to know for sure that an existing project is considered obsolete before causing any confusion.

naturalgradient · on Nov 3, 2017

Or it is just Uber being Uber and not caring.