Haskell’s State monad is not a monad (2014) (chalmers.se)
103 points by setra on July 2, 2018 | 69 comments



This isn't about the state monad, it's about seq. "But now seq must distinguish ⊥ from λx → ⊥, so they cannot be equal. But they are eta-equivalent. We have lost eta equality!" - once you lose eta equality you lose everything. A whole raft of monads become impossible, not just state.
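For concreteness, a minimal sketch of how seq tells the two apart:

    f, g :: Int -> Int
    f = undefined          -- ⊥ at function type
    g = \x -> undefined    -- λx → ⊥, eta-equivalent to f

    -- seq forces its first argument to weak head normal form, so:
    --   f `seq` ()   diverges (throws Prelude.undefined)
    --   g `seq` ()   returns (), since a lambda is already in WHNF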

Equivalence in Haskell is defined only up to ⊥; to work in Haskell we rely on the "Fast and Loose Reasoning is Morally Correct" result: if two expressions are equivalent-up-to-⊥ and both non-⊥, then they are equivalent.
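A concrete sketch of a morally-correct equation failing in the presence of ⊥, using a partial list:

    xs :: [Int]
    xs = 1 : undefined      -- partial list: defined head, undefined tail

    -- reverse (reverse ys) == ys  holds for finite, total ys, but here:
    --   head xs                      -- returns 1
    --   head (reverse (reverse xs))  -- diverges: reverse forces the whole spine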

(Or, y'know, work in Idris with --total and avoid this problem entirely).


Just to agree and emphasize, a more accurate title would probably be "Breaking eta equality breaks the monad laws." A lot of equational reasoning in general goes out the door at that point, not just monads.

More generally any kind of system that is able to introspect and differentiate between non-equivalent evaluations of itself tends to break a lot of invariants.

Imagine trying to establish equivalences between sequential and concurrent processes if the processes in question could introspect which thread they were running on and change behavior based on that.

So yeah. Seq is what breaks things here. Not State. But even Haskell sometimes has to compromise purity for practicality.


> even Haskell sometimes has to compromise purity for practicality.

Well, the designers of Haskell made reasonable decisions based on what was known at the time. I'd argue that Idris shows that we don't have to make this particular compromise any more (and, more generally, that the result of Haskell's lazy-by-default experiment is negative).


I've always thought that (Oca)ML got it right here: strict by default, but full laziness easily available.


It's not so simple in my mind. Sure, it sounds good on paper to just throw out laziness and go strict by default, until you realize how much you are giving up by not having laziness as the default.

My difficulty with Haskell was not that it was lazy, but that it was hard to tell how something would be evaluated without understanding its entire evaluation model.

I think you could have a lazy by default language that makes this a lot more transparent but I'm still thinking about how that might look.


Are you familiar with Python's approach to iteration? Obviously Python itself is a world away from Haskell, but its model of lazy and eager iteration is very pleasant to work with, and with Haskell-style typeclasses it would be dead-simple.


Absolutely, along with .NET's IEnumerable/LINQ.

Laziness is definitely possible in a strict language via thunks (similar to how Haskell does it behind the scenes). However, I didn't realize what laziness by default makes possible until I spent considerable time in the Haskell ecosystem. It really is a game changer.
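One classic sketch of what the default buys you: control structures become ordinary functions, because untaken branches are never evaluated.

    -- With laziness by default, if-then-else is just a function:
    myIf :: Bool -> a -> a -> a
    myIf True  t _ = t
    myIf False _ e = e

    safe :: Int
    safe = myIf True 1 (error "never forced")   -- evaluates to 1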


> Are you familiar with Python's approach to iteration? Obviously Python itself is a world away from Haskell, but its model of lazy and eager iteration is very pleasant to work with

That isn't my experience. As soon as you're in a chain of transformations you have no idea what's lazy or strict.


This is true, but I don't often see people writing code like this in the wild. That's what I mean about it being actually easy with static typing -- you can have the compiler and/or your IDE check for you whether it's a lazy sequence (like a Generator in Python) or not (like a Set, List, etc).


Most of the time it would be polymorphic over whether the sequence is lazy though. And the kind of code that wouldn't behave properly with a lazy sequence tends not to be detectable statically; you could maybe rely on types to convey whether the author intended their code to work on lazy sequences, but realistically that probably wouldn't help you much.


Wouldn't the return value of the previous function in the chain make it obvious?

E.g. in Python you can have

    from typing import Generator, Iterable, List, Sequence

    def foo() -> Generator: ...

    def bar(x: Iterable) -> Sequence: ...

    def baz(y: Sequence) -> List: ...

    lst = baz(bar(foo()))
It's obvious to an experienced Python user (and to a static type checker) what is lazy here. All you need now is a compiler to enforce it.


You don't necessarily know which functions are going to be chained together though. If I write (my Python's a bit rusty):

    def f(x):
      return map(someOtherFunction, map(someFunction, x))
how can I know whether someOtherFunction interferes with someFunction if they run at the same time? The only way to know whether the maps are strict or lazy is knowing what x is, but x might come from a different file or even a different project, so there's no way to know whether this function only works (and is only expected to work) for strict x or not. Actually, even knowing what x is isn't enough, because even if x is strict, f can end up being called in a lazy way if I later do:

    (y for x in someGeneratorThatGeneratesLists for y in f(x))


That's my point about a static type system. You don't have that problem if types are known at compile time.

I agree that it's a nuisance in Python as-is (but I still like it overall).


> That's my point about a static type system. You don't have that problem if types are known at compile time.

You probably do though, because people would write f to be polymorphic and then only test with strict collections.


Agreed, and I think that's where post-Haskell languages are going (I've already mentioned Idris).


Sorry, I'm not too knowledgeable about these things. What is the strict-versus-lazy dichotomy? I wouldn't have thought that they had to do with each other.


Strict being the opposite of lazy - expressions are evaluated immediately.

Like for instance, given an expression...

    x = foo(13)
In a strict language foo(13) will be evaluated immediately.

In a lazy evaluation model, x will be 'the result of foo(13)', but foo(13) won't actually be evaluated until some expression forces evaluation, by, for instance, printing the value.

This is a gross simplification... laziness can do some really neat things, like having functions that return infinite streams. The function would theoretically never terminate, but in practice would be called via something like...

    for event in infinite_stream():
        # ... do stuff with event ...
        if event == TERMINAL_VALUE:
            break
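In Haskell the same pattern needs no special stream protocol; an infinite structure is just a lazy list. A minimal sketch:

    naturals :: [Integer]
    naturals = [0 ..]                  -- conceptually infinite

    -- Only as much of the stream as is demanded ever gets evaluated:
    smallOnes :: [Integer]
    smallOnes = takeWhile (< 10) naturals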


Is strict a commonly used word for the opposite of lazy? I've always seen it as "eager", but maybe that's in the context of other programming languages.


It's commonly used that way, but it's a slightly informal use: being rigorous, "strict" and "non-strict" describe semantic models, whereas "eager" and "lazy" describe evaluation strategies that realize those semantics. I.e., eager evaluation leads to strict semantics, and lazy evaluation to non-strict semantics.
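A one-line sketch of the semantic difference: under non-strict semantics a function can simply ignore a divergent argument.

    -- In a strict language this would crash; under non-strict semantics
    -- (Haskell) it evaluates to 1:
    demo :: Int
    demo = const 1 undefined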


Is it side effects that creates the need for strictness?


I wouldn't say so, just efficiency. The bookkeeping cost of laziness isn't free, even though it can pay dividends when you're able to avoid evaluation altogether.


I'd say it's the need to reason about evaluation, which might be because of side effects but might also be because of performance (of course that can be regarded as another side effect). Laziness has theoretical advantages but in practice we haven't been able to get away from having to understand how our programs are executed, even in Haskell.


Please speak for yourself regarding laziness.


I would say 'by-demand lazy' failed.

The by-demand implementation gives a guarantee that an expression is never evaluated if it is not used. Other non-strict forms exist where every expression will be touched, but no guarantees are given on the ordering. MIT's pH Haskell dialect is an example of this.


I could never figure out how to make sense of anything between evaluating inside-out (strict) and outside-in (lazy; ignoring the complexity around repeated evaluation). Every informal definition I thought of seemed too weak to be useful. The core problem is that, generally speaking, the coder needs to be able to tell with local reasoning whether a particular expression will be evaluated (e.g. "if I get here, will `A` or `B` be evaluated?"), and without by-demand or strict evaluation you can't make that guarantee.

Taking a quick look at pH it looks like they haven't figured out the semantics quite yet so that isn't a true answer to the question either.

Have there been any other non-strict languages with well known semantics? (Excluding things like Prolog)


It always scares me how little I understand from comments talking about Haskell. Either I'm much dumber than I thought, or the whole community decided that purity beats practicality.


Eta-conversion is just the statement that having function(x) { return f(x) }, in JS, is equivalent to having f. In short, wrapping a function in another function (or removing such a layer) makes no difference.

Like most of lambda calculus, this is extremely intuitive once described that way. Therefore, a situation for which it is false is disconcerting. It's also considered one of the "monad laws", which is the small set of rules that everyone assumes is true for every monad, but which the language doesn't (can't) enforce via the type system.

⊥, pronounced bottom, is a "value" that, if you ask for it, your program breaks. It may go into an infinite loop, or it may crash.

The GP post is therefore explaining that, because eta conversion in this case fails when the function itself is bottom, we should be disconcerted. It's a mild form of disconcertion, however: things would only go wonky if you try to run code that would, in any case, hang or crash.

The reason it isn't completely harmless is that the difference may be observable: A crash in pure code triggers an exception, which can be caught in its impure wrapper code, and a hang can in some cases be converted to an exception by the garbage collector. In theory all bottoms are equal, but in practice they aren't.
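A sketch of observing a "pure" crash from impure wrapper code, using Control.Exception from base:

    import Control.Exception (SomeException, evaluate, try)

    main :: IO ()
    main = do
      r <- try (evaluate (head ([] :: [Int]))) :: IO (Either SomeException Int)
      case r of
        Left _  -> putStrLn "bottom observed"   -- the pure crash was caught
        Right n -> print n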


Thank you.


Quite the opposite; the community is very practicality-oriented. There's just a different style of naming things, because things in Haskell tend to be more general. But in most languages any frameworks will introduce their own terminology; comments about the details of (say) React observables are just as incomprehensible if you're not familiar with React.


Well, the Facebook team decided to use a specific variation (unidirectional, with no side effects) of MVC and rename it Flux to confuse everybody. So React is probably not good software to measure yourself against when it comes to clarity.


It was a random choice of big-name framework; I could say the same of Rails or Spring or anything.

Naming things is hard, and the same pattern is constantly rediscovered in different places. If anything I'd say Haskell's difficulties come from the other end: being too willing to respect the "original" name of a concept from an old mathematical paper rather than coming up with a branding that programmers will be more comfortable with.


I get the point: we love to rename things, and Haskell is using the proper mathematical name. There must be a balance somewhere though.


One issue I have with ad hoc naming conventions is that while they may make the first month of learning nice and easy, they usually don't have the nice synthesizing effect where all of your notations come together beautifully.

Math is terse and there is an initial hurdle to get over. However, once that initial hurdle is cleared there are a lot of things that become obvious as a direct consequence of the notational bootstrapping that we did.

With that said, every time I start looking at Haskell I end up playing with category theory instead which may be indicative of something.


The parent seems to agree with you, that Haskell is maybe too eager to keep to 'mathematical' names, but it almost sounds like you interpreted the parent the other way around?

Your comment reads almost equally well with two different interpretations. One a restatement of fact, and the other a rather rude response to a misunderstood message.


There may not be a general balance. That would explain why so many languages scratch different people's itches.


You aren't "dumb" if you don't know how to play some game or speak some language; you just haven't taken the time to learn it yet.


Yeah, but I'm supposed to be a "game player" with 15 years of experience at this point :)


I've been playing the game of Go for 15 years. I barely know how to play Chess beyond moving the pieces. They are both board games played on a grid. I should be a pro at Chess based on my Go experience?

Haskell isn't completely different from your standard imperative programming language (functions, variables, etc), but it is definitely a different enough paradigm that you can't just pick it up in a day or two based on your existing experience. My biggest struggle when I picked it up 4 years ago was unlearning my imperative knowledge. Once I was able to throw away my initial preconceptions things got easier.

On the other hand, software engineering is still software engineering. Regardless of what tools you use, you usually have to solve them in a similar way. Your 15 years are still useful :)


That's actually the most encouraging comment about Haskell I've read in a while.


Yeah, but have you ever thought, "Y'know, maybe the fact that we can't reason in Hask but instead have to work in franken-Hask where not everything works as intended," and then thought, "Maybe this is a problem?"

Fast and loose reasoning is not morally correct; it's ethically bankrupt. Programmers in other languages don't get to forget about their errors, imperfections, and footguns; I'm not sure why Hask and Haskell should be let off the hook in our minds.

(Also, since somebody's gonna accuse me of not having read "Fast and Loose Reasoning", I absolutely have, and the biggest disappointment is failing to define useful equality on function values as if they were plain values. Might be impossible in Hask, but it's trivial in Pos, the category of posets; maybe Hask isn't a very good category!)


> Yeah, but have you ever thought, "Y'know, maybe the fact that we can't reason in Hask but instead have to work in franken-Hask where not everything works as intended," and then thought, "Maybe this is a problem?"

In some cases yes. Like I said in my original post, use Idris and then you don't have this particular instance of the problem.

In general though the possibility of failure seems unavoidable; even two runs of a byte-for-byte identical program might "produce different results" if someone dynamites the computer one of them is running on.

> Fast and loose reasoning is not morally correct; it's ethically bankrupt. Programmers in other languages don't get to forget about their errors, imperfections, and footguns; I'm not sure why Hask and Haskell should be let off the hook in our minds.

I don't think there's anything special about Haskell here. Programmers in plenty of languages reason as though exceptions/panics will never happen, and are sometimes surprised by their program's behaviour when they do.


This is all pretty fun reading.

At least with my own code in Haskell, I work in the mental model where I assume termination and strong normalization (all programs terminate under all evaluation strategies).

Then, bottom corresponds to either an exception that will get forced or a computationally boring infinite loop (which the RTS can sometimes detect and turn into an exception). My personal style favors having those errors be very, very visible.

That said, depending on what formal or informal reasoning properties you care about (especially as a library writer), these little subtleties can be really important!


This.

This thing is not a problem in Haskell because everybody always codes defensively against bottoms. GHC is quick to teach this to novices.

It may still be a problem in the sense of "the lack of this feature rules out a lot of good code". It doesn't look like it does, but nobody can be certain, so it's a good avenue to explore.


The real monad was the friends we made along the way.


Cute, but why should one care about ⊥, aka bottom, aka nontermination [0]? Let's write programs that terminate...

[0] https://wiki.haskell.org/Bottom


We care deeply about ⊥ because in our terminating programs there are often sub-expressions that do not terminate. Without ⊥, we cannot reason effectively about Haskell.

You can require that all of your functions are total and still end up caring about ⊥.
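A sketch of the kind of thing meant here: a sub-expression that would never terminate if fully forced, inside an expression that does terminate.

    -- primes is infinite, yet the whole expression halts:
    firstBigPrime :: Integer
    firstBigPrime = head (dropWhile (< 1000) primes)
      where
        primes = sieve [2 ..]
        sieve (p : xs) = p : sieve [x | x <- xs, x `mod` p /= 0]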


Can you give a practical example of such nonterminating subexpressions?


It's kind of funny, looking at these halting discussions as a backend web programmer. At the heart of it, all I do all day is try to write nonterminating programs in an environment where the entire universe seems hell bent on terminating the event/polling loop. It's kind of perverse when you think about it.


The nice thing is that you can actually build stuff like web servers in a total language :) Using something called co-induction, a total language can model things that are potentially infinite, e.g. a web server. Co-induction replaces the requirement that every function must terminate with the requirement that the next step (e.g. the next value of a stream) can be calculated in finite time. So, the event loop could be modelled using co-induction as long as the event handlers do not contain infinite loops.
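A sketch of that idea in Haskell notation (the Server type is made up for illustration; in a total language it would be defined as codata): each request yields a response and the next state in finite time, even though the server as a whole never ends.

    -- A server is an infinite unfolding, productive one step at a time:
    data Server req resp = Server (req -> (resp, Server req resp))

    echo :: Server String String
    echo = Server (\msg -> ("you said: " ++ msg, echo))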


why is that complementary instead of straightforward induction?


The style of coinductive reasoning is very different.

In general, in induction, one has a base case and an inductive case. Hence, you can only build "finite" objects (since you start with the base case and you can apply the induction rule a finite number of times).

In coinduction, you start with an infinite object, and you show that you can "progress" - that is, take a step in your computation. So, this shows that you can _reach_ states in finite time, even if the full structure is inaccessible, since you show progress on each state.

The phrase that I've heard is "induction is to prove what's true, coinduction is to prove what's not false"

Please do Google "practical coinduction"; it turns up a wealth of examples.


The usual argument given for wanting programs to halt is that if you can't solve the halting problem for your language you can't prove anything, so we want languages where we can prove things.

But you don't have to have programs halt to solve that. It is a perfectly valid solution to the halting problem that all your event handlers halt and your mainloop doesn't halt (or only halts on receipt of a signal, and never on receipt of a request, or something). And then you have your ability to prove things back.
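A sketch of that shape (all names hypothetical): every handler halts, and the only non-termination is the loop itself, whose polymorphic result type advertises that it never returns normally.

    -- 'IO a' for arbitrary a: this action can never produce a value.
    serverLoop :: IO req -> (req -> IO resp) -> (resp -> IO ()) -> IO a
    serverLoop recv handle send = loop
      where
        loop = do
          r <- recv
          handle r >>= send   -- each handler is assumed to halt
          loop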


The web is completely transactional. You want nearly all your code to terminate (and fast), it's just some small layer that will call the terminating part that you want to run forever.

Thus, you will almost certainly gain by annotating the non-termination at the main loop type, and have the compiler tell you about both any possible bottom at the rest of your code, and any possible termination of the main loop.


The modern/future serverless paradigm models services as functions, for example: https://docs.datomic.com/cloud/ions/ions.html


That's why I love containers... you definitely write your code to not leak resources, but if there's any unexpected state, your recovery strategy is to report the error, bail, and the whole container is blown away.


You shouldn't need something as "big" as containers for that - ordinary user processes would work. (Erlang takes this model even further, to actors within a single unix process).


That's nice in theory, but it's insufficiently imaginative. The unexpected state can be out of memory, out of disk, persistent network failures, etc.

And you need to be able to mitigate bugs that make it to prod. Okay, your code never lets bugs into prod, but your dependencies may have issues, or you might have an unexpected conflict with the underlying OS. If you can limp along with a buggy configuration that's causing some containers to go down, you can often get a patch in place without having a total outage.


> That's nice in theory, but it's insufficiently imaginative. The unexpected state can be out of memory, out of disk, persistent network failures, etc.

Well sure, but if that's your threat model then don't you need to restart whole VMs/machines rather than individual containers? I've always felt like containers fall between two stools (at least for languages/ecosystems that have a good way of packaging programs with dependencies - I appreciate that they make sense as a deployment solution for e.g. Python): their isolation guarantees aren't as reliable (at least in practice) as full VMs/unikernels, but you pay most of the price in terms of not being able to use normal IPC mechanisms.


> Well sure, but if that's your threat model then don't you need to restart whole VMs/machines rather than individual containers?

You sure do, but you pay someone else to deal with all that nonsense. And because you can share those VMs, it's cheaper.

> I appreciate that they make sense as a deployment solution for e.g. Python

Any language-specific solutions like .war files have no answer for binary dependencies or utilities provided by a package manager, and they're entirely right not to try and solve that problem. You also occasionally will get a third party that requires you run their awful utility... you may be seeing where my skepticism of "we'll just write good code" comes from.

The other advantage is your deployment artifact has a single identifier, so it gives you a much nicer story for rolling back a bad deployment.

> ... but you pay most of the price in terms of not being able to use normal IPC mechanisms.

If I had computing demands that warranted IPC via shared memory, I'd probably agree the overhead of containers is too much.


> You sure do, but you pay someone else to deal with all that nonsense.

I don't think I follow? I've got a class of problems where I might want to restart my application, and a class of problems where I might want to restart the VM/machine. I don't see how adding containers in between the two helps.

> Any language-specific solutions like .war files have no answer for binary dependencies or utilities provided by a package manager, and they're entirely right not to try and solve that problem.

JVM culture steers away from depending on binaries on the host system; in the rare cases where you want JNI then something like https://github.com/mrburrito/jni-loader is the standard approach IME. I think I've used literally two system libraries in my career, and both were standard enough that I was content to include them in the OS baseline. Appreciate that other languages do approach this differently.

> The other advantage is your deployment artifact has a single identifier, so it gives you a much nicer story for rolling back a bad deployment.

I don't see how this is different from any other kind of deployment?


If you have pairs of processes that communicate over IPC, you could just build them into the same container.


Proof of termination is closely related to:

"Now suppose there were a hypothetical language with a stronger guarantee: if two programs are equal then they generate identical executables. Such a language would be immune to abstraction: no matter how many layers of indirection you might add the binary size and runtime performance would be unaffected."

http://www.haskellforall.com/2014/09/morte-intermediate-lang...

A more practical reason to care is that if a user-generated expression is proven to be pure and to terminate, then we can evaluate it without fear of getting hacked, denial of service, etc. Imagine a new type of iPhone, IDE, or web browser where a shitty app/plugin can't lag up your experience.


> A more practical reason to care is that if a user-generated expression is proven to be pure and to terminate, then we can evaluate it without fear of getting hacked, denial of service, etc. Imagine a new type of iPhone, IDE, or web browser where a shitty app/plugin can't lag up your experience.

Termination alone isn't enough, since in some provably-terminating languages it's still very possible to write expressions that will take a very long time to evaluate in some implementations. Concretely, "catastrophic backtracking" regular expressions can be true regular expressions (and therefore provably terminating), but in naive implementations their runtime will be exponential in the input length.

But it's a start, and the same techniques could hopefully lead to languages in which expressions naturally come with upper bounds on their evaluation time.
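A simpler sketch of terminating-but-infeasible: naive Fibonacci is total for non-negative arguments, yet takes exponential time.

    -- Provably terminating for every n >= 0, but fib 100 is hopeless:
    fib :: Integer -> Integer
    fib 0 = 0
    fib 1 = 1
    fib n = fib (n - 1) + fib (n - 2)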


Some linear type systems guarantee that functions defined in them have polynomial time complexity: http://www.cs.cmu.edu/~fp/courses/15816-s12/misc/hofmann03ic... http://www.kurims.kyoto-u.ac.jp/~terui/lalcjournal.pdf It appears, however, that the ones developed so far are awkward to "program" in.


You need more than termination to protect against a denial of service; you need to know that a function terminates promptly. Most denial-of-service attacks are about functions that run slowly (e.g. O(N^2) or exponential).


> no matter how many layers of indirection you might add the binary size and runtime performance would be unaffected

This is an unfortunate, glaringly false statement in the Morte documentation. Correctness of refactoring and code abstraction is understood to be with respect to semantic (extensional) equivalence, while normalization only works up to beta-eta equivalence. It can't eliminate abstraction in general.

Examples in Morte which suggest otherwise use hand-crafted forms of Church-encoded operations which are known to fuse, but there are many equivalent definitions which don't fuse, and which are actually more natural to write. Converting non-fusing definitions to fusing definitions is equivalent to supercompilation, which is generally undecidable, and thus can be implemented only up to some approximation.


Most functions we write are indeed total.


It seems to me a piece of software is either a formal proof or it's not. So if it's not and we still manage to write robust programs, the natural next question is, to what degree is it okay to knowingly violate the laws? And if you iterate that to the nth, do you end up with Clojure?


> It seems to me a piece of software is either a formal proof or it's not.

That seems like a false dichotomy. Rather I'd say: a program carries a bundle of proven properties with it, but for some programs this will be rich and for others it will be more or less trivial.

> So if it's not and we still manage to write robust programs, the natural next question is, to what degree is it okay to knowingly violate the laws?

To what degree do we actually write robust programs though? Errors are so common we dismiss them and retry or work around without even consciously noticing the error. "Have you tried turning it off and on again" is funny because it's true.



