Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

primops: make nature of foldl' strictness clearer #7158

Merged

Conversation

sternenseemann
Copy link
Member

@sternenseemann sternenseemann commented Oct 11, 2022

Edit: Outdated change description!

I was a bit confused by the fact that primop_foldlStrict only (shallowly) forces the input list. To determine what a strict left fold “should” do, I looked at Haskell's foldl' which calls seq on the accumulation value (which you can see by the z :: b type annotation in the arguably confusing code). I think this is fair enough, since Haskell is where most people's intuition about foldl' will come from.

Forcing the accumulation value vCur has the benefit that it limits how nested the intermediate thunks in a fold can become which hopefully saves on used stack size in larger folds (I done any testing on the impact so far). The list elements are only forced insofar, as it is necessary when forcing vCur which means we still save on unnecessary computations.

I added two new test cases which demonstrate that a) vCur is forced and b) list elements are not necessarily forced. Additionally I did a sanity check, evaluating a nontrivial CI pipeline as well as multiple NixOS configurations which did not turn up any obvious regressions.

@roberth
Copy link
Member

roberth commented Oct 11, 2022

It does behave correctly for all the other accumulator values.
We should test that.

nix-repl> builtins.foldl' (_acc: x: x) {} [ (throw "even though this accumulator value isn't used, it must be forced") 1 ]                
error: even though this accumulator value isn't used, it must be forced

For the purpose of illustration only, we can look at the lazy foldl in Nixpkgs lib:

nix-repl> lib.foldl (_acc: x: x) {} [ (throw "failure: because this accumulator value isn't used, it must not be forced!") 1 ]       
1

I don't believe laziness in the first accumulator value is going to be a performance issue in the real world, whereas making it strict may cause real world issues, potentially breaking old expressions. While I judge that chance of breaking to be slim, I weigh this to be a bad risk compared to the even slimmer strictness/performance benefit.
To align better with other languages, we could add lib.foldl' to Nixpkgs with the added strictness in the initial accumulator.

{
  foldl' = f: acc0: l: builtins.seq acc0 (builtins.foldl' f acc0 l);
}

In conclusion, I'd recommend to add the above test case instead and leave the implementation as is, and even add a test case so we don't break acc0 laziness in the future.

@sternenseemann
Copy link
Member Author

I did a simple benchmark using hyperfine and nixosTests.simple (the first commit is before this PR, the second commit is HEAD of my branch):

> hyperfine -w 5 -L commit ac0fb38e8a5a25a84fa17704bd31b453211263eb,aee7373357f20caaf14f02b72136b7950c55405d -s 'git checkout {commit} && make install'  './out/bin/nix-instantiate --readonly-mode -A nixosTests.simple ~/src/nix/nixpkgs/'
Benchmark 1: ./out/bin/nix-instantiate --readonly-mode -A nixosTests.simple ~/src/nix/nixpkgs/
  Time (mean ± σ):      2.360 s ±  0.052 s    [User: 2.232 s, System: 0.245 s]
  Range (min … max):    2.321 s …  2.493 s    10 runs
 
Benchmark 2: ./out/bin/nix-instantiate --readonly-mode -A nixosTests.simple ~/src/nix/nixpkgs/
  Time (mean ± σ):      2.376 s ±  0.075 s    [User: 2.245 s, System: 0.241 s]
  Range (min … max):    2.329 s …  2.583 s    10 runs
 
Summary
  './out/bin/nix-instantiate --readonly-mode -A nixosTests.simple ~/src/nix/nixpkgs/' ran
    1.01 ± 0.04 times faster than './out/bin/nix-instantiate --readonly-mode -A nixosTests.simple ~/src/nix/nixpkgs/'

So seems being lazier is slightly faster, but I'm not sure if it is significant enough to violate user's expectation.

@sternenseemann
Copy link
Member Author

I don't believe laziness in the first accumulator value is going to be a performance issue in the real world, whereas making it strict may cause real world issues, potentially breaking old expressions. While I judge that chance of breaking to be slim, I weigh this to be a bad risk compared to the even slimmer strictness/performance benefit.

This is also fair enough, we can just make it clearer in the documentation that this fold is lazy. Another argument for keeping it as is would be that implementing a strict fold based on the lazy builtin is easy, but not the other way round, potentially leaving us with a much worse performing lazy fold in nixpkgs.

@roberth
Copy link
Member

roberth commented Oct 11, 2022

slightly faster, but I'm not sure if it is significant enough

The measurement difference is within σ, so it is not significant.
Considering that the change only affects the first accumulator, and that repeated forcing of an already forced value short-circuits quickly, I would not expect a significant difference.

@roberth
Copy link
Member

roberth commented Oct 11, 2022

this fold is lazy

This is not a sufficient characterization of this fold. It is only lazy in the initial accumulator, and only if f does not make it strict.

Another argument

This gets a bit "academic" so to speak. I don't think we have much of a choice, because we don't want to break existing expressions.

@Profpatsch
Copy link
Member

Also note: lib.nix currently defines

  foldl = op: nul: list:
    let
      foldl' = n:
        if n == -1
        then nul
        else op (foldl' (n - 1)) (elemAt list n);
    in foldl' (length list - 1);

and

  foldl' = builtins.foldl' or foldl;

where foldl is not strict in any way.

@roberth
Copy link
Member

roberth commented Oct 11, 2022

  foldl' = builtins.foldl' or foldl;

where foldl is not strict in any way.

I believe this line to be a very old compatibility shim. Ideally it would wrap f to be strict in the accumulator, but that may also increase the stack requirements, making the stack overflows - that foldl' should solve - occur on even smaller inputs. Possibly not so ideal after all.

@Profpatsch
Copy link
Member

Of note: since attrsets force their keys in any case, adding the additional strictness to foldl' does not make a difference when your accumulator is an attrset.

nix-repl> x = { a = 1; ${throw "no"} = 2; }

nix-repl> builtins.seq x 1
error: no

@roberth
Copy link
Member

roberth commented Oct 11, 2022

It does if x is an otherwise ignored accumulator.

This isn't new information though. We should define weak head normal form aka WHNF in the Nix manual so that we can reference it and not feel like we have document things through sporadic github comments.

@Profpatsch
Copy link
Member

We should define weak head normal form aka WHNF in the Nix manual so that we can reference it and not feel like we have document things through sporadic github comments.

Not just that, but we should also document the strictness behavior of the builtin types (mostly lists and attrsets)

@Profpatsch
Copy link
Member

It does if x is an otherwise ignored accumulator.

But that’s a different case. Of course if you never force the thing in the first place (by ignoring it), it won’t be looked at.

@Profpatsch
Copy link
Member

Or maybe I don’t understand what you mean

@sternenseemann
Copy link
Member Author

sternenseemann commented Oct 12, 2022

Seems like the only way in which foldl' is strict currently is that it forces the final result thunk before returning it, but at no other point. This strikes me as odd, since the thunk would be forced anyways by whatever forcing the thunk of the foldl' function application?!

@roberth
Copy link
Member

roberth commented Oct 12, 2022

Seems like the only way in which foldl' is strict currently is that it forces the final result thunk before returning it, but at no other point.

It does force intermediate thunks, except the initial one that you pass to foldl' in its second argument.
See the observation in #7158 (comment)

This strikes me as odd, since the thunk would be forced anyways by whatever forcing the thunk of the foldl' function application?!

That would have been a sensible thought, but it doesn't apply to the linked observation.

@sternenseemann
Copy link
Member Author

sternenseemann commented Oct 12, 2022

It does force intermediate thunks, except the initial one that you pass to foldl' in its second argument.
See the observation in #7158 (comment)

This is not due to the final force, but seems due to the fact that we use callFunction in a strict way here (e.g. ExprCall::eval would call maybeThunk on the arguments which we don't do in foldl'). This is just my hypothesis, though, would be nice to get someone to confirm this who is more familiar with the evaluator code.

Edit: What I mean to say, using plain callFunction strictly calculates its return value, but not its arguments, so it wouldn't make a difference to move the force of vCur I introduced in this PR after the call.

Consider this diff:

diff --git a/src/libexpr/primops.cc b/src/libexpr/primops.cc
index 28b998474..3c2d589b3 100644
--- a/src/libexpr/primops.cc
+++ b/src/libexpr/primops.cc
@@ -2891,13 +2891,16 @@ static void prim_foldlStrict(EvalState & state, const PosIdx pos, Value * * args
         Value * vCur = args[1];
 
         for (auto [n, elem] : enumerate(args[2]->listItems())) {
             Value * vs []{vCur, elem};
             vCur = n == args[2]->listSize() - 1 ? &v : state.allocValue();
             state.callFunction(*args[0], 2, vs, *vCur, pos);
         }
-        state.forceValue(v, pos);
+        //state.forceValue(v, pos);
     } else {
-        state.forceValue(*args[1], pos);
+        //state.forceValue(*args[1], pos);
         v = *args[1];
     }
 }

We can still replicate your example:

nix-repl> builtins.foldl' (_acc: x: x) {} [ (throw "even though this accumulator value isn't used, it must be forced") 1 ]
error: even though this accumulator value isn't used, it must be forced

You can observe the difference by manually translating the expression into the equivalent calls (also with the above patch applied):

nix-repl> let op = _acc: x: x; in op (op {} (throw "foo")) 1 
1

Consequently, my originally proposed change hardly makes any difference in practice, though.

Edit: So I guess a way forward would be keep things as is, add more tests for this and try making the documentation a bit more accurate. Not sure if we can drop the final force before return, I have the suspicion it doesn't do anything?

@roberth
Copy link
Member

roberth commented Oct 12, 2022

builtins.foldl' (_acc: x: x) {} [ (throw "even though this accumulator value isn't used, it must be forced") 1 ]

This is actually a bit flawed. A foldl' that forces the list items instead of the accumulator would produce the same behavior.
A better test case is this:

# foldl' forces the accumulator values produced by the function. This must throw.
builtins.foldl' (_acc: item: item null) {} [ (_: throw "even though this accumulator value isn't used, it must be forced") (_: 1) ]

But an extra test case would make the same point

# foldl' is lazy in the list items when the function is lazy in the list item. This must return true.
builtins.foldl' (acc: item: acc) {} [ (throw "must be lazy in list items") ] == {}

Edit: So I guess a way forward would be keep things as is, add more tests for this and try making the documentation a bit more accurate.

sgtm

Not sure if we can drop the final force before return, I have the suspicion it doesn't do anything?

In theory the evaluator could apply a function and return a thunk for the body. I think this would check the set pattern, if any, but stop after that. I'd have to look up in the code what exactly happens though.

In general, I think not forcing a thunk may not always cause a problem because often there will be other opportunities for it to be forced anyway, but in a rare case it may not be. In other words, it's not easy to be sure.

PrimOpFun does not define any constraints in a comment, but that may not mean much. callFunction doesn't come with docs either, and I don't know this off the top of my head.

@sternenseemann
Copy link
Member Author

@roberth I updated this change to only update the documentation of foldl', clarifying it a bit. Three new test cases are added which should do a decent job of tying down the current behavior.

@sternenseemann sternenseemann changed the title primops: force accumulation value in foldl' primops: make nature of foldl' strictness clearer Oct 15, 2022
Copy link
Member

@roberth roberth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Commented test cases are much appreciated ❤️

evaluated first. For example, `foldl' (x: y: x + y) 0 [1 2 3]`
evaluates to 6.
...`. For example, `foldl' (x: y: x + y) 0 [1 2 3]` evaluates to 6.
The return value of each application of `op` is evaluated strictly,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This would be easier if the Nix docs defined weak head normal form for Nix. Lacking such a precise definition, perhaps we could call it the root?

A bit nitpicky perhaps, but to stay accurate, I'd recommend to avoid "evaluate strictly", because the Nix language does not switch between lazy and strict evaluation, as "strictly" might suggest.

Suggested change
The return value of each application of `op` is evaluated strictly,
The root of the return value of each application of `op` is evaluated immediately,

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll use “immediately”, but I think root is too confusing. I'd much rather fix that in one go with seq and deepSeq. In the Nix manual currently evaluate always implies only the root, anything beyond that is evaluate deeply. With the removal of the “strictly” I think it is relatively clear in context.

Let's do the WHNF stuff in a separate PR.

src/libexpr/primops.cc Show resolved Hide resolved
tests/lang/eval-fail-foldlStrict-strict-op-application.nix Outdated Show resolved Hide resolved
* Clarify the documentation of foldl': That the arguments are forced
  before application (?) of `op` is necessarily true. What is important
  to stress is that we force every application of `op`, even when the
  value turns out to be unused.

* Move the example before the comment about strictness to make it less
  confusing: It is a general example and doesn't really showcase anything
  about foldl' strictness.

* Add test cases which nail down aspects of foldl' strictness:
  * The initial accumulator value is not forced unconditionally.
  * Applications of op are forced.
  * The list elements are not forced unconditionally.
@sternenseemann
Copy link
Member Author

Any blockers?

@roberth
Copy link
Member

roberth commented Jan 25, 2023

None as far as I'm concerned, but would like to discuss with the team before merging this.

@nixos-discourse
Copy link

This pull request has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/2023-02-10-nix-team-meeting-minutes-31/25438/1

@Ericson2314
Copy link
Member

This is approved, the conversation is safe to mostly ignore since the work ended up being much simpler. The tests are surely better than non-tests, and if we are unsure about the documentation part we should split out the tests to merge that first.

@fricklerhandwerk fricklerhandwerk merged commit dda83a5 into NixOS:master Feb 19, 2023
@sternenseemann sternenseemann deleted the foldl-strict-accumulation-value branch February 24, 2023 20:35
@infinisil
Copy link
Member

Haskell actually also had this problem originally, and it was only accidentally changed to the more intuitive behavior in 2015!

See also Which foldl'? and the corresponding comments on Reddit

@infinisil
Copy link
Member

Btw I had to work around this here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Archived in project
Development

Successfully merging this pull request may close these issues.

None yet

7 participants