Add new Thunk-based runtime #21
To help me understand what is going on with this slot-thunk trick, let me expand:
fn factorial(input: u64) -> u64 {
    fn call_factorial_inner<'slot>(slot: &'slot mut slot::Slot, accumulator: u64, input: u64) -> trampoline::Action<'slot, u64> {
        trampoline::call(slot, move |slot| {
            if input == 0 {
                return trampoline::done(slot, accumulator);
            }
            return call_factorial_inner(slot, accumulator * input, input - 1);
        })
    }
    fn factorial_inner(accumulator: u64, input: u64) -> u64 {
        trampoline::run(move |slot| call_factorial_inner(slot, accumulator, input))
    }
    factorial_inner(1, input)
}
Intermediate inlining process.
⇒
fn factorial(input: u64) -> u64 {
    fn call_factorial_inner<'slot>(slot: &'slot mut slot::Slot, accumulator: u64, input: u64) -> trampoline::Action<'slot, u64> {
        trampoline::call(slot, move |slot| {
            if input == 0 {
                return trampoline::done(slot, accumulator);
            }
            return call_factorial_inner(slot, accumulator * input, input - 1);
        })
        // ⇒
        let fn_once = move |slot| {
            if input == 0 {
                return trampoline::done(slot, accumulator);
                // ⇒
                return Action::Done(accumulator);
            }
            return call_factorial_inner(slot, accumulator * input, input - 1);
        };
        trampoline::call(slot, fn_once);
        // ⇒
        Action::Call(Thunk::new_in(slot, fn_once));
        // ⇒
        let ptr = slot.put(fn_once);
        // ⇒
        let ptr = slot.cast().write(fn_once);
        // ⇒
        let slot_bytes: &mut MaybeUninit<_> =
            unsafe { &mut *slot.bytes.as_mut_ptr().cast() };
        let ptr = slot_bytes.write(fn_once);
        Action::Call(Thunk { ptr })
    }
    fn factorial_inner(accumulator: u64, input: u64) -> u64 {
        trampoline::run(move |slot| call_factorial_inner(slot, accumulator, input))
        // ⇒
        let slot = &mut Slot::new();
        let mut action = (move |slot| call_factorial_inner(slot, accumulator, input))(slot);
        // ⇒
        let mut action = call_factorial_inner(slot, accumulator, input);
        loop {
            match action {
                Action::Done(value) => return value,
                Action::Call(thunk) => action = thunk.call(),
                // ⇒
                Action::Call(thunk) => {
                    let ptr: *mut dyn ThunkFn<'_, _> = thunk.ptr;
                    core::mem::forget(thunk);
                    action = unsafe { (*ptr).call_once_in_slot() };
                    // ⇒
                    action = unsafe {
                        let (fn_once, slot) = Slot::take(*ptr);
                        // ⇒
                        let in_slot: *mut _ = *ptr;
                        let slot: &mut Slot = &mut *in_slot.cast();
                        let fn_once = slot.cast().assume_init_read();
                        // ⇒
                        let slot_bytes_mut: &mut MaybeUninit<_> =
                            &mut *slot.bytes.as_mut_ptr().cast();
                        let fn_once = slot_bytes_mut.assume_init_read();
                        fn_once(slot)
                    };
                },
            }
        }
    }
    factorial_inner(1, input)
}
⇒
fn factorial(input: u64) -> u64 {
    fn call_factorial_inner<'slot>(slot: &'slot mut slot::Slot, accumulator: u64, input: u64) -> trampoline::Action<'slot, u64> {
        let fn_once = move |slot| {
            if input == 0 {
                return Action::Done(accumulator);
            }
            return call_factorial_inner(slot, accumulator * input, input - 1);
        };
        let slot_bytes: &mut MaybeUninit<_> =
            unsafe { &mut *slot.bytes.as_mut_ptr().cast() };
        let ptr = slot_bytes.write(fn_once);
        Action::Call(Thunk { ptr })
    }
    fn factorial_inner(accumulator: u64, input: u64) -> u64 {
        let slot = &mut Slot::new();
        let mut action = call_factorial_inner(slot, accumulator, input);
        loop {
            match action {
                Action::Done(value) => return value,
                Action::Call(thunk) => {
                    let ptr: *mut dyn ThunkFn<'_, _> = thunk.ptr;
                    core::mem::forget(thunk);
                    action = unsafe {
                        // This part does not really work because the type of
                        // `fn_once` cannot be written out.
                        // In the implementation, `dyn ThunkFn` is used to
                        // dynamically dispatch the call.
                        let in_slot: *mut _ = *ptr;
                        let slot: &mut Slot = &mut *in_slot.cast();
                        let slot_bytes_mut: &mut MaybeUninit<_> =
                            &mut *slot.bytes.as_mut_ptr().cast();
                        let fn_once = slot_bytes_mut.assume_init_read();
                        fn_once(slot)
                    };
                },
            }
        }
    }
    factorial_inner(1, input)
}
Code that compiles: SichangHe@d8570c4
If I try to explain this:
@SichangHe I've refactored a little and added some safety comments which should hopefully make this a little more clear. I need to add more documentation to the types but here is a brief summary:
Well, unsafe code is used, so this isn't fully borrow-checked. 😅 That said, I am fairly confident it is correct. Notice how in the code, there are never two This is why the closures are passed a
I don't think so, but I might not be fully understanding what you are saying. The information about the state of the trampoline has to be stored somewhere. In the current enum-based implementation, we write to
The dynamic dispatch is what allows for mutual tail calls! By performing type-erasure on the closures, we are able to make a trampoline that can run multiple functions without having to know the full list in advance. I think this will work even across different crates (provided they are using the same version of
Mutual Recursion Example
#[tailcall]
fn is_even(x: u128) -> bool {
    if x > 0 {
        tailcall::call! { is_odd(x - 1) }
    } else {
        true
    }
}
#[tailcall]
fn is_odd(x: u128) -> bool {
    if x > 0 {
        tailcall::call! { is_even(x - 1) }
    } else {
        false
    }
}
Mutual Recursion Example (Expanded)
use tailcall::{slot, trampoline};
fn is_even(x: u128) -> bool {
    trampoline::run(move |slot| build_is_even_action(slot, x))
}
#[doc(hidden)]
#[inline(always)]
fn build_is_even_action<'slot>(slot: &'slot mut slot::Slot, x: u128) -> trampoline::Action<'slot, bool> {
    trampoline::call(slot, move |slot| {
        if x > 0 {
            build_is_odd_action(slot, x - 1)
        } else {
            trampoline::done(slot, true)
        }
    })
}
fn is_odd(x: u128) -> bool {
    trampoline::run(move |slot| build_is_odd_action(slot, x))
}
#[doc(hidden)]
#[inline(always)]
fn build_is_odd_action<'slot>(slot: &'slot mut slot::Slot, x: u128) -> trampoline::Action<'slot, bool> {
    trampoline::call(slot, move |slot| {
        if x > 0 {
            build_is_even_action(slot, x - 1)
        } else {
            trampoline::done(slot, false)
        }
    })
}
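For readers who want to see the type-erasure idea in isolation, here is a minimal sketch of a trampoline that dispatches through `dyn FnOnce`. It is not the runtime from this PR: it swaps the stack-allocated slot for a heap-allocated `Box`, which is simpler to write in safe Rust but allocates on every step. The names `Action`, `run`, `is_even_action`, and `is_odd_action` are made up for this illustration.

```rust
/// One step of a trampolined computation: either a final value or a
/// type-erased closure that produces the next step.
/// NOTE: hypothetical type for illustration; the PR stores the closure
/// in a stack slot instead of a `Box`.
enum Action<T> {
    Done(T),
    Call(Box<dyn FnOnce() -> Action<T>>),
}

/// Drive the computation in a loop so that the call stack does not grow.
fn run<T>(mut action: Action<T>) -> T {
    loop {
        match action {
            Action::Done(value) => return value,
            // The trampoline does not know which function built this
            // closure; the call is dispatched dynamically.
            Action::Call(thunk) => action = thunk(),
        }
    }
}

fn is_even_action(x: u128) -> Action<bool> {
    Action::Call(Box::new(move || {
        if x > 0 {
            is_odd_action(x - 1)
        } else {
            Action::Done(true)
        }
    }))
}

fn is_odd_action(x: u128) -> Action<bool> {
    Action::Call(Box::new(move || {
        if x > 0 {
            is_even_action(x - 1)
        } else {
            Action::Done(false)
        }
    }))
}

fn is_even(x: u128) -> bool {
    run(is_even_action(x))
}

fn is_odd(x: u128) -> bool {
    run(is_odd_action(x))
}
```

Because `run` only sees `Box<dyn FnOnce() -> Action<T>>`, a third mutually recursive function could be added without touching the trampoline, which is the flexibility the dynamic dispatch buys.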
I don't think so. We need to move the value out of the
Okay… Learning about the borrow checker all the time.
Yes, I also think it follows ownership rules.
Okay, now I get it.
Most of my use cases of TCO in Rust have been text parsing or tree walking where I know all the cases, so the dynamic dispatch would add unnecessary pointer indirection there compared to enum-based implementations. I think the dynamic dispatch implementation is brilliant in its ability to perform tail call optimizations without knowing all the tail-call functions in advance. Though, I do not yet know what that would look like. Maybe we can find examples in loop-less languages like Erlang or OCaml. The more important question is how the user would leverage this mechanism. I think most people would be scared away from the slot trick and some sort of macro is again needed.
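As a point of comparison, an enum-based trampoline for the case where all tail-called functions are known in advance might look roughly like the sketch below. The `Step` enum and `run` function are hypothetical names for this illustration and are not taken from the crate; each variant stands for one function together with its arguments.

```rust
/// Every function that can be tail-called is one variant holding its
/// arguments. NOTE: hypothetical type for illustration only.
enum Step {
    IsEven(u128),
    IsOdd(u128),
    Done(bool),
}

/// Statically dispatched trampoline: the `match` plays the role of the
/// virtual call, so no pointer indirection is needed.
fn run(mut step: Step) -> bool {
    loop {
        step = match step {
            Step::Done(value) => return value,
            Step::IsEven(x) => {
                if x > 0 {
                    Step::IsOdd(x - 1)
                } else {
                    Step::Done(true)
                }
            }
            Step::IsOdd(x) => {
                if x > 0 {
                    Step::IsEven(x - 1)
                } else {
                    Step::Done(false)
                }
            }
        };
    }
}

fn is_even(x: u128) -> bool {
    run(Step::IsEven(x))
}

fn is_odd(x: u128) -> bool {
    run(Step::IsOdd(x))
}
```

The trade-off is that `Step` must list every participating function up front, so this form cannot be extended from another crate.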
The closure is copied out of the
That is correct! I shouldn't have said "arguments" in my explanation. Note that for these closures though, the captures are always exactly the arguments of the function it represents.
Yes, this is a good point. Another downside of using the closures is that they will probably introduce one new stack frame above the trampoline loop itself. I would be very impressed if the compiler was able to inline the virtual calls.
This is another fair point. I like this virtual dispatch solution because it feels more flexible to me (cross crate mutual recursion), but that flexibility comes at a -- I think very small -- cost. That said, if no one needs that flexibility in practice, why bother paying it at all? (Especially when it requires adding On the other hand, an enum-based implementation would have much more complex macros, and the current enum-based solution for a single function already has a few bugs (see #13, #18, and #19). It's not easy to take a function's arguments and represent them as enum variants in Rust, especially when generics or a receiver are involved (and remember all references are generic in at least their lifetime). 😵
I've gone over the code a few times, and the tests now pass on Miri, so I am relatively confident that the runtime is sound. My next step will be to rewrite the macros. 😄
I got this one wrong. It is again a stack copy. If the closure creates a new call stack, then this stack copy is unnecessary; otherwise it is necessary.
This is up to you. Though, I am fairly sure someone will use it if it exists 😆.
The current implementation is not really "enum-based", right? It is just a tuple. And, those are bugs in the macro 🤦?
✌️
🚧 This PR currently only contains updates to the runtime system. The macros still need to be updated in order for it to work properly. 🚧
This branch is an attempt to address #3 and #20 with a new Thunk-based trampoline runtime. This solution has been explored in libraries like trampoline-rs; however, there it is implemented in a way that requires heap allocation.
In this runtime, the Thunk lives in a Slot which is allocated on the stack and re-used for each subsequent call. This avoids the need for heap allocation, but comes with a few downsides:
Slot must be conservative (i.e. large enough to accommodate the data of the biggest Thunk).
As mentioned above, the macros in the crate are not yet updated. To see an example of the runtime in action (what the macros would need to emit), see the factorial_in_new_runtime test case.
At a high level, usage of this crate under the new runtime would change to using two macros: one attribute to mark a function as available for TCO, and another at each call site where one would want to use/guarantee TCO:
Macro Expansion Mockup
This is something that I am working on in my free time, so it may take me a while to dig into rewriting the macros. In the meantime, I'm open to feedback, suggestions, and contributions from others who are interested in this problem.
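The sizing constraint on the slot described above can be illustrated with a small sketch. This is not the crate's actual `Slot`: the 128-byte capacity, the 16-byte alignment, and the `fits` and `put` helpers are assumptions chosen for the example, loosely following the `slot.bytes.as_mut_ptr().cast()` pattern quoted earlier in the conversation.

```rust
use core::mem::{align_of, size_of, MaybeUninit};

/// Hypothetical capacity in bytes; the real runtime may use another size.
const SLOT_SIZE: usize = 128;

/// A fixed-size, stack-allocated buffer that is re-used for each value.
/// NOTE: illustrative layout only, not the type from the PR.
#[repr(align(16))]
struct Slot {
    bytes: MaybeUninit<[u8; SLOT_SIZE]>,
}

impl Slot {
    fn new() -> Self {
        Slot { bytes: MaybeUninit::uninit() }
    }

    /// Whether a value of type `T` can be stored in the slot at all.
    fn fits<T>() -> bool {
        size_of::<T>() <= size_of::<Slot>() && align_of::<T>() <= align_of::<Slot>()
    }

    /// Write `value` into the slot, overwriting whatever bytes were there
    /// (a previously stored value is not dropped).
    fn put<T>(&mut self, value: T) -> &mut T {
        assert!(Self::fits::<T>(), "value does not fit in the slot");
        // SAFETY: the assertion above guarantees the buffer is large enough
        // and sufficiently aligned for `T`, and `&mut self` gives exclusive
        // access to the bytes for the lifetime of the returned reference.
        let typed: &mut MaybeUninit<T> =
            unsafe { &mut *self.bytes.as_mut_ptr().cast::<MaybeUninit<T>>() };
        typed.write(value)
    }
}
```

With these assumed numbers, a closure capturing sixteen `u64` values would still fit, while one capturing seventeen would be rejected, which is why the capacity has to be chosen conservatively.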