Zero cost abstractions: Rust vs C++

I watched a video of a C++ talk from the always excellent Chandler Carruth yesterday, entitled “There are no Zero Cost Abstractions”. In it, he talks about the cost of different abstractions that are labelled zero cost. It’s a really interesting talk and I’d advise people interested in low-level programming to give it a look.

He observed that C++’s std::unique_ptr is actually not a zero cost abstraction at runtime, showing a simple C++ example using raw pointers and then a semantically equivalent one using std::unique_ptr. He then compared the assembly outputs taken from Compiler Explorer.

Now this was really interesting, and particularly because Rust boasts about having very similar zero cost abstractions. So I wondered about the cost of the equivalent code in Rust.

Fortunately Compiler Explorer supports Rust as well, so we can compare them!

So for reference, here are the C++ examples that Chandler included in his talk:

Raw pointer version:
void bar(int* ptr);
void baz(int *ptr);

void foo(int *ptr) {
    if (*ptr > 42) {
        bar(ptr);
        *ptr = 42;
    }
    baz(ptr);
}
With std::unique_ptr:
#include <memory>

using namespace std;

void bar(int* ptr) noexcept;
void baz(unique_ptr<int> &&ptr) noexcept;

void foo(unique_ptr<int> ptr) noexcept {
    if (*ptr > 42) {
        bar(ptr.get());
        *ptr = 42;
    }
    baz(std::move(ptr));
}

And here are the x64 assembly language outputs. I’ve compiled using clang 10.0 and -O2 for optimisation.

As not everyone is familiar with the beauty of x64 assembly language, I’ve annotated the output.

Raw pointer version:
        push    rbx                      # register preservation for the calling convention
        mov     rbx, rdi                  
        cmp     dword ptr [rdi], 43      # if (*ptr > 42)
        jl      .LBB0_2                  # 
        mov     rdi, rbx                 # calling convention set up
        call    bar(int*)
        mov     dword ptr [rbx], 42      # *ptr = 42
.LBB0_2:
        mov     rdi, rbx                 # reverse out the register/stack preservation
        pop     rbx
        jmp     baz(int*)                # TAILCALL
With std::unique_ptr:
        push    rbx                      # register preservation for the calling convention
        mov     rbx, rdi                
        mov     rdi, qword ptr [rdi]     # what's this? we are dereferencing the ptr in rdi!
        cmp     dword ptr [rdi], 43      # if (*ptr > 42)
        jl      .LBB0_2
        call    bar(int*)                
        mov     rax, qword ptr [rbx]     # another dereference!
        mov     dword ptr [rax], 42      # *ptr = 42
.LBB0_2:
        mov     rdi, rbx                 # reverse out the register/stack preservation
        pop     rbx
        jmp     baz(unique_ptr<int>&&)   # TAILCALL

Now we have two extra fetches from memory in the std::unique_ptr version. I’m not going to go into detail on the reasons behind them, as Chandler does this expertly in the talk with multiple revisions of the code, and I think you should all watch it; the rest of the talk is highly relevant even for non-C++ programmers.

Now std::unique_ptr<T> is equivalent to Rust’s Box<T>. Rust’s move semantics and ownership model enforce the same behaviour (actually even more strongly than C++ does).
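To illustrate that stronger enforcement (my own sketch, not an example from the talk): passing a Box by value moves ownership into the callee, and unlike C++ there is no valid-but-moved-from state left behind, because using the old binding is rejected at compile time.

```rust
// A minimal sketch of Box<T> ownership semantics. Passing the Box by
// value moves it into the callee, which frees the heap allocation when
// it returns; using the old binding afterwards is a compile error.
fn consume(ptr: Box<i32>) -> i32 {
    *ptr // the Box is dropped (and the allocation freed) here
}

fn main() {
    let ptr = Box::new(7);
    let value = consume(ptr);
    // println!("{}", ptr); // error[E0382]: use of moved value: `ptr`
    println!("{}", value); // prints 7
}
```

In C++, a moved-from unique_ptr still exists and merely holds null; in Rust, the moved-from binding simply cannot be named again, which is what lets the compiler reason more aggressively about aliasing.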

So my first pass at a Rust equivalent of the unique_ptr example was as follows:

#[inline(never)]
fn bar(ptr: &i32) {
    print!("{}", ptr);
}

#[inline(never)]
fn baz(ptr: Box<i32>) {
    print!("{}", ptr);
}

pub fn foo(mut ptr: Box<i32>) {
    if *ptr > 42 {
        bar(&*ptr);
        *ptr = 42;
    }
    baz(ptr)
}

In Rust, we can’t get away with a forward reference to a function in another translation unit, so we define bar and baz trivially and mark them as non-inlinable, so that the compiler generates calls to them instead of trying to be clever. We aren’t trying to check how the compiler inlines code, and we want it to be a fair comparison.

So what does the generated output look like? Well, it’s slightly disappointing:

        push    r14                  # Lots of calling convention set up
        push    rbx
        push    rax
        mov     rbx, rdi
        cmp     dword ptr [rdi], 43  # if *ptr > 42
        jl      .LBB5_3
        mov     rdi, rbx             # calling convention set up
        call    example::bar
        mov     dword ptr [rbx], 42
.LBB5_3:
        mov     rdi, rbx             # Lots of calling convention unwind too
        add     rsp, 8
        pop     rbx
        pop     r14
        jmp     example::baz
        mov     r14, rax             # This is panic unwinding code.
        mov     rdi, rbx             # The jmp above means this is never reached!
        call    alloc::alloc::box_free
        mov     rdi, r14
        call    _Unwind_Resume@PLT
        ud2

Rust’s calling convention means there is a lot more setup. Of particular note is the panic unwinding code. LLVM performs a tail call optimisation: instead of a call, it does its cleanup and then jumps, which means that when baz returns, it returns directly to foo’s caller rather than to foo. It’s a neat optimisation that saves stack space and a call/return pair. However, it also means that all the code after the jump is unreachable. This looks like a bug, albeit not a serious one, either in LLVM or in the rustc backend.
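As an aside of my own (not something covered in the talk), another way to make that dead unwinding code disappear is to turn off unwinding altogether when building:

```shell
# Abort on panic instead of unwinding the stack; rustc then emits no
# landing pads or box_free cleanup code, so the dead block goes away.
rustc -O -C panic=abort example.rs
```

The equivalent Cargo setting is `panic = "abort"` in the release profile. That changes the program’s behaviour on panic, though, so below I take a different route that keeps unwinding semantics intact.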

So we can make the comparison fairer by using the C calling convention in Rust to compare like for like. extern "C" functions need no panic unwinding (as C functions can’t panic), and the set-up and tear-down should be the same as for the C++ examples above, so this really is a like-for-like comparison.

Here is the code:

#[inline(never)]
extern "C" fn bar(ptr: &i32) {
    print!("{}", ptr);
}

#[inline(never)]
extern "C" fn baz(ptr: Box<i32>) {
    print!("{}", ptr);
}

pub extern "C" fn foo(mut ptr: Box<i32>) {
    if *ptr > 42 {
        bar(&*ptr);
        *ptr = 42;
    }
    baz(ptr)
}

and the output:

        push    rbx
        mov     rbx, rdi
        cmp     dword ptr [rdi], 43
        jl      .LBB4_2
        mov     rdi, rbx
        call    example::bar
        mov     dword ptr [rbx], 42
.LBB4_2:
        mov     rdi, rbx
        pop     rbx
        jmp     example::baz

This is identical to the C++ raw pointer example. Rust’s compiler-enforced semantics let it optimise code better without you even having to think about it. This is why I’ve come to love Rust: whilst C++ is too entrenched to ever disappear, I strongly believe that Rust is becoming a much better alternative in many areas where previously the only alternative to C++ was pure C. There is a long way to go yet, but it’s already very impressive.

As an aside, and true to Chandler’s talk – this is not zero cost, we pay for this performance at compile time. Rust is contending with C++ to be the slowest language to compile, although there is a lot of work going on to optimise it.

Anyway check out Rust and check out Chandler Carruth’s talk!

Edit: The comment on zero cost generated a bit of controversy on Reddit – and I think looking back it’s because it was poorly worded. I’m not attempting to hijack, change or rubbish the use of the term “zero cost abstraction”, merely observe that we pay for that performance elsewhere. With the power of hindsight I’d have written something like: “True to Chandler’s talk, and like all zero cost abstractions, this is only zero runtime cost – we pay for this performance at compile time.”

2 Replies to “Zero cost abstractions: Rust vs C++”

  1. Since I don’t see anyone mentioning it here, I’ll repeat and flesh out what was pointed out on Reddit. “this is not zero cost” is disingenuous.

    If you’re going by the intended meaning of the term, then compile time has no bearing on the “cost”. (And, for the record, the people steering C++ standards use “zero-overhead abstractions” these days, in order to head off another misunderstanding.)

    If you’re ignoring the intended meaning and going by total time spent, then you can’t simply calculate runtime plus wall time to compile because it’s unfair to not consider the non-compilation programmer time saved by the use of the abstractions.

    After all, you *are* jumping the border between stuff that happens on every execution and stuff that happens once per build.

    (Plus, if you want to stretch “zero-cost/overhead” like that, then it’s only fair to acknowledge that Rust’s compile-time guarantees make it more feasible to pursue the “theoretical, but never practical in a maintainable real-world codebase” performance numbers seen in benchmarks.)

    1. I’m pointing out that people often equate zero cost abstractions with free. Of course the cost in “zero cost abstractions” refers to runtime cost. But there are tradeoffs in other dimensions – most of the time they are negligible, but not always – this is what Chandler Carruth’s talk was about, and he puts the case far better than I do. In hindsight I didn’t word that sentence well, and I’ll edit the post to clarify.
