Performance is a key concern for engineers, as it directly impacts spending, user experience, scalability, and reliability. In Serverless environments, initialization time is part of that performance picture.
When executed for the first time, or after a long period of inactivity, a Serverless workload requires provisioning resources. This is what we call a cold start. Initialization duration, the time spent initializing the runtime and your code, is part of that cold start.
Here’s a great developer guide on how AWS defines cold starts for AWS Lambda.
A few months ago, I was tasked with reducing Datadog’s .NET tracing overhead. The first thing that came to my mind was what my colleague Rey Abolofia did for Python in:
Reducing AWS Lambda Cold Starts, by Rey Abolofia for AWS Community Builders (May 24)
So I thought: since .NET requires C# to be compiled anyway, there had to be a way to reduce the amount of work being done at runtime. After reading more about how .NET compilation works, I set out to compile our tracer ahead of time.
As a result, I achieved a 25% performance improvement in cold starts. Let me explain how I accomplished this.
.NET Compilation
Understanding how the .NET framework works is crucial: it allows you to improve how your code is delivered. Many engineers overlook this aspect, mainly because they prioritize shipping code, but investing time in understanding the platform pays dividends over time, as there are optimizations you can miss without this knowledge.
Default Compilation
.NET applications are compiled into a language-agnostic Common Intermediate Language (CIL). Compiled code is stored in assemblies: files with a .dll or .exe extension.
During runtime, the Common Language Runtime (CLR) is in charge of taking the assemblies and using a Just-In-Time (JIT) compiler to turn the Intermediate Language code into native code for the local machine to run. [1]
So, even though .NET applications are compiled before they ship, there is another compilation step at runtime, which requires compute power and therefore adds to execution time.
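To make that runtime step concrete, here is a minimal sketch (the method and timings are illustrative, not from the article's benchmarks): the first call to a method includes JIT-compiling its IL into native code, while subsequent calls reuse the cached native code.

```csharp
using System.Diagnostics;

static int Sum(int[] values)
{
    var total = 0;
    foreach (var v in values)
        total += v;
    return total;
}

var data = new[] { 1, 2, 3 };

// First call: the CLR JIT-compiles Sum's IL into native code before running it.
var sw = Stopwatch.StartNew();
Sum(data);
Console.WriteLine($"first call:  {sw.Elapsed.TotalMilliseconds} ms");

// Later calls reuse the cached native code, so they are typically much faster.
sw.Restart();
Sum(data);
Console.WriteLine($"second call: {sw.Elapsed.TotalMilliseconds} ms");
```

For a single tiny method the difference is microscopic, but a real application JIT-compiles thousands of methods as it starts up, which is exactly the work AOT strategies move out of the cold start.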
There are two main techniques for improving our cold starts, and the main idea behind both is Ahead-Of-Time (AOT) compilation. One is ReadyToRun, and the other is Native AOT.
ReadyToRun
ReadyToRun (R2R) is a form of ahead-of-time (AOT) compilation. The binaries produced improve the startup performance by reducing the amount of work that the JIT compiler needs to do as our application loads. [2]
The main disadvantage is that R2R binaries are much larger because they contain both IL code and the native version of the same code.
Native AOT
Native AOT compilation produces an app that has been ahead-of-time compiled into native code for a specific architecture. Therefore, these applications will not use the JIT compiler during runtime. Not only will they have a faster startup time, but also a smaller memory footprint.
Another great advantage is that these binaries don't require the .NET runtime to be installed on the machine at all. One limitation, though, is that you cannot cross-compile. [3]
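One quick way to observe the difference (a small sketch, not part of the original benchmarks) is the runtime feature flag below: under the regular CLR, and under ReadyToRun, the JIT is still available, while a Native AOT binary ships without one.

```csharp
using System;
using System.Runtime.CompilerServices;

// True when a JIT is available to compile IL at run time (regular CLR, R2R);
// false in a Native AOT binary, which contains only precompiled native code.
Console.WriteLine(RuntimeFeature.IsDynamicCodeCompiled);
```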
Choosing a Compilation Strategy
The easy pick would be to compile with Native AOT all the time, right? After all, it requires neither the .NET runtime nor a JIT compiler. Unfortunately, there are scenarios where you simply cannot use it. [4]
For example, if you are doing dynamic loading through Assembly.LoadFile, or runtime code generation using reflection with System.Reflection.Emit, compiling to Native AOT will produce warnings during the build, and your app will behave unexpectedly at runtime.
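For illustration, the two incompatible patterns look something like this (the assembly path and names below are hypothetical); the AOT analyzers flag such calls with warnings like IL3050, because a Native AOT binary has no JIT to execute dynamically loaded or dynamically generated IL:

```csharp
using System.Reflection;
using System.Reflection.Emit;

// Dynamic loading: the path here is hypothetical, but the pattern resolves
// an assembly at run time, which Native AOT cannot compile ahead of time.
var plugin = Assembly.LoadFile("/opt/extensions/MyPlugin.dll");

// Runtime code generation: emits fresh IL at run time, which requires a JIT
// to execute, and Native AOT binaries ship without one.
var dynamicAsm = AssemblyBuilder.DefineDynamicAssembly(
    new AssemblyName("DynamicAssembly"),
    AssemblyBuilderAccess.Run);
```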
In my specific task, I couldn't take advantage of Native AOT compilation because the Datadog .NET tracer uses dynamic loading and reflection. Due to the number of changes required to make that work, I had to settle for R2R until we update the tracer.
How to?
R2R
To enable ReadyToRun compilation, simply add the following property to your .csproj:
<Project Sdk="Microsoft.NET.Sdk">
  <PropertyGroup>
    <!-- ...other properties -->
    <PublishReadyToRun>true</PublishReadyToRun>
  </PropertyGroup>
</Project>
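Note that R2R compiles native code for a specific target, so publishing also needs a runtime identifier. The RID below is just an example value for Lambda on ARM; pick the one that matches your deployment target.

```xml
<!-- Example only: use the RID of your deployment target. -->
<RuntimeIdentifier>linux-arm64</RuntimeIdentifier>
```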
Native AOT
For Native AOT compilation, you can set this property, also in your .csproj:
<PublishAot>true</PublishAot>
To ensure that your application is Native AOT compatible, you can set this property in the same file:
<IsAotCompatible>true</IsAotCompatible>
Benchmarks
To get data around cold starts, the methodology I used is simple: force a new sandbox for the AWS Lambda function at regular intervals, and emit telemetry using an observability tool.
If you want a quick way to try it out for yourself, head to my example repository, which uses the AWS CDK to benchmark a Hello World app with these strategies.
Hello World
For a simple AWS Lambda function that serializes an API Gateway HTTP payload and returns it, we see almost no benefit from R2R, with only ~10ms shaved off. But when compiling to Native AOT, we see an improvement of 75%, with around ~400ms saved.
Because Native AOT doesn't support cross-compilation, I couldn't gather data for x86_64 in this test.
Datadog Tracer
For an immense codebase like the Datadog .NET tracer, publishing a release with ReadyToRun enabled measurably improved performance: as mentioned above, a 25% cut in initialization duration.
This code is publicly available, feel free to check it out in DataDog/dd-trace-dotnet#5962.
[build] Build tracer with ReadyToRun #5962
Allows tracer publishing to be compiled with ReadyToRun to improve the init duration of Serverless workloads. It has shown a 500ms init duration improvement for AWS Lambda, and could potentially be used for other workloads in the future. Followed #4573 and the ReadyToRun docs. Tested manually in AWS Lambda. Increases tracer size by 3x.
Summary
In general, understanding how the compiler works opens a lot of doors to becoming a better engineer, and gives you the foundational knowledge to think of ways to improve your applications' performance.
The clear benefit of applying this to Serverless workloads is that your applications will serve requests faster and save money at the same time.
For more improvements, like stripping and trimming, I'd recommend deep diving into the referenced content and the AWS developer guide to compile .NET into Native AOT.
🇲🇽 This post is also available in Spanish in my personal blog
References
Thanks to Lucas Pimentel, who explained to me that this was possible.
[1] Microsoft. (2024). What is .NET Framework: Architecture of .NET Framework. Microsoft Learn.
[2] davidwrighton, gewarren, & Miskelly. (2022, June). ReadyToRun Compilation. Microsoft Learn.
[3] LakshanF et al. (2024, October 15). Native AOT Deployment. Microsoft Learn.
[4] stevewhims & mattwojo (2022, October). .NET Native and compilation. Microsoft Learn.