Caching Proxy: Introduction and Terminology

The caching proxy is a component that improves performance by caching results of method calls, typically remote service calls.

  • This article explains the basics of caching, TTL, invalidation and revocation, and defines the terms used throughout the documentation of the Caching Proxy. It is recommended you read (or at least scan) this page before continuing to the next sections.
  • Caching Proxy: Getting Started explains what you need to do in order to start using the Caching Proxy in your application, including how to use the Caching Proxy with components other than the Service Proxy.
  • Service Discovery: Configuration lists the caching-related configurable items and shows how they look in XML.

What is caching?

Caching allows frequently accessed data to be retrieved quickly, without having to request it from the original data source (e.g. remote service, database, or expensive computation). Instead of contacting the data source directly, you first try to retrieve it from the cache. If it's present in the cache (called a cache hit), you've managed to retrieve the required data while avoiding the expensive call to the data source. If it isn't there (called a cache miss), you contact the data source and then store the result in the cache so that future attempts result in a cache hit.
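
To illustrate the hit-or-miss flow, here is a minimal sketch in C# using a plain concurrent dictionary as the cache; Customer and GetCustomerFromService are hypothetical stand-ins for your entity and remote data source:

```csharp
using System.Collections.Concurrent;

public class Customer { public int Id; public string Name; }

public class CustomerRepository
{
    private readonly ConcurrentDictionary<int, Customer> _cache =
        new ConcurrentDictionary<int, Customer>();

    public Customer GetCustomer(int customerId)
    {
        // Cache hit: the data is returned without contacting the data source.
        if (_cache.TryGetValue(customerId, out Customer cached))
            return cached;

        // Cache miss: contact the data source, then store the result so that
        // future attempts result in a cache hit.
        Customer customer = GetCustomerFromService(customerId);
        _cache[customerId] = customer;
        return customer;
    }

    // Hypothetical placeholder for the expensive remote call.
    private Customer GetCustomerFromService(int customerId) =>
        new Customer { Id = customerId, Name = "..." };
}
```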

Cache key

Each item in the cache is stored under an identifier called the cache key. The cache key is also used to retrieve an item from the cache, and a cache may only contain a single item for each key. You typically use a unique identifying property of the data as the cache key, so that you can retrieve the data using only that identifier. For example, if your code needs the customer with an ID of 12, you would use 12 as the cache key and the customer data itself as the stored value. Sometimes the cache key is more complex to create, because you need to store different flavors of the same entity; in that case you might create a composite cache key made up of several properties of the data (usually one or more identifiers plus additional discriminator properties that differentiate the various flavors). Whichever method of creating the cache key is used, it must be repeatable, consistent and stable. If generating the cache key for an item produces different keys on two different occasions, you won't be able to retrieve the item from the cache, since you're unable to recreate the original cache key used to store it.
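
As an illustration, here is a hypothetical composite cache key for the customer example, combining an identifier with discriminator properties; the exact format doesn't matter as long as it's stable:

```csharp
public static class CacheKeys
{
    // Combines the identifier (customer ID) with discriminators (locale,
    // flavor). The same inputs must always produce the same key, otherwise
    // cached items become unreachable.
    public static string CreateCustomerCacheKey(int customerId, string locale, bool includeOrders)
    {
        return $"Customer:{customerId}:{locale}:{(includeOrders ? "WithOrders" : "WithoutOrders")}";
    }
}

// CreateCustomerCacheKey(12, "en-US", true) -> "Customer:12:en-US:WithOrders"
```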

Invalidation

Cache invalidation is the process of removing entries from the cache. There are two types of cache invalidation. One is defined within the cache itself: rules that keep the cache tidy, healthy and manageable. Cache invalidation of that variety is called eviction. The other occurs when the decision to remove an item is made outside the cache, either by your own code or by an external system. This type is called revocation. Both types are explained below.

Eviction

After some time passes, cached data can become stale, meaning it may represent old data that is no longer relevant and differs from what you'd get from the data source (data that is still relevant is fresh). This can be caused by the data changing at the source (because of an action you or someone else performed), or by the ephemeral nature of the data. The cache may also reach some limit imposed on it (e.g. a maximum number of items stored or a maximum amount of memory consumed, i.e. when it's full), in which case you want to remove items from the cache in order to make room for new items. You typically want to remove cold data, i.e. data that is accessed infrequently (or not at all) and wastes the limited amount of memory you have for storing items (frequently accessed data, on the other hand, is called hot data).

Eviction is the process of removing items because they're stale or cold, or because the cache is full. To deal with stale data, you define an absolute TTL (time-to-live) on an item inserted into the cache, which tells the cache when the item should be automatically evicted (e.g. one hour after being added). This prevents data from becoming stale, since an attempt to retrieve stale data results in a cache miss, causing a fresh version to be retrieved from the data source.

To deal with cold data (even if it's fresh), you define a sliding TTL, which is similar to the absolute TTL but resets every time there's a cache hit on that item. Thus, as long as a cache item is frequently accessed (i.e. stays hot), it will be kept in the cache. A hot cache item can still become stale, since frequent accesses keep it in the cache for a long time, so it is desirable to have both an absolute TTL and a shorter sliding TTL.
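
A conceptual sketch of how a cache entry might track both TTLs (field names are illustrative, not the Caching Proxy's actual implementation):

```csharp
using System;

public class CacheEntry<T>
{
    public T Value;
    public DateTime AbsoluteExpiry;  // fixed point in time; never moves
    public TimeSpan SlidingTtl;      // maximum allowed time between accesses
    public DateTime LastAccess;

    // An entry is evicted when either TTL is exceeded: the absolute TTL
    // bounds staleness, the sliding TTL removes cold items.
    public bool IsExpired(DateTime now) =>
        now >= AbsoluteExpiry || now - LastAccess >= SlidingTtl;

    // Every cache hit resets the sliding window, but not the absolute expiry.
    public void RecordAccess(DateTime now) => LastAccess = now;
}
```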

Finally, when the cache is full you can either stop adding new items to it (until some are evicted due to being stale or cold), or forcefully evict hot and/or fresh data, either in simple FIFO (first-in-first-out) order, or by assigning each item a priority when it is added to the cache and evicting the items with the lowest priority first.

Since the addition of active cache revokes (see below), sliding TTL support was removed, so the Caching Proxy only supports absolute TTL.

Revocation

Sometimes you know when an item in the cache becomes stale, regardless of how long it has been in the cache. You might have performed an operation that changes that item's data, or received an external notification that such a modification has occurred elsewhere. At that point you know the item in the cache is stale and holds irrelevant data, so you want to revoke it (remove it immediately). Actively revoking cached items has an advantage over TTLs (which are often an inaccurate approximation of when an item will turn stale): items are removed from the cache as soon as they're actually stale, and are not removed prematurely just because a TTL has elapsed. If an item turns stale before its absolute TTL expires, revocation prevents it from remaining in the cache. This leads to better utilization of the cache, a reduced frequency of contacting the data source, and less serving of stale data from the cache. In principle, it makes an absolute TTL unnecessary, since revocation handles staleness precisely and effectively. The downside is that revocation mechanisms are complicated and error prone, which is why, for safety, a very long absolute TTL is typically used even when a revocation system is in place, in case a revoke is missed for any reason.

Revoke key

While a cache key provides a way to retrieve a certain piece of data from the cache, the revoke key allows revoking one or more items from the cache. In the simplest case, the cache key and revoke key are one and the same: since the cache key uniquely identifies an item in the cache, the same key can be used to remove it. If you store user data in the cache, then when a user changes his shipping address, you could use the cache key (e.g. the user ID) as the revoke key to remove the outdated user from your cache. However, if you hold product orders in your cache, which contain the customer's shipping address, then when the customer updates his shipping address you must revoke each and every one of his unfulfilled orders, so that when those orders are dispatched, they are shipped to the newly updated address. In this case, the cache key is the order ID, but the revoke key is the user ID.

As the example above illustrates, even though you can only have one item in the cache with a given cache key, you can have several items with the same revoke key. This is further complicated by the need for multi-level revokes. If one cached entity affects other cached entities, which in turn affect other cached entities, then revoking a single entity can cause a whole collection (or tree) of entities to be revoked because of these relationships. There are two main ways of achieving multi-level revokes: cascading revokes and revoke key accumulation.

Cascading revokes

When using cascading revokes, each cached item is associated with one or more revoke keys that represent the entity or entities it depends on. When a revoke notification is received for a certain revoke key, the item is revoked, and in addition a revoke notification is issued for that item's own identity. This new notification may cause additional items to be revoked and additional notifications to be dispatched, and so forth. Each item only needs to hold the revoke keys of its direct parents, which keeps the data structure simple. It also means that for an item deep in a tree that needs to be revoked, a number of revocation iterations must complete before the item receives its own revocation notice (the revoke chain). The advantages are:

  • Low coupling: a cache item only needs to be associated with its direct dependencies. Revoke relationships between two services that don't communicate with each other still work, because the revoke chain is built implicitly.
  • Simplicity: each service only needs to be able to communicate revoke messages to the services that depend on it directly.

The disadvantages are:

  • Revocation latency: the further down the tree an item is, the longer the revocation chain and the higher the latency.
  • Higher message volume: many notifications are generated because of the revocation chain.
  • Fragility: if any message along the chain isn't sent or fails to be processed, all the items further down the chain will fail to revoke.

Revoke key accumulation

A cached entity can be revoked not only when its direct parent entity is revoked, but when any entity in its hierarchy is revoked. This is done by accumulating and storing, for each item in the cache, the revoke keys of all of the item's dependencies. When a notification for any single revoke key in that list is received, the cache item is revoked. The advantages are:

  • Low latency: each entity is revoked in direct response to the original revoke notification.
  • Lower traffic: only a single revocation message is sent for each revoked entity.
  • Durability: a revocation does not depend on other revocations succeeding.

The disadvantages are:

  • Complexity: the cache must maintain an exhaustive list of revoke keys covering all dependencies in the item's hierarchy (which leads to increased memory usage).
  • High coupling: revocation of any entity depends on the availability of revocation notifications for its entire hierarchy, so each entity must carry all its accumulated revoke keys and pass them on to any derived revocable data.

The Caching Proxy uses revoke key accumulation.
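
A simplified sketch of revoke key accumulation (not the Caching Proxy's actual implementation): every cached item carries the revoke keys accumulated from its entire dependency hierarchy, and a reverse index maps each revoke key to the cache keys it can revoke.

```csharp
using System.Collections.Generic;

public class RevocableCache
{
    private readonly Dictionary<string, object> _items =
        new Dictionary<string, object>();
    private readonly Dictionary<string, HashSet<string>> _revokeIndex =
        new Dictionary<string, HashSet<string>>();

    public void Add(string cacheKey, object value, IEnumerable<string> accumulatedRevokeKeys)
    {
        _items[cacheKey] = value;
        foreach (var revokeKey in accumulatedRevokeKeys)
        {
            if (!_revokeIndex.TryGetValue(revokeKey, out var cacheKeys))
                _revokeIndex[revokeKey] = cacheKeys = new HashSet<string>();
            cacheKeys.Add(cacheKey);
        }
    }

    // A single revoke notification removes every item that depends on the
    // key, directly or indirectly - no chain of follow-up notifications is
    // needed. (Cleaning the removed cache keys out of other index entries is
    // omitted for brevity.)
    public void Revoke(string revokeKey)
    {
        if (!_revokeIndex.TryGetValue(revokeKey, out var cacheKeys))
            return;
        foreach (var cacheKey in cacheKeys)
            _items.Remove(cacheKey);
        _revokeIndex.Remove(revokeKey);
    }
}
```

With the order example above, an order would be added with both its own key and the customer's key accumulated, e.g. cache.Add("Order:57", order, new[] { "Order:57", "User:12" }), so that revoking "User:12" removes all of that user's cached orders in a single step.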

Usage patterns

There are two main ways to use a cache from your code - explicitly or implicitly. Explicit use means changing your code to incorporate a cache. Implicit use means caching is added without changing your code, or with only minor changes (e.g. metadata/attributes).

Explicit use

Explicit use of a cache means using some component that provides the features described above. The simplest class to use is ConcurrentDictionary<TKey, TValue>, as it provides the basic ability to store and retrieve items by their cache keys, but that's about it. It has no way to deal with eviction (stale/cold data, memory limits) or revocation.
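
For example (reusing the hypothetical Customer and GetCustomerFromService from the earlier sketch), GetOrAdd collapses the hit/miss logic into a single call:

```csharp
var cache = new ConcurrentDictionary<int, Customer>();

// Hit: returns the stored value. Miss: runs the factory and stores its
// result. Note that under contention the factory may run more than once
// for the same key.
Customer customer = cache.GetOrAdd(12, id => GetCustomerFromService(id));
```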

The MemoryCache class introduced in version 4.0 of the .NET Framework provides a more complete set of features. It supports both absolute and sliding TTL (but not both for the same item), limiting the cache memory footprint (including eviction priorities), revocation mechanisms with built-in support for automatic revocation notification from the file system and from SQL Server, and the ability to extend it to any revocation source (with support for cascading revokes). It is configurable through app.config.
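
A sketch of what explicit MemoryCache usage looks like (the customer objects are hypothetical placeholders); note that a single CacheItemPolicy may specify an absolute or a sliding expiration, but not both:

```csharp
using System;
using System.Runtime.Caching;

ObjectCache cache = MemoryCache.Default;

// Absolute TTL: evicted one hour from now, regardless of access pattern.
cache.Add("Customer:12", customer, new CacheItemPolicy
{
    AbsoluteExpiration = DateTimeOffset.UtcNow.AddHours(1)
});

// Sliding TTL: evicted after ten minutes without being accessed.
cache.Add("Customer:13", otherCustomer, new CacheItemPolicy
{
    SlidingExpiration = TimeSpan.FromMinutes(10)
});

var cached = (Customer)cache.Get("Customer:12"); // null on a cache miss
```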

These are examples of explicit caches. You must add code to create the cache, create the cache key and the revoke key, add the item and its keys to the cache, retrieve items from the cache, remove items from it (in certain cases), and provide code that lets it access your own data sources. You're clearly aware you're using a cache, and your code reflects it. This is the most common use of caches; it's easy to understand and use, very powerful and customizable, but also error prone. If you don't construct the keys correctly, if you forget to remove a cache item, or if you access the data source directly for some reason, you will end up with an application that either under-performs or is outright incorrect.

Internally, the Caching Proxy uses a MemoryCache to store data. Nevertheless, it is not used explicitly.

Implicit use

Implicit use of a cache requires little or no modification of your code; you may not even be aware that a cache is used. A cache is automatically inserted between you and the data source, providing performance gains without introducing additional complexity into your application. A common example is the file-system cache. When you request a file residing on a storage device (e.g. a hard drive), the operating system automatically caches the data read from the device, and the next time your application requests the same file, it is returned from the cache. The existence or absence of the cache doesn't affect the semantics or functionality of your application, only its performance.

Memoization

One common pattern of implicit-use caching is called memoization (not memorization). Memoization caches the results of function calls, or method calls in object-oriented languages. It intercepts calls to a method and, on a cache hit, returns a result stored in the cache instead of running the actual method. On a cache miss, it calls the original data source: the method itself. Every combination of arguments passed to the method is stored separately, so you'll only get a cached result if you call the same method with the same arguments more than once.
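
A minimal memoization sketch for a single-argument function (the Caching Proxy's real mechanism is more elaborate; this only illustrates the concept):

```csharp
using System;
using System.Collections.Concurrent;

public static class Memoizer
{
    // Wraps a function so that each distinct argument value is computed once
    // and then served from the cache.
    public static Func<TArg, TResult> Memoize<TArg, TResult>(Func<TArg, TResult> func)
    {
        var cache = new ConcurrentDictionary<TArg, TResult>();
        return arg => cache.GetOrAdd(arg, func);
    }
}

// var getCustomer = Memoizer.Memoize<int, Customer>(GetCustomerFromService);
// getCustomer(12); // miss: calls the underlying method
// getCustomer(12); // hit: returned from the cache
```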

Pure functions

Some programming languages (e.g. Common Lisp) support memoization natively, especially functional programming languages that require functions to be pure, which means they always return the same value given the same arguments, neither depending on external state that might change (such as the file system, network, computer clock, or any other input that's not in the function arguments) nor making any such changes (called side effects). Pure functions are the best candidates for memoization, since their values are guaranteed to never become stale, because they never change. This removes the need for an absolute TTL or any revocation mechanism, which makes memoization of pure functions much simpler. Memoization can be applied to non-pure functions, but is then more sensitive to staleness, since you don't expect a function call to ever return stale data, especially if you aren't even aware that memoization is taking place. A revocation mechanism greatly reduces or eliminates staleness when memoizing non-pure functions.

In version 4.0 of the .NET Framework, the PureAttribute was added (as part of Code Contracts); it can be applied to a method to denote that the method is pure. The Caching Proxy currently does not detect this attribute, so no special handling of it is performed (such as disabling TTLs).

Cache key in memoization

In implicitly used caches in general, and in memoization specifically, the user never constructs or provides a cache key. Instead, it is generated automatically. For the file system cache, for example, the cache key is the full path and name of the file. For memoization, it is a combination of an identifier of the method (if method overloading is supported, an identifier for that specific overload) and the arguments that were passed to it. Unlike with explicitly used caches, in memoization the cache key has no relation to the actual data being cached. The revoke keys, however, stay the same and must represent a way to revoke specific entities from the cache, therefore they usually identify the entity (or, with multi-level revokes, its parent or hierarchy as well).

Cache corruption

When memoization is used in functional languages, it mostly stores immutable data (data structures that once constructed cannot be modified), usually the only kind of data structure allowed in such languages. In non-functional or multi-paradigm languages (such as C#), data structures are usually mutable unless specified otherwise. Immutable data is the best fit for memoization, since there is no risk of modifying the data structure stored in the cache, so you are guaranteed to always get the exact data originally returned from the data source. But when the cache stores mutable data, specifically an object (which has reference-type semantics), you are able to modify the data that's stored in the cache. The modified data may then be returned by future calls to the same method with the same arguments, which is unexpected and can change the correctness of a program when memoization is introduced. This issue is called cache corruption, and there are a number of ways to mitigate it.
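
To make the failure mode concrete, a sketch using the hypothetical memoized getCustomer from the earlier example:

```csharp
var first = getCustomer(12);   // miss: fetched and stored in the cache
first.Name = "CORRUPTED";      // mutates the very instance held by the cache

var second = getCustomer(12);  // hit: returns that same instance
// second.Name is now "CORRUPTED" - every future caller sees the modified
// data instead of what the data source originally returned.
```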

The Caching Proxy does not currently implement any of the solutions described below, therefore it is susceptible to cache corruption. It is recommended you return only immutable types from cached methods. If that's not possible, you should take extra care not to modify objects returned from a cached method.

Enforcing immutability

With this technique, you only allow immutable data to be returned by memoized methods, just like in functional programming languages. This can be done either at runtime (e.g. using reflection to determine whether an object returned by a memoized method is immutable, and throwing an exception if it isn't), or at compile time, using either a language feature that only allows immutable types (C# doesn't have such a feature yet) or static analysis (e.g. a Roslyn analyzer). The downside is that any mutable type returned by a method you want to memoize must be made immutable before it can be used, which requires the very code changes that implicit caching tries to avoid.
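
A naive sketch of the runtime variant (illustrative only; it treats just primitives, enums and strings as leaf types):

```csharp
using System;
using System.Linq;
using System.Reflection;

public static class ImmutabilityChecker
{
    // A type is considered immutable if all of its instance fields are
    // readonly and themselves of immutable types. (Cycle detection omitted.)
    public static bool IsImmutable(Type type)
    {
        if (type.IsPrimitive || type.IsEnum || type == typeof(string))
            return true;

        return type
            .GetFields(BindingFlags.Instance | BindingFlags.Public | BindingFlags.NonPublic)
            .All(field => field.IsInitOnly && IsImmutable(field.FieldType));
    }
}

// A memoizer could call IsImmutable(result.GetType()) on every cache insert
// and throw if the returned object is mutable.
```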

Cloning

A new copy of the cached data is returned for every cache hit. The original is kept in the cache and is never given to any caller. Since each caller that hits the cache gets a separate copy of the data, modifications are allowed, because they cannot affect other callers. This provides isolation without requiring code modification. The disadvantage is that deep-cloning an object has a performance cost that depends on the object's size and complexity, though it's usually minor.
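
A sketch of the idea, continuing the hypothetical repository from earlier; DeepClone and FetchFromDataSource are hypothetical helpers (deep cloning is often implemented via serialization, as .NET has no general-purpose built-in):

```csharp
public Customer GetCustomer(int customerId)
{
    if (_cache.TryGetValue(customerId, out Customer original))
        return DeepClone(original); // hit: hand out a copy, keep the original pristine

    Customer fresh = FetchFromDataSource(customerId);
    _cache[customerId] = fresh;
    return DeepClone(fresh);        // even the first caller only gets a copy
}
```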

Change detection

This is an optimization of the cloning technique for the case where most of the code never changes the returned object. You still keep the original object, but make only one extra copy of it (instead of one copy per cache hit). When the cache is hit, before returning the second copy, you verify that it hasn't been modified (either by comparing it to the original or by using an interceptor such as a transparent proxy). If it hasn't been modified, you can return it as-is. If it has, you discard the second copy and create a new one. This minimizes the amount of cloning (reducing allocations and garbage-collector pressure), but still requires some form of change detection. Some ways of detecting changes are more expensive than others (such as deep-comparing objects), while some of the more sophisticated ways have limitations (transparent proxies only work on interfaces or virtual members). In any case, change detection has some cost, but it can be lower than that of cloning.
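
A sketch of the comparison-based variant; Entry, DeepEquals and DeepClone are hypothetical helpers:

```csharp
// Each cache entry holds the pristine Original plus one Spare copy that is
// handed out to callers.
public Customer GetCustomer(int customerId)
{
    Entry entry = _entries[customerId];

    // If a previous caller modified the spare, discard it and clone again;
    // otherwise the existing spare is reused with no cloning at all.
    if (!DeepEquals(entry.Spare, entry.Original))
        entry.Spare = DeepClone(entry.Original);

    return entry.Spare;
}
```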

Transparent Proxy

For languages that natively support memoization, the compiler or runtime itself intercepts the call and redirects it to a memoizer. C# has no language feature for such interception. Some compile-time tools exist that provide this missing feature (such as PostSharp), but they have significant disadvantages (mostly around tooling) that won't be detailed here. A more common way of intercepting method calls is a transparent proxy: an object that sits between the caller and the callee of a method. This is possible because .NET (like many other platforms) supports runtime code generation, which allows classes to be generated during the program's execution. If the caller calls a method through an interface, it can be handed an instance of that interface which isn't the intended implementation, but rather a runtime-generated class that implements the same interface: the transparent proxy. The proxy implements the interception logic and forwards the call to the actual implementation when needed.
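
A sketch using System.Reflection.DispatchProxy (available in .NET Core and modern .NET; the actual Service Proxy and Caching Proxy use their own interception machinery). The generated proxy implements a given interface at runtime and can consult a cache before forwarding to the real implementation:

```csharp
using System;
using System.Reflection;

public class MemoizingProxy<T> : DispatchProxy where T : class
{
    private T _target;

    // Returns a runtime-generated object implementing the interface T,
    // wrapping the real implementation.
    public static T Wrap(T target)
    {
        T proxy = Create<T, MemoizingProxy<T>>();
        ((MemoizingProxy<T>)(object)proxy)._target = target;
        return proxy;
    }

    protected override object Invoke(MethodInfo targetMethod, object[] args)
    {
        // A real memoizer would build a cache key from targetMethod and args
        // here, and return the cached result on a hit. This sketch forwards
        // every call to the actual implementation.
        return targetMethod.Invoke(_target, args);
    }
}

// ICustomerService service = MemoizingProxy<ICustomerService>.Wrap(realService);
// Calls on 'service' now pass through Invoke before reaching realService.
```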

This is the method that both the Service Proxy and the Caching Proxy use to intercept calls (hence the Proxy part in their names).

Pros and Cons of Inter-Service Caching

Caching provides many benefits relating to performance and reliability, but also adds complexity that has certain disadvantages.

Pros

  • Increased throughput: The rate at which you retrieve an item from the cache is several orders of magnitude higher than from a remote service.
  • Lower latency: Retrieving an item from the cache is very fast, bypassing the processing time of the remote service, the network latency and serialization and deserialization of the data.
  • Decreased load on the system: Usually when you make a call to a remote service, that service has to perform some processing to fulfill that request. As part of the processing it might contact other services which would also spend time processing. Retrieving an item from the cache saves all that processing, and reduces network traffic.
  • Fault tolerance: Without caching, any downtime of a service causes a rippling failure effect, where all other services that depend on it fail as well. With caching, some (ideally, most) of the calls are served from the cache, so a failure of a cached service is not as catastrophic. In fact, the system might remain healthy for quite a while, allowing more breathing room to fix the issue.

Cons

  • Use of stale data: By using caching, you essentially trade the certainty that you have the latest data for performance. For most applications this is an acceptable compromise, but not for all. A common mitigation (though complex to implement) is cache revokes. But if you rely on revokes to keep the data in the cache fresh, and for any reason revokes stop working, you will use stale data.
  • Increased memory usage: The cache stores many responses returned from other services. This storage consumes memory, which must be managed - you must set limits so that the cache doesn't grow without bounds (such as a memory limit, or a strict TTL). Also, because items in the cache remain there for some time, they are often promoted to higher garbage-collector heap generations, making them more costly to collect.
  • Possible cache corruption: As discussed in the Cache Corruption section, modifying an item that was retrieved from the cache causes other callers to the same method with the same arguments to see the modified data, essentially corrupting the data in the cache. This can be mitigated or eliminated entirely as described earlier in this document.
  • Lack of full distributed tracing data: When a call is served from the cache, it never reaches the remote service, which means the service doesn't write tracing data about how it generated the result. Tracing data exists only for the original call that missed the cache, which is unrelated to the current request. This makes distributed tracing harder, because at first glance it appears as if certain tracing data is missing.
  • Inconsistent performance: There is a very large performance difference between calls that hit the cache and those that miss. When a cache item is invalidated, it immediately causes a spike in latency and a drop in throughput, because all the requests that were served quickly from the cache now have to wait for a remote request to complete. This often appears as periodic latency spikes, with a period matching the configured TTL. The invalidation of a cache item also hurts the fault tolerance described in the Pros section: if an item is not available in the cache, any failure of the remote service is felt immediately. This can be mitigated using various techniques described in Soft Invalidations.

The Caching Proxy helps mitigate the use of stale data by supporting revocations, mitigates increased memory usage by allowing a memory limit to be imposed on the cache, and mitigates inconsistent performance by supporting soft invalidations.