
Introduction

Imagine a system of execution. The ingredients are well known at a conceptual level: Instructions, conditionals, control flow, state, loads, and stores. Processors, virtual machines, languages: Each a tool to support the description of a desired outcome. Processes, threads, fibers, tasks, jobs: Each a picture of execution.

We create systems of execution as abstractions that exist between the definite world of digital processing and the indefinite and limitless world of programming. In the immortal words of Fred Brooks, writing in The Mythical Man-Month:

Finally, there is the delight of working in such a tractable medium. The programmer, like the poet, works only slightly removed from pure thought-stuff. He builds his castles in the air, from air, creating by exertion of the imagination. Few media of creation are so flexible, so easy to polish and rework, so readily capable of realizing grand conceptual structures.

Yet the program construct, unlike the poet's words, is real in the sense that it moves and works, producing visible outputs separate from the construct itself. It prints results, draws pictures, produces sounds, moves arms. The magic of myth and legend has come true in our time. One types the correct incantation on a keyboard, and a display screen comes to life, showing things that never were nor could be.

This document describes a system of execution, the XVM[1]. It exists in the abstract as defined by this document, and much like the similar technologies that have preceded it, it is expected that compliant implementations will be realized in software, with execution efficiency provided by translation to the execution language of the underlying machine, whether that machine be hardware or software.

It is this simultaneous duality of purpose – to provide both a natural and immersive abstraction, and to maximize the potential efficiency of execution – that creates a dramatic tension in design. Most computer languages and their associated execution machinery are examples of the necessary trade-offs made to resolve this tension; very few examples exist without evident compromise. At one end of the spectrum are the hardware-centric models, starting with machine code, then assembly code, and progressing into the realm of increasing abstraction with languages such as FORTRAN, COBOL, and C; these programming languages allow the expression of intention using constructs that translate very directly to the mechanics of the underlying machine, thus simplifying the potential for optimization of execution in terms of both time and space. As one proceeds along the language spectrum, abstractions emerge that are less and less a reflection of the underlying machinery, and more and more the attempt to realize concepts of ideal algorithmic and structural composition. These abstraction-centric models include object-oriented languages such as Smalltalk, Java, and C#, and functional languages such as Haskell, Erlang, and Clojure; these programming languages provide powerful abstractions that allow designs and concepts to translate very directly to the mechanics of the language, optimizing for some combination of simplicity, brevity, readability, consistency, conceptual purity, predictability of execution, code re-usability, portability, speed of development, and so on.

There is nothing more crucial in understanding a system of execution than understanding how the designers of the system chose to resolve this natural tension. The two questions that must always be answered are these: (i) How does the system propose to map its capabilities to the hardware instruction set, the hardware memory model, other hardware capabilities, and the operating system(s) within which it will execute; and (ii) how does the system propose to represent its capabilities to the software developer?

To that end, the following observations and analyses are made, in an attempt to capture both the purpose and the rationale behind the decisions that are made to resolve this fundamental tension in this particular system of execution.

On Hierarchical Organization

Many a software developer has referenced this saying:

When the only tool that you have is a hammer, every problem begins to look like a nail.

That is not to imply that a hammer is not useful. If there is one conceptual hammer that – more so than any other – has repeatedly proven its merit for managing – and hiding – complexity, that would be the concept of hierarchy. File systems are hierarchies. B*Trees and binary trees and Patricia tries and parse trees are hierarchies. Most documents are internally organized as hierarchies, including the common HTML, XML, and JSON formats. Most graphical user interfaces are modeled as hierarchies. Many programming languages leverage hierarchy to provide nesting, information hiding, scoping, and identity. How is it that such a simple concept can be so universally useful?

First of all, hierarchical organization enables very simple navigation. What this means is that at any arbitrary point – called a node – in a hierarchy, there is a well-known set of operations that are possible, such as navigating from the current node to its parent node, and navigating from the current node to any of its child nodes. If a node does not have a parent, then it is the root node, and if a node does not have any child nodes, then it is a leaf node.

Child nodes are contained within their parent node. Each child is uniquely identifiable by its parent, for example by a name or some other unique attribute. A hierarchy is recursive; at any point in the hierarchy, from that point down is itself a hierarchy. Since a hierarchy has a single root and is recursive, each node in the hierarchy is uniquely identifiable in the hierarchy by combining the identities of each successive node starting with the root and proceeding down to the node; this identity is the absolute path to that node. It is possible to navigate between any two nodes in the same hierarchy by combining zero or more child-to-parent navigations with zero or more uniquely identifiable parent-to-child navigations; the sequence of these steps is a relative path between two nodes.
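To make the navigation and path concepts concrete, here is a minimal sketch in Java; the XVM itself defines no such API, and every name here is illustrative only:

```java
import java.util.LinkedHashMap;
import java.util.Map;

class Node {
    final String name;
    final Node parent;                       // null identifies the root node
    final Map<String, Node> children = new LinkedHashMap<>();

    Node(String name, Node parent) { this.name = name; this.parent = parent; }

    // Each child is uniquely identifiable by its parent via its name.
    Node child(String name) {
        return children.computeIfAbsent(name, n -> new Node(n, this));
    }

    // The absolute path combines the identities from the root down to this node.
    String absolutePath() {
        return parent == null ? "/" + name : parent.absolutePath() + "/" + name;
    }
}

public class HierarchyDemo {
    public static void main(String[] args) {
        Node root = new Node("root", null);
        Node leaf = root.child("usr").child("local");
        System.out.println(leaf.absolutePath());   // prints: /root/usr/local
    }
}
```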

These basic attributes of a hierarchy combine in amazingly powerful ways. For example, since each node is itself the beginning of a hierarchy of any size, it is possible to refer to that entire sub-hierarchy simply by referring to that one particular node; this effectively hides the recursive complexity which is contained – or nested – within that node. As a result, it is possible to add, copy, move, or remove a hierarchy of any size simply by adding, copying, moving, or removing the node that is the “root” of that hierarchy.

Using a hierarchy, it is incredibly simple to construct the concept of scope. For example, a scope could include only a specific node, or it could include a specific node and all of its child nodes recursively to its descendant leaf nodes, or it could include a specific node and its ancestor nodes to the root node, or any other combination of inclusion and exclusion that could be described in an unambiguous manner.

These concepts are incredibly simple, yet at the same time incredibly powerful, and are leveraged liberally throughout the XVM, from managing and hiding complexity for developers, to managing memory in an execution context.

On Predictability vs. Performance

In the course of researching language design, one predominant concern emerges: That of execution performance. While many different concerns are evaluated and balanced against each other in a typical design, and while the goal of performance is often explicitly ranked in a position of lesser importance, in reality there is no one goal more considered, discussed, and pursued. Regardless of the importance assigned to performance as a language goal, performance is to the language designer what a flame is to a moth.

Perhaps it is simply best to admit, up front, that no language can survive – let alone succeed – without amazing feats of performance. Yet performance as a goal tends to conflict with other goals, particularly with respect to manageability, serviceability, and the quality of the abstractions that are presented to the developer.

Beginning programmers often ask: “Is language A faster than B?” After all, no one wants to be using a slow language, any more than someone would want to buy a slow automobile or a slow computer. Speed is a thrill, and speed in software execution holds no less attraction than a souped-up hot rod with a throaty growl and a body-rumbling ride.

The corollary to that question is “Why is language B slower than A?” The answers to this question tend to be very illuminating. Take any two languages that compile to and execute as native machine code, and compare their performance for a given task. Despite running on the same hardware, and using the same hardware instruction set, one may be dramatically slower than the other: by 10%, by 100% (half as fast), or even by 1000% (an order of magnitude slower). How is such a difference possible?

The answer lies in translation, specifically in the automated translation of idioms. A language such as C selected idioms that closely represented the underlying machine instructions of the day, which allowed programmers to write simple programs that could be almost transliterated from C to machine code. In other words, the language chose as its abstractions the same set of abstractions that the CPU designers were using, or abstractions that were at most one translation removed from the machine’s abstractions. This allowed for very simple compilers, and made it extremely simple to support localized optimization.

A localized optimization is an optimization in the compiled code that can be made using only information about code that is most local to the code that is being optimized; in other words, information outside of the scope of the small amount of code being optimized is not even considered. Many optimizations in the C language, for example, are extremely local, such that they can be performed without any information about code outside of a particular expression or statement; it is hard to imagine more localized optimizations.

However, there is a trade-off implicit in achieving such simple and direct optimizations: The abstractions provided by the language are constrained by the abstractions provided by the CPU. As one could rightfully surmise, hardware abstractions tend not to be very abstract, and the abstractions provided by hardware instruction sets tend to be only slightly better. In its early days, the C language was jokingly referred to as “assembly with macros”, because as a language, it was only slightly higher level than assembly itself.

Computing efficiency is often stated in terms of a tradeoff between time (CPU) and space (memory); one can often utilize one in order to optimize for the other, subject to the law of diminishing returns. Unfortunately, there is no single simple metric that captures computing efficiency, but if the trade-off between time and space appeared as a graph with time on one axis and space on the other, it might generally resemble the shape of the curve y=1/x, which most closely approaches the origin at (1,1). If there were a single computing efficiency measurement for a programming language, it could arguably be represented by the closest that this trade-off curve approaches the origin (0,0), which distance could be considered the minimum weighted resource cost. To calculate a language’s efficiency for a particular purpose, one would calculate the inverse of the minimum weighted resource cost.
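To make that geometry explicit — a sketch, assuming equal weights on time and space (an assumption the text above does not fix) — the distance from the origin to a point on the curve y = 1/x is minimized as follows:

```latex
% Squared distance from the origin to a point (x, 1/x) on the trade-off curve:
d(x)^2 = x^2 + \frac{1}{x^2}
% Minimizing: \frac{d}{dx}\left(x^2 + x^{-2}\right) = 2x - 2x^{-3} = 0
% \Rightarrow x^4 = 1 \Rightarrow x = 1, so the curve passes closest to the
% origin at (1, 1), at distance
d_{\min} = \sqrt{1^2 + 1^2} = \sqrt{2}
% the minimum weighted resource cost; the efficiency measurement would then
% be its inverse, 1/\sqrt{2}, in these (dimensionless) units.
```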

While one can easily speak about efficiency in hypothetical terms, a benchmark is a wonderful servant but a terrible master. The path chosen in the design of the XVM is to consciously avoid limits on potential efficiency by consciously avoiding contracts whose costs are not clearly defined and understood. This approach can be explained in the inverse, by way of example in existing languages and systems: Often, features and capabilities that were considered to be “easy” implementation-wise and “free” efficiency-wise when they were introduced, ultimately emerged as the nemesis to efficiency, due to the inflexibility of the programming contracts that these features and capabilities introduced[2].

To understand this, it is important to think of abstractions as a two-sided coin: On one side, we see the benefit of the abstraction, which allows a programmer to work with ever-larger and more powerful building blocks, while the other side of the coin represents the cost of the abstraction, which is called the contract. Imagine, for example, having to build something in the physical world, out of actual matter. One could conceivably build a structure out of atoms themselves, assembling the necessary molecules and arranging them carefully into perfectly formed crystalline structures. The contracts of the various elements are fairly well understood, and yet we wouldn’t try to build a refrigerator out of individual atoms.

One could imagine that building from individual atoms is the equivalent of building software from individual machine instructions, in two different ways: First, that the refrigerator is composed of atoms, just like all software is executed at some level as machine instructions; and second, that as the size and complexity of the constituent components increase, the minutiae of the contracts of the sub-components must not be surfaced in the contracts of the resulting components – those details must be hidable! This purposeful prevention of the surfacing of minutiae is called encapsulation, and it exists as one of the cornerstones of software design. It is why one can use a refrigerator without knowing the number of turns of wire in the cooling pump’s motor, and why one can use a software library without worrying about its impact on the FLAGS register of the CPU.

Ultimately, it is the recursive composition of software that creates challenges for optimization. While low-level optimizations are focused on the creation of more efficient low-level code, higher-level optimizations rely on explicit knowledge of what portions of a component’s contract – or the contracts of the various sub-components – can be safely ignored. In other words, the optimizer must be able to identify which contractual effects are ignored or discarded by the programmer, and then leverage that information to find alternative execution solutions whose contracts manage to cover at least the non-discarded and non-ignored contract requirements. Higher-level optimizations target the elimination of entire aspects of carefully defined behavioral contracts, and as a result, they typically require extensive information from across the entire software system; in other words, high-level optimizations tend to be non-localized to the extreme! No software has been more instrumental in illustrating this concept than Java’s HotSpot virtual machine, whose capabilities include the inlining of potentially polymorphic code by the determination that the potential for dynamic dispatch is precluded, and the elimination of specific memory barriers in multi-threaded programs as the result of escape analysis.
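For illustration only (this is not XVM code): the Java shape below is the kind of call site that HotSpot can devirtualize and inline when class-hierarchy analysis shows a single loaded implementation — exactly the whole-program, non-localized knowledge described above:

```java
interface Shape { double area(); }

final class Circle implements Shape {
    private final double r;
    Circle(double r) { this.r = r; }
    public double area() { return Math.PI * r * r; }
}

public class InlineDemo {
    static double total(Shape[] shapes) {
        double sum = 0;
        // A virtual call in the source; if Circle is the only Shape the JIT
        // has seen loaded, the dispatch can be elided and area() inlined.
        for (Shape s : shapes) sum += s.area();
        return sum;
    }
    public static void main(String[] args) {
        Shape[] shapes = { new Circle(1), new Circle(2) };
        System.out.println(total(shapes));
    }
}
```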

To enable these types of future optimizations, the contracts of the system’s building blocks must be explicit, predictable, and purposefully constrained, which is what was meant by the goal of “consciously avoiding contracts whose costs are not clearly defined and understood.” The contracts in the small must be encapsulatable in the large, which is to say that contracts must be composable in such a way that side-effects are not inadvertently exposed. It has been posited[3] that “all non-trivial abstractions, to some degree, are leaky,” but each such leak is eventually and necessarily realized as a limit to systemic efficiency.

On God, Turtles, Balloons, and Sandboxes

Wikipedia defines a software sandbox as follows[4]:

In computer security, a sandbox is a security mechanism for separating running programs. It is often used to execute untested or untrusted programs or code, possibly from unverified or untrusted third parties, suppliers, users or websites, without risking harm to the host machine or operating system. A sandbox typically provides a tightly controlled set of resources for guest programs to run in, such as scratch space on disk and memory. Network access, the ability to inspect the host system or read from input devices are usually disallowed or heavily restricted.

In the sense of providing a highly controlled environment, sandboxes may be seen as a specific example of virtualization. Sandboxing is frequently used to test unverified programs that may contain a virus or other malicious code, without allowing the software to harm the host device.

In the physical world, in which children play with sand, there are two common styles of sandbox. The first is constructed from four equally sized wooden planks, each stood on its long edge to form a square box, fastened in the corners, and then filled with sand. The second style is typified by a large green plastic turtle, whose “turtle shell” is the removable top that keeps the rain out, and whose “body” is the hollow bowl that keeps the sand in. Both styles hold sand and allow a child to dig tunnels and build sand-castles, but there is one major difference: When a child tunnels too deeply in the wooden-sided sandbox, the tunnel burrows past the sand and into the soil beneath, while the tunnel depth in the turtle sandbox is strictly limited by the plastic bowl.

Software sandboxes tend to mirror these physical types, in that the dirt often lies beneath. In other words, the sandbox attempts to protect the resources of the system, but a determined programmer will eventually be able to find a way through. The only way that a language runtime as a sandbox can ensure the protection of the underlying resources of a system is for the sandbox itself to completely lack the ability to access those resources. Thus, the purpose of the sandbox is to defend against privilege escalation:

Privilege escalation is the act of exploiting a bug, design flaw or configuration oversight in an operating system or software application to gain elevated access to resources that are normally protected from an application or user. The result is that an application with more privileges than intended by the application developer or system administrator can perform unauthorized actions[5].

As a language runtime designer, it is not sufficient to simply distrust the application code itself; one must distrust the entire graph of code that is reachable by the application code, including all third-party libraries, the language's own published runtime libraries, and any accessible internal libraries that ship with the runtime. Or, put another way: if there is a possible attack vector that is reachable, it will eventually be exploited. To truly seal the bottom of the sandbox, it is necessary to disallow resource access through the sandbox altogether, and to enforce that limit via transitive closure.

But what good is a language that lacks the ability to work with disks, file systems, networks, and network services? Such a language would be fairly worthless. Ecstasy addresses this requirement by employing dependency injection, which is a form of inversion of control. To comprehend this, it is important to imagine the container functionality not as a sandbox, but as a balloon, and our own universe as the primordial example.

Like an inflated balloon, the universe defines both a boundary and a set of contents. The boundary is defined not so much by a location, but rather by its impermeability – much like the bottom of the green plastic turtle sandbox. In other words, the content of the universe is fixed[6], and nothing from within can escape, and nothing from without can enter. From the point of view of someone within our actual universe, such as you the reader, there is no boundary to the universe, and the universe is seemingly infinite.

However, from outside of the universe, the balloon barrier is quite observable, as is the creation and destruction of the balloon. Religiously speaking, one plays the part of God when inflating a balloon, with complete control over what goes through that one small – and controllable – opening of the balloon.

It is this opening through which dependency injection of resources can occur. When an application needs access to a file system, for example, it supplicates the future creator of its universe by enumerating its requirements as part of its application definition. These requirements are collected by the compiler and stored in the resulting application binary; any attempt to create a container for the application will require a file system resource to be provided.

And there are two ways in which such a resource can be obtained. First of all, the resource is defined by its interface, so any implementation of that interface, such as a mock file system or a fully emulated – yet completely fake! – file system would do. The second way that the resource can be obtained is for the code that is creating the container to have asked for it in the same manner – to declare a dependency on that resource, and in doing so, force its own unknown creator to provide the file system as an answer to prayer.
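A hypothetical sketch of that arrangement in Java — the names (Container, FileSystem) are illustrative, not the XVM's actual API — where creation fails unless every declared requirement is supplied, and the creator decides what the resource really is:

```java
import java.util.Map;

interface FileSystem { String read(String path); }

class Container {
    private final Map<Class<?>, Object> resources;

    Container(Map<Class<?>, Object> resources) {
        // The application's declared requirements must all be satisfied.
        if (!resources.containsKey(FileSystem.class))
            throw new IllegalStateException("application requires a FileSystem");
        this.resources = resources;
    }

    <T> T inject(Class<T> type) { return type.cast(resources.get(type)); }
}

public class InjectionDemo {
    public static void main(String[] args) {
        // The creator chooses the implementation: here, a completely fake one.
        FileSystem fake = path -> "contents of " + path;
        Container app = new Container(Map.of(FileSystem.class, fake));
        System.out.println(app.inject(FileSystem.class).read("/etc/motd"));
    }
}
```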

As the saying goes, it’s turtles all the way down. In this case, the outermost container to be created is the root of the container hierarchy, which means that if it requires a file system, then the language runtime must inject something that provides the interface of a file system, and that resource that is injected might even be a representation of the actual file system available to the operating system process that is hosting the language runtime.

And here we have a seemingly obvious contradiction: What is the difference between a language that attempts to protect resources by hiding them at the bottom of a sandbox container, versus a language that provides access to those same resources by injecting them into a container? There are several differences, but let’s start with an obvious truth: Perfection in the design of security is difficult to achieve, and its correctness is even harder to prove, so it is important to understand that this design does not itself guarantee security. Rather, this design seeks to guarantee that only one opening in the balloon – and anything that is injected through that opening – needs to be protected, and the reason is self-evident: Transitive closure. By having nothing naturally occurring in the language runtime that represents an external resource, there is simply no surface area within the language runtime – other than the injected dependencies themselves – that is attackable.

Second, the separation of interface and implementation in the XVM means that the implementation of the resource is not visible within the container into which it is injected. While this pre-introduces a number of language and runtime concepts, the container implementation only makes visible the surface area of the resource injection interface – not of the implementation! This holds true even under inspection via reflection, and furthermore the injected resources are required to be either immutable, or completely independent services.

Third, this design precludes the possibility of native code within an Ecstasy application; native functionality can only exist outside of the outermost container and thus outside of the language runtime itself, and can only be exposed within the language runtime via a resource injected into a container, subject to all of the constraints already discussed.

Lastly, as has been described already, the functionality that is injected is completely within the control of the injector, allowing the requested functionality to be constrained in any arbitrary manner that the injector deems appropriate.

While it is possible to introduce security bugs via injection, the purpose of this design is to minimize the scope of potential security bugs to the design of the relatively small number of interfaces that will be supported for resource injection, and to the various injectable implementations of those interfaces.

On Processor Performance

There exists no shortage of opinions on the topic of what aspects are the most important in a system of execution. One voice will claim that only performance matters, while another will suggest that it no longer matters at all. One voice will claim that achieving efficiencies in development is far more valuable, while another will insist that predictability and stability in execution is critical. Opinions morph with time, as the reality of the physical units of execution evolves and the conceptual units of design are ever the more realized in languages and libraries.

Nonetheless the state of the art today bears the hallmark of a path followed far beyond its logical conclusion. In 1977, John Backus raised an early warning in his ACM Turing Award lecture:

Surely there must be a less primitive way of making big changes in the store than by pushing vast numbers of words back and forth through the von Neumann bottleneck. Not only is this tube a literal bottleneck for the data traffic of a problem, but, more importantly, it is an intellectual bottleneck that has kept us tied to word-at-a-time thinking instead of encouraging us to think in terms of the larger conceptual units of the task at hand. Thus programming is basically planning and detailing the enormous traffic of words through the von Neumann bottleneck, and much of that traffic concerns not significant data itself, but where to find it.

While programming advances have largely digested and expelled the explicit concerns of store-addressing and word-at-a-time thinking, these advances have been repetitively accreted onto a burial mound whose foundation remains a von Neumann architecture. Perhaps the success of that underlying architecture is the result of natural selection, or perhaps we have only inertia to blame. In any case, the evolution of concurrent multi-processing and distributed systems has stretched the von Neumann architecture past its effective limits. Specifically, it appears that the recent growth in the extent of the now automatically-managed store has occurred at a pace well beyond the increase in performance of the heart of the von Neumann machine: the processor. Whether this imbalance can be rectified by further technological accretion or by the adoption of a fundamentally new execution architecture is yet to be seen, but regardless: The inevitable and predictable increase in performance that has become the opiate of an industry has taken a sabbatical, and may have accepted an early retirement altogether.

There has existed a loose historic alignment in the growth of processor performance, memory capacity, memory throughput, durable storage capacity, durable storage throughput and network throughput. This relatively consistent growth has allowed a general model of assumptions to be perpetuated throughout hardware architectures, operating systems, programming languages and the various resulting systems of execution. Now we find that model to be threatened by the failed assumption that processor performance will increase at a rapid and relatively predictable rate.

To maintain the façade of progress, explicit hardware parallelism has emerged as the dominant trend in increasing processor throughput. Symmetric Multi-Processing (SMP) has a relatively long history in multi-CPU systems, but adoption of those systems was hobbled both by high prices and a lack of general software support. In the 1990s, the advent of the World Wide Web propelled multi-CPU systems into the mainstream for servers, but it is the recent, seemingly instantaneous and near-universal commoditization of multi-core CPUs that has finalized the dramatic shift from a focus on processor performance to a focus on processor parallelism. Further compounding the adoption of multi-CPU and multi-core systems are various technologies for Concurrent Multi-Threading (CMT), which enables a single CPU core to execute multiple threads concurrently. In aggregate, since the turn of the millennium, parallelism has increased from one to dozens of concurrently executing threads in an entry-level server, while the performance of an individual processing unit has increased by only a few times. Looking forward, processor performance is now expected to improve only incrementally, while the level of parallelism is continuing on an exponential curve.

Since overall processing throughput has continued to increase at a dramatic pace not dissimilar from its historic trend, this shift from performance to parallelism could be safely ignored but for one problem: The von Neumann architecture is bound to an individual processing unit, and thus has nearly halted its forward progress in terms of the throughput of a single thread of execution. This means that for the first time in software history, existing programs do not run significantly faster on newer generations of hardware unless they were built to explicitly take advantage of thread parallelism, which is to say unless they were built assuming their execution would occur on multiple von Neumann machines in parallel. Since the art of programming is expressed almost entirely in imperative terms, and since the imperative nature of programming languages is based on the von Neumann architecture, we have managed to accumulate generations of programs and programmers that are hard-wired to a model that has at least temporarily halted its forward progress.

As a result, computing devices are providing increases in processing throughput that can only be consumed by parallelism. It is obvious that this mandates support for parallelism in any new system of execution, but there is a far less obvious implication of critical importance. Parallelism increases throughput only to the extent that coordination is not required among the threads of execution, and coordination is required only for mutable resources that have the potential to be shared across multiple threads of execution. In common modern systems of execution such as Java and C#, explicit parallelism is provided by threads of execution, each representing the state of a single von Neumann machine, but those machines collectively share a single store. Compounding the coordination overhead for the store is the prevalence of automatic management of the store, referred to as Garbage Collection (GC), which unavoidably requires some level of coordination. While GC algorithms have advanced dramatically in terms of parallelism, the remaining non-parallelized (and possibly non-parallelizable[7]) portion of GC is executed as if by a single thread, which is to say that the runtime behavior of these systems will periodically halt in order to garbage-collect the shared store. The unavoidable conclusion is that growth in the shared store without the corresponding increase in processor performance has led to unavoidable and growing pauses in the execution of the parallelized von Neumann machines.

A series of advances in GC algorithms have thus far masked this inevitable consequence, but the advances are already showing diminishing returns, while the upward pressure on the size of the store has not abated and the dramatic progress of processor performance has not resumed. Ultimately, a single shared mutable store must be avoided, and the design of a runtime system must reflect that requirement. The XVM design is intended to fully address this challenge, and does so by decomposing the problem space along a number of naturally occurring fault lines. First, the XVM organizes scopes of linking, loading, and execution – referred to as containers – in a hierarchical manner; second, within that hierarchy, scope of execution is further localized into individual, conceptually-independent von Neumann machines – referred to as services – within which all allocations occur; and third, only immutable state can escape the execution scope of a service.

These decisions allow dynamic memory allocation to be managed within (scoped to) a particular service, and the resulting garbage collection to be performed entirely within the context of individual services, without requiring the coordination of other services or containers. The two exceptions to this are escaped immutable data, and the reclamation of services and containers themselves. In the case of escaped immutable data, and precisely because the data (the objects) are immutable, the memory can be garbage collected without any coordinated halt of execution[8]. Furthermore, the escaped immutable data can be organized within the container hierarchy at the level to which the data has escaped, or alternatively can be managed in a single global immutable store.

In the case of containers and services, each mark phase of each actively executing service also marks any services that it in turn can reach, again without any coordinated halt of execution; unreachable services are then collected asynchronously.

The concept of localizing allocations in order to localize GC is not new. Systems built around an explicit threading model have employed escape analysis in order to determine which allocations can safely be performed using a thread local allocator (such as a slab allocator), and which allocations need to be made from a shared store. This represents a hierarchy with a fixed depth of two: Global and thread local. While a dramatic improvement over a single shared store, it still implies a global stoppage for GC execution of the shared store.

The primary benefit to GC of the localization of the store is that a significant portion of overall GC execution can be localized entirely within each thread of execution, and further localized within the bounds of a particular service. Additionally, having stores that are localized to each service enables the system to exactly measure and meter – in real time – the amount of memory that is consumed by each service, and in aggregate by each container. Lastly, a range of optimizations are available to the data managed in a thread-localized store: the memory containing the data can be allocated, accessed, manipulated, and freed efficiently without any hardware-level concurrency control, and generated native code can be optimized specifically for cache locality.

Second, by organizing memory hierarchically in a manner corresponding to the runtime container model, a service or an entire container can be discarded in its entirety, because no objects outside of that hierarchical scope can have a reference into the memory managed within that scope. In other words, the cost of “yanking” an entire sub-portion of the hierarchical runtime system is nominal, and the results are deterministic.

An additional benefit is that machine code optimized for single-threaded execution has dramatic performance advantages compared to machine code that is concurrency-safe, even when concurrency control is optimized using the latest hardware capabilities such as compare-and-swap (CAS) and other “lockless” instructions. Mutable data structures that are localized to a single thread of execution allow the resulting execution to approach the theoretical maximum efficiency of the underlying hardware.
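As a rough Java analogy of such a service (the XVM's actual service machinery is not shown here, and all names below are invented for illustration): mutable state confined to a single thread needs no concurrency control, and only immutable messages cross the boundary:

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

record Request(String text) { }            // records are shallowly immutable

class UppercaseService {
    private final ExecutorService loop = Executors.newSingleThreadExecutor();
    private int handled = 0;               // mutable state, confined to this service

    CompletableFuture<String> handle(Request req) {
        return CompletableFuture.supplyAsync(() -> {
            handled++;                     // no locks or barriers needed: one thread
            return req.text().toUpperCase();
        }, loop);
    }

    void shutdown() { loop.shutdown(); }
}

public class ServiceDemo {
    public static void main(String[] args) throws Exception {
        UppercaseService svc = new UppercaseService();
        System.out.println(svc.handle(new Request("hello")).get());  // HELLO
        svc.shutdown();
    }
}
```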

The concept of GC optimizations based on immutability is also not new. Several GC implementations have leveraged memory protection barriers to protect memory regions being compacted as if the data were immutable, allowing application execution to proceed (in the absence of a fault) with the GC operating concurrently. Significantly, explicitly immutable data can be compacted without protection, because both the old (pre-compaction) and new (post-compaction) copies of the data are valid – being identical! – thus enabling the application to continue to execute concurrently and correctly while referring to either copy arbitrarily, deferring the housekeeping task of updating any pointers that point to the lingering old copy, and deferring the recycling of the memory region that was compacted. As an added benefit, the GC of regions of data known to be immutable can be performed by any thread of execution, or even a separate thread dedicated to GC.

On Immutability

In an object-oriented system, immutability refers to the prohibition against altering the state of an object. It turns out that many data types are naturally immutable; consider the number 42 for example[9] – it is always the number 42! Other data types are naturally mutable; consider the concept of a variable for example – its very purpose is to be able to vary! Even with naturally mutable data types, it is often desirable to be able to make specific instances immutable on demand.

The XVM explicitly supports immutability. Immutability has several benefits, notably: Predictability, thread/concurrency safety, security, and available optimizations. Predictability is one of the greatest benefits of good design, and immutability supports predictability by providing data types that are truly constant – at runtime! For example, when an object exposes its state, it often does so by exposing immutable data types so that its internal state cannot be directly altered; without explicit support for immutability, one of two things could occur: Either the mutable state of the object would be exposed, breaking encapsulation, or a copy (or other representation) of the mutable state would need to be created on demand to safely expose that state, which is expensive in terms of both space and time, not to mention complexity. Immutability provides a simple way to ensure that the state of an object cannot change, addressing each of these concerns.
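Java illustrates the dilemma just described: lacking intrinsic runtime immutability, an object must either leak its mutable state or pay to copy (or wrap) it on demand. A sketch, not XVM code:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class Order {
    private final List<String> items = new ArrayList<>();
    void add(String item) { items.add(item); }

    List<String> itemsLeaky()  { return items; }               // exposes mutable state
    List<String> itemsCopied() { return List.copyOf(items); }  // safe, but copies on every call
    List<String> itemsViewed() { return Collections.unmodifiableList(items); } // read-only view,
                                                               // yet still reflects later mutation
}
```

With intrinsic immutability, the copy would be unnecessary and the exposed state truly constant.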

With respect to thread and concurrency safety, an immutable object provides the same state regardless of the concurrency of access to the object, because changes to the state of the object are explicitly prohibited. It is precisely because of this explicit contract that the only state that can be visible to more than one thread of execution in the XVM is immutable state; immutable objects can be safely used without concurrency control and without relying on memory barriers.

Using immutability for security is powerful, but it is important to understand that security as a topic is simply another facet of predictability, and as a result, the same concepts and conclusions apply. Specifically, when immutability is an intrinsic axiom of a system, and thus cannot be circumvented, it becomes a powerful tool for reasoning about how aspects of the system will operate, and those assumptions become trusted building blocks for building secure software.

Regarding optimizations, the explicit knowledge of immutability conveys a number of significant advantages to a system of execution. As described previously, for example, explicitly immutable data enables a number of potential optimizations for the purpose of garbage collection, and immutability obviates the need for the types of concurrency control used with mutable objects. Immutability also supports both pass-by-reference and pass-by-value semantics transparently, because the underlying value is itself immutable, and thus the substitution of any duplicate copy of the value has no behavioral consequence. (Put another way: In the XVM, one cannot determine whether an object is being passed by reference or by value if that object is immutable.)

Additionally, the XVM supports lazily initialized state as part of an otherwise-immutable data structure, specifically, by means of a function with presumed-idempotent behavior. Such a capability allows the evaluation of time-expensive computations and/or the allocation of space-expensive data structures to be deferred until actually requested. One common example is the hash function calculation for complex data types, which is often assumed to be expensive enough to defer, but whose immutable value, once computed, should be saved for subsequent use.
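A minimal Java sketch of the same idiom (java.lang.String caches its hash code this way): because the computation is idempotent, a benign race can only ever store the same value:

```java
final class Point {
    private final int x, y;
    private int hash;                 // 0 means "not yet computed"

    Point(int x, int y) { this.x = x; this.y = y; }

    @Override public int hashCode() {
        int h = hash;
        if (h == 0) {
            h = 31 * x + y;           // presumed-idempotent, possibly expensive
            if (h == 0) h = 1;        // avoid recomputing when the real hash is 0
            hash = h;                 // benign race: every writer stores the same h
        }
        return h;
    }

    @Override public boolean equals(Object o) {
        return o instanceof Point p && p.x == x && p.y == y;
    }
}
```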

On References

The family of languages influenced by C++ shares an implicit trait: A compile-time knowledge of accessibility. For example, it is the C++ compiler that prevents access to private members, and the compiler that allows access for a friend. Subsequent languages, like Java, built on this model, and carry sufficient information in the compiled form of the language to allow the accessibility implied by the compiled code to be validated for a second time at runtime, to prevent an end-run around the language’s security features. One by-product of this design is the ability to use a single pointer to reference an object, and more specifically, a pointer that is – in C++ parlance – a Vtable** (a pointer to a pointer to a virtual function table).

In the simplest terms[10], one can imagine an object as the combination of a structure (such as a C struct) and functions that operate on that structure. For each particular type (known as a class), the Vtable** approach arranges those function pointers in an array, and orders those function pointers in the array to be consistent with other compatible types, thus supporting the polymorphic behavior of compatible types (via the common sub-sections of those arrays). A pointer to the array of functions is stored in the first field of the struct, which means that a pointer to the struct will also act as a pointer to the functions associated with that struct (hence the object pointer being a Vtable**). Invocation of any of those functions requires that a pointer to the struct (named this) be passed to the function, thereby allowing the function to access and modify fields in the struct, and allowing it to invoke other functions against that same struct, all without statically binding to either the address of the struct or to the addresses of the other functions related to the struct.

From a mechanical-simplicity and efficiency standpoint, the benefits of this model are difficult to overstate; it is powerful, and it is elegant. However, there are several specific costs to account for as well. First, the type system defines accessibility in a static manner, predicated on the class of the referrer and the class of the referent, and how those two classes relate. Consider C++ protected members, which are accessible to sub-classes, or Java package-private members, which are accessible to any class within the same package (a hierarchical name-space). Second, the facets that a class exposes are statically fixed, such as the set of members declared as public, allowing any referrer that obtains a reference to exploit runtime capabilities such as Java’s reflection to fully examine public members, access any public data fields, and invoke any public methods. While allowing arbitrary access to public members does not seem at first to be a concern, it can be; consider that members of an interface are always public in an implementing class, which leads to the third issue: It is not possible for an object to selectively expose its members, for example providing different interfaces to different referrers.

This lack of selective exposure can lead to complexity, particularly when the goals of rich functionality and security collide. One need look no further than Java’s serialization capabilities, in which an object must be able to expose its state to a serialization mechanism, but that same state – often private in nature – must be protected from all other referrers. The solution to conflicting goals is to create exceptions to the rules. Unfortunately, each exception to a rule contributes to the complexity of the system, and faults (vulnerabilities) inevitably emerge from complexity.

Ecstasy employs a different model entirely, by conceptually composing each reference from two pieces of information: An object’s identity, and the reference type, which is the type or interface exposed through the reference. This approach is a synthesis of several concepts:

  • A type is defined not by its name nor by its implementation, but solely by its member capabilities, exposed as a set of methods and properties;
  • A reference encodes both the capabilities that are selectively permitted, and the object identity against which those capabilities may be invoked;
  • While each class of objects has a number of predefined sets of capabilities, such as public, protected, and private interfaces, a set can also be arbitrarily composed;
  • The permission to access the capabilities of an object is part of (as if encoded in and defined by) the reference information itself, and is neither the result of static compile-time checks nor dynamic security checks;
  • It is always permissible to reduce (to mask) the type of a reference by removing properties and/or methods, but the opposite (to reveal) is forbidden, except at or above the point in the container hierarchy where the object’s class was loaded (similar to a "ring model" in a CPU).

There are several effects of this design decision:

  • Any number of different references can exist for the same underlying object, each of which can have a different set of exposed methods and properties;
  • There are cases in which it will be possible for the XVM to provide a reference to a purely derivative object of an existing object without actually allocating a derivative object, by instead creating a reference whose type is the type of the derivative object, but whose implementation is modified to rely on the original object. For example, the same identity can be used to represent a particular object and an array of objects containing only that particular object, by encoding that same identity with two different types into two different references[11].

Lastly, the ability to mask and reveal portions of functionality in a reference is a fundamental security mechanism. When a reference from outside of a container is injected into that container, that reference is masked as the injection type (such that only the methods declared on the interface being injected will be accessible), and within that container, it is then impossible to reveal additional methods on that reference. However, within a container, for classes defined within that container, it is always possible to reveal additional methods on references to objects created within that container. In other words, modules loaded within a container are not protected from other modules loaded within the same container, and objects instantiated within a container are not protected from other objects instantiated within the same container. What does this mean in practice? Among other things, it means that within the container that defines a class and instantiates an object of that class, terms like private and protected are useful for hiding complexity, but these terms do not even pretend to provide the false notion of security that has been promulgated in the C++ family of languages.
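As a loose Java analogy for masking — not the XVM mechanism — a dynamic proxy exposes only the chosen interface and, unlike an ordinary upcast, cannot be "revealed" by downcasting:

```java
import java.lang.reflect.Proxy;

interface ReadOnly { String read(); }

class Document implements ReadOnly {
    private String text = "hello";
    public String read() { return text; }
    public void write(String s) { text = s; }
}

public class MaskDemo {
    // Wrap the target so that only the ReadOnly interface is reachable.
    static ReadOnly mask(Document doc) {
        return (ReadOnly) Proxy.newProxyInstance(
            ReadOnly.class.getClassLoader(),
            new Class<?>[] { ReadOnly.class },
            (proxy, method, args) -> method.invoke(doc, args));
    }

    public static void main(String[] args) {
        ReadOnly masked = mask(new Document());
        System.out.println(masked.read());   // permitted: part of the masked type
        // ((Document) masked).write("x");   // would throw ClassCastException:
        //                                   // the underlying object is not revealed
    }
}
```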

On Portability

Similar to the goals of the Java Virtual Machine specification, portability is a primary consideration for the XVM. Specifically, the XVM design targets efficient implementation on three major instruction sets (x86/x64, ARM A32/A64, and wasm[12]) and five major operating systems (Android, iOS, Linux, macOS, Windows); it is assumed that support for any other modern environment is practical as a result.

| Instruction Set | Operating System      | Example                                         |
|-----------------|-----------------------|-------------------------------------------------|
| ARM             | Android               | Phone, tablet, notebook, kiosk, embedded device |
| ARM             | iOS                   | iPhone, iPad, Apple TV                          |
| x86/x64         | Linux, macOS, Windows | Server, desktop, notebook, tablet               |
| wasm            | *                     | Browser                                         |

The XVM specification defines a portable binary (aka byte code), and the behavioral requirements to execute such a binary. The XVM is not constructed around nor constrained by a particular word size, pointer size, endian byte ordering, or signed/unsigned support of the host CPU; all encodings in the portable binary are either octets (8-bit bytes) or variable-length integers.
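The specification's actual integer encoding is not reproduced here; the following LEB128-style sketch in Java merely shows how a variable-length encoding avoids any dependence on word size or endianness (assumptions: unsigned values, 7 data bits per octet):

```java
import java.io.ByteArrayOutputStream;

public class VarInt {
    static byte[] encode(long value) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        do {
            int b = (int) (value & 0x7F);     // low 7 bits go into this octet
            value >>>= 7;
            if (value != 0) b |= 0x80;        // high bit marks "more octets follow"
            out.write(b);
        } while (value != 0);
        return out.toByteArray();
    }

    public static void main(String[] args) {
        System.out.println(encode(300).length);   // 2 octets: 0xAC 0x02
    }
}
```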

The portable binary was designed as an input for native compilation, such as a Just In Time (JIT) compiler. Additionally, care was taken in the design of the XVM to ensure that it allowed for Ahead Of Time (AOT) compilation. One explicit non-requirement of the portable binary is efficient interpretation; while an XVM interpreter is possible (one was implemented as a proof-of-concept during development of the XVM specification), the design of the portable format does not lend itself to efficient interpretation.

There is one root[13] module, the Ecstasy.xtclang.org module, which provides the fundamental building blocks of the type system; all XVM types are derived from the contents of this root module. Furthermore, a module cannot contain native (external) code, so the root module contains a complete implementation of the types contained therein. This implies a completely self-referencing type system composition, which, while a correct assumption, is also an impossible conclusion[14].

While the root module itself is identical (portable) across hardware and software platforms, the implementation of the XVM varies – potentially dramatically – across the same. The XVM is responsible for replacing select portions of the self-referential implementation in the root module with native implementations that support the specified contracts; however, which portions get replaced will vary depending on a combination of the goals of a particular XVM implementation and the capabilities of the hardware and software combination that the XVM is targeting. Furthermore, which portions get replaced is (and should be) unknown by a programmer working in Ecstasy.

On the one hand, it is highly desirable to be able to provide a truly minimal set of native implementations in the execution system itself, and to be able to implement all other data types and operations by composing from that minimal set. This approach permits a minimalist implementation of the XVM. On the other hand, it is equally desirable to be able to provide a richer set of native implementations as part of the execution system itself, in order to leverage additional capabilities provided by the runtime environment, the operating system, and the underlying hardware. In other words, the ability to trade off complexity, size, and performance is based on the ability to define standardized capabilities as abstract data types that can be implemented either as a native part of the execution system or in software that is executed by the execution system. While this design aspect is visible in as few places as possible, it deeply permeates the design of the XVM specification and the root module.

To accomplish this goal, the root module provides a rich set of abstract types, defined as part of the XVM specification, and a reference implementation for all aspects of those types is provided. It is possible to create one XVM implementation that relies almost entirely on the reference implementation of these types (thus requiring only a nominal set of operations to be implemented in native code), and another XVM implementation that directly implements a dramatically larger portion of the root module in native code (hopefully achieving higher performance with less memory usage).

With the advent of the CLI specification, best known in the Microsoft .NET CLR incarnation, and to a lesser extent in the open source Mono project, a new facet was added to the concept of virtual machine portability: Explicit multiple-language support. There are inevitable trade-offs in creating an execution system that is intended to support multiple languages; generally, an execution system is optimized for a specific language, and the implementation of any other language trades off between execution performance and the adherence to the “foreign” language specification.

One can conceivably implement any programming language on any Turing-complete execution system, resorting to full runtime interpretation if necessary; the obvious trade-off is correctness versus space/time efficiency. With respect to multiple language support, the primary goal of the XVM is to enable the efficient implementation of a specific class of language that can be described by its attributes: modular, object-oriented, composable, single-dispatch, immutable message passing, concurrent, thread-safe, with automatic memory management. Alternatively, the class of language could be described as an incremental evolution of the Java and C# languages, with support for type composition, a reified generic type system with transitive closure, a single (object) type system, and a recursively consistent (hierarchical) runtime model, while explicitly eschewing support for concurrent access to mutable data. While it is desirable that other classes of languages be efficiently implementable on the XVM, it is an explicit non-requirement that the semantics of those languages be intrinsically supported by the XVM where those semantics differ from those of the targeted specific class of language. Or, more succinctly: The XVM is designed to run the Ecstasy language, and its design allows for relatively efficient execution of languages with similar runtime models, but its design contains nothing that explicitly supports languages other than Ecstasy.

On Startup Time

One of the lesser design goals for the XVM is to be able to achieve near instantaneous startup time. As simple as this goal sounds, the complexity of the type system works against it.

Consider a side effect of the core JVM type system being largely implemented on top of the JVM: The large number of classes required for the simplest “hello world” application. Just as in the children’s song[15] that iterates over the body a bit at a time, like “the knee bone’s connected to the shin bone” and so on, the JVM is forced to load a relatively large graph of its type system as a result of the dependencies among its most basic types. Loading a type likely requires I/O, parsing, validation, internal type definition, the execution of any specific type initialization code, and the recursive loading of types on which the loaded type depends. While loading a relatively large graph of a type system is acceptable for a long-running process (which can amortize that loading cost over hours, days or months), it is far more intrusive a cost for a short-running process.

The XVM design further exacerbates this condition by guaranteeing – with transitive closure – the validation of an entire type system before it is activated within a runtime container. That means that the entire type system must be loaded and evaluated before the first line of code can be executed.

To address this challenge, several aspects of the XVM design are relevant:

  • Within a given container, the type system of that container is immutable; loading the type system within a container thus always results in an identical type system;
  • The elimination of global data structures (and in particular, mutable data structures) dramatically simplifies coupling across the type system, simplifying initialization;
  • Dependency injection largely eliminates lazily-initialized language sub-systems, such as I/O;
  • AOT compilation allows a container’s type system, or even an entire application, to be compiled to executable code and any related structures in advance; and
  • Explicit version support in the portable binary allows the runtime to detect changes that violate any of the above assumptions.

The design is intended to allow an XVM implementation to instantiate a new container with a specified set of modules, with a cost that is on the same order of magnitude as loading and initializing a shared library as part of an already-running C/C++ application.


[1]

XVM is an abbreviation for the XTC Virtual Machine

[2]

Such contracts are software legacy, meaning that the contracts cannot be altered without disclaiming the past; in other words, altering the contracts will break everything.

[3]

See https://www.joelonsoftware.com/2002/11/11/the-law-of-leaky-abstractions/

[4]

See https://en.wikipedia.org/wiki/Sandbox_(computer_security)

[5]

See https://en.wikipedia.org/wiki/Privilege_escalation

[6]

The law of conservation of mass and energy, for example.

[7]

Since this was penned, Azul Systems (azul.com) commercialized a true, non-blocking software solution for large shared-mutable stores in their Java Virtual Machine implementation.

[8]

For example, an asynchronous copy-compacting collector allows multiple physical copies to represent the same object. Each individual service can perform its own mark phase, followed by a copy-compact phase (which can be performed by any thread, including a daemon), and any memory that is no longer used can be freed by the next mark phase.

[9]

See https://hitchhikers.wikia.com/wiki/42

[10]

This example is intended to be illustrative, and should not be viewed as an authoritative explanation of object orientation unless you are attempting to annoy a Smalltalk programmer.

[11]

In this case, the term type is being used to indicate both the interface type and the implementation class.

[12]

WebAssembly is a portable machine code for web browsers: https://webassembly.org/

[13]

“Root” in the sense that it has no dependency on any other module.

[14]

Without expanding into a formal proof, the XVM itself provides no data types (such as primitive types) from which new types can be composed; thus, the root module defines types composed only from other types in the same module, which is by definition infinitely recursive.

[15]

This specification would lack provable transitive closure without a link to Dem Bones: https://en.wikipedia.org/wiki/Dem_Bones
