Frequently Asked Questions about C--
What machines does the compiler support?
As of May 2004, only the Pentium back end is as expressive as a C compiler.
The Alpha and MIPS back ends can run "hello, world".
We are working on back ends for PowerPC (Mac OS X),
ARM, and IA-64.
Let us know what other platforms you are interested in.
How is C-- different from JVM or CLR?
A major purpose of C-- is to give you the power to make all the
same choices about performance tradeoffs that you would get to make
if you were building a custom code generator.
This attitude distinguishes
C-- from the Java Virtual Machine or Microsoft's Common Language
Runtime, which pre-package JIT
compilation, type checking, class loading, garbage collection,
exception dispatch, and much more besides. If the JVM is a mansion
(great if you like its design), then C-- is the bricks (from which you
can build all sorts of houses).
Why doesn't the system include a garbage collector?
A garbage collector fixes many design choices, such
as heap object layout and relocatability, allocation strategy, real-time
bounds, and so on. If we made these choices for you, we would be
guaranteed to make the wrong ones!
Instead, C-- provides hooks you can use to attach your own
garbage collector. You write your garbage collector in C (a
programming language), not C-- (an assembly language). At some point
your collector will probably need to traverse the stack to find the
live roots; you use the C interface to the C-- runtime system to do
that.
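The stack walk mentioned above might look roughly like the following C sketch. Everything in it is hypothetical: `Activation`, `walk_roots`, and `count_live` are invented for illustration and are not the names or types of the actual C-- run-time interface, which a real collector would use to locate each frame and its live roots. The sketch only shows the shape of the loop: start at the youngest activation, follow caller links, and hand every root slot to a visitor that may read or update it.

```c
#include <stddef.h>

/* HYPOTHETICAL stand-in for one stack frame (activation) as a collector
   might see it through a run-time interface. Not the real C-- API. */
typedef struct Activation {
    struct Activation *caller;  /* next older frame; NULL at the bottom */
    size_t nroots;              /* number of pointer slots in this frame */
    void **roots;               /* array of this frame's pointer slots */
} Activation;

/* Called once per root slot; a copying collector would overwrite *slot
   with the new address of the object it points to. */
typedef void (*RootVisitor)(void **slot, void *env);

/* Walk from the youngest activation to the oldest, visiting every root
   slot. Returns the number of slots visited. */
size_t walk_roots(Activation *youngest, RootVisitor visit, void *env) {
    size_t count = 0;
    for (Activation *a = youngest; a != NULL; a = a->caller) {
        for (size_t i = 0; i < a->nroots; i++) {
            visit(&a->roots[i], env);
            count++;
        }
    }
    return count;
}

/* Example visitor: counts non-NULL roots into *(size_t *)env. */
void count_live(void **slot, void *env) {
    if (*slot != NULL)
        (*(size_t *)env)++;
}
```

A relocating collector would pass a visitor that forwards each pointer; `count_live` merely counts the non-NULL roots, which is enough to show the traversal order.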
Relationship to C
Why isn't C-- a superset of C?
- C is a programming language designed for
human programmers, whereas C-- is a compiler
target language. So C has many things that are
entirely unnecessary for C--; requiring a
C-- compiler to support them would be exceptionally
burdensome. Most notably, C has an elaborate type system
that we don't need for code generation (struct, union,
prototype, and all that).
- There are a few features in C that are actually
incompatible with C--, most notably support for
varargs procedures (that is, procedures with a variable
number of arguments). It seems impossible to have both
C-style varargs and efficient, fully general tail calls (which
C-- must have).
- C-- deliberately provides different notation for
many things that C can do. For example, where C would write
"*p" to fetch a 32-bit value, we write "bits32[p]" in
C--, which states the width of the memory access explicitly.
- The C standard leaves too much up to the
implementation, including the representations of structures,
the sizes of the built-in types, and the meanings of the
operators. C operators can have side effects, and the
order of their evaluation is unspecified. Implementing other
languages on top of C-- requires finer control. For
example, Modula-3 requires division that rounds towards minus
infinity, not towards zero. Standard ML requires
arithmetic operations that detect overflow.
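The notation and implementation-defined points above can be made concrete. The C sketch below shows what a front end targeting C would have to spell out by hand for three of them: a fixed-width 32-bit load (a rough analogue of `bits32[p]`), Modula-3's division rounding toward minus infinity, and Standard ML's overflow-detecting addition. The function names are hypothetical and the code is illustrative only; it assumes a C99 compiler, where `/` truncates toward zero.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Rough C analogue of the C-- expression bits32[p]: fetch exactly 32 bits
   from address p. memcpy avoids C's alignment and strict-aliasing pitfalls. */
uint32_t load_bits32(const void *p) {
    uint32_t v;
    memcpy(&v, p, sizeof v);
    return v;
}

/* Division rounding toward minus infinity (as Modula-3 requires).
   C99 '/' truncates toward zero, so step down by one when the remainder
   is nonzero and its sign differs from the divisor's.
   Assumes b != 0 and not (a == INT32_MIN && b == -1). */
int32_t floor_div(int32_t a, int32_t b) {
    int32_t q = a / b;
    int32_t r = a % b;
    if (r != 0 && ((r < 0) != (b < 0)))
        q -= 1;
    return q;
}

/* Addition that reports overflow (as Standard ML requires) instead of
   wrapping or, for signed C integers, invoking undefined behavior.
   Returns false on overflow and leaves *sum untouched. */
bool checked_add(int32_t a, int32_t b, int32_t *sum) {
    if ((b > 0 && a > INT32_MAX - b) || (b < 0 && a < INT32_MIN - b))
        return false;
    *sum = a + b;
    return true;
}
```

For example, `floor_div(-7, 2)` gives -4, whereas C's own `-7 / 2` gives -3.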
OK, then why isn't C-- a subset of C?
For efficient compilation of modern languages, we need features
that C just doesn't provide efficiently:
- Ability to return multiple values in registers
- Optimized tail calls to any procedure
- Global variables bound to registers
- Ways to tie garbage-collection information to particular
program points
- Support for exceptions
- Support for lightweight concurrency
The last three are the real killers: C-- provides a
run-time interface that allows the state of a suspended
C-- computation to be inspected and modified, and C
has no equivalent for this.
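To see why "optimized tail calls to any procedure" is hard to get from C, consider the trampoline, a common workaround when compiling to C (it is not something C-- itself uses). Each procedure returns the next procedure to run instead of calling it, and a driver loop makes the calls. The hypothetical mutual recursion below (`is_even_step`, `is_odd_step`) runs in constant stack space, but every "tail call" costs a return, a loop iteration, and an indirect call: overhead that a genuine tail call avoids.

```c
#include <stdbool.h>
#include <stddef.h>

typedef struct Step Step;

/* Shared state for the hypothetical even/odd computation. */
typedef struct {
    unsigned long n;
    bool result;
} State;

/* A step names the procedure to run next; NULL means "finished". */
struct Step {
    Step (*next)(State *);
};

static Step is_odd_step(State *s);

static Step is_even_step(State *s) {
    if (s->n == 0) { s->result = true; return (Step){ NULL }; }
    s->n -= 1;
    return (Step){ is_odd_step };   /* stands in for a tail call is_odd(n - 1) */
}

static Step is_odd_step(State *s) {
    if (s->n == 0) { s->result = false; return (Step){ NULL }; }
    s->n -= 1;
    return (Step){ is_even_step };  /* stands in for a tail call is_even(n - 1) */
}

/* The trampoline: repeatedly run whatever the previous step returned. */
bool run_is_even(unsigned long n) {
    State s = { n, false };
    Step k = { is_even_step };
    while (k.next != NULL)
        k = k.next(&s);
    return s.result;
}
```

`run_is_even(1000000)` completes without growing the C stack, which plain mutual recursion in C does not guarantee.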
How is C-- different from UNCOL?
The goal of UNCOL was to provide a universal intermediate language.
The goal of C-- is more modest: to encapsulate what code generators
already do well.
We don't claim you could do any more with C-- than you could do with a standard
code generator; we're just trying to make it easier to do those
things.
A key distinguishing feature of C-- is the data model.
Put simply, C-- has no high-level types; it does not even
distinguish floating-point variables from integer variables.
This model gives the front end total control over representation and
type system, which is quite different from an UNCOL.
This data model also helps distinguish C-- from systems like the
Java Virtual Machine and the Microsoft Common Language Runtime.
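A rough C illustration of a value that is "just bits": below, the same 32-bit word is read either as an integer or as a floating-point number, with the interpretation chosen at the point of use rather than fixed by a variable's declared type. This assumes `float` is IEEE 754 single precision (true on mainstream platforms, though not guaranteed by the C standard); the function names are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Read the bits of a float as a 32-bit word (assumes 32-bit IEEE 754 float). */
uint32_t float_bits(float f) {
    uint32_t w;
    memcpy(&w, &f, sizeof w);
    return w;
}

/* Read a 32-bit word as a float: same bits, different interpretation. */
float bits_as_float(uint32_t w) {
    float f;
    memcpy(&f, &w, sizeof f);
    return f;
}
```

For instance, `float_bits(1.0f)` is `0x3F800000`, and `bits_as_float(0x40000000u)` is `2.0f`.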
Why is the syntax of C-- so C-like?
We expect that compiler writers will have to read a lot of C-- while
they're debugging their front ends. Many compiler writers
have significant experience reading low-level C code; making the
syntax C-like helps them benefit from this experience.
There are a number of syntactic tweaks in C-- that make it easier to
generate than C; for example, every operator has a prefix form,
so a front end never has to use infix operators or worry about their precedence.
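As a sketch of why prefix forms simplify generation, here is a hypothetical fragment of a front end written in C that prints an expression tree using prefix operators only. Because each operator is written like a call, the output needs no precedence table and no decisions about parentheses. The `Expr` type and the operator spellings `%add` and `%mul` are assumptions made for this illustration.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical expression tree: a leaf names a variable; an interior node
   applies a binary primitive such as "add" or "mul". */
typedef struct Expr {
    const char *op;                /* primitive name, or NULL for a leaf */
    const char *name;              /* variable name when op == NULL */
    const struct Expr *lhs, *rhs;
} Expr;

/* Append s to buf; once the buffer is full, further output is dropped. */
static void put(char *buf, size_t cap, size_t *len, const char *s) {
    if (*len < cap) {
        int n = snprintf(buf + *len, cap - *len, "%s", s);
        if (n > 0)
            *len += (size_t)n;
    }
}

/* Emit prefix form recursively: %op(lhs, rhs). */
static void emit(const Expr *e, char *buf, size_t cap, size_t *len) {
    if (e->op == NULL) {
        put(buf, cap, len, e->name);
        return;
    }
    put(buf, cap, len, "%");
    put(buf, cap, len, e->op);
    put(buf, cap, len, "(");
    emit(e->lhs, buf, cap, len);
    put(buf, cap, len, ", ");
    emit(e->rhs, buf, cap, len);
    put(buf, cap, len, ")");
}

/* Render e into buf (NUL-terminated, truncated if cap is too small). */
void to_prefix(const Expr *e, char *buf, size_t cap) {
    size_t len = 0;
    if (cap > 0)
        buf[0] = '\0';
    emit(e, buf, cap, &len);
}
```

The tree for `a * (b + c)` comes out as `%mul(a, %add(b, c))`; an infix printer would have had to decide that the parentheses are required.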
Why is the domain name not c--.org?
The Evil Empire
refuses even to consider registering a domain name that ends
with a dash.
Where is the source code?
Get it from our rsync server or try the
C-- Downloads page.
What about that other C--?
C-- is such a good name that others have also used it.
Is this a secret Microsoft project?
Simon Peyton Jones does work for Microsoft Research, but
C-- is not a Microsoft project. C-- is open source;
the code is available to everyone.
Does C-- provide the %xyz primitive?
Eventually we will divide the primitive operations into two
categories:
- Required operations, which must be supported by every
implementation in at least one size.
- Optional operations, which, if supported, must have the
standard semantics.
The picture about sizes is less clear. For example,
although we expect to require every implementation to implement
two's-complement add, we can't imagine requiring every
implementation to implement 32-bit two's-complement add.
The reason for adding all known operations to C-- as
standard opcodes is to encourage different implementations to
use the same name for the same operation. Eventually we hope to
have a "register a new primitive operator" page at
cminusminus.org.
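One way to see why requiring an operation "at some size" is easier than requiring it at a particular size: two's-complement addition has the same definition at every width n (add, then keep the low n bits), so implementations differ only in which widths they support. The C sketch below, with the hypothetical name `add_n`, models widths from 1 to 64 bits.

```c
#include <stdint.h>

/* n-bit two's-complement addition for 1 <= n <= 64: add, then keep only
   the low n bits. The same bit pattern is the correct result whether the
   operands are read as signed or unsigned. */
uint64_t add_n(uint64_t x, uint64_t y, unsigned n) {
    uint64_t mask = (n >= 64) ? UINT64_MAX : ((UINT64_C(1) << n) - 1);
    return (x + y) & mask;
}
```

At width 8, `add_n(0x7F, 1, 8)` yields `0x80` (127 + 1 wraps to -128 when read as signed), and `add_n(0xFF, 1, 8)` yields 0.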
C-- has no primitive operators that return multiple
results. But my target machine has a single instruction that performs both
quotient and remainder. What do I do?
Ideally, the C-- compiler would
spot separate %quot and %rem operations and combine
them. But you might want to use a multiple
assignment to communicate your intentions to the code generator, e.g.,
q, r = %quot(x, y), %rem(x, y);
We don't have enough peephole optimization yet to know
if this style will make a difference, but it can't hurt.
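For comparison, standard C already offers a combined quotient-and-remainder operation: `div()` from `<stdlib.h>` returns both results in a `div_t`. The sketch below contrasts separate `/` and `%` (a C analogue of separate `%quot` and `%rem`, assuming for illustration that both truncate toward zero as C99's operators do) with a single `div()` call; the helper names are hypothetical. Optimizing C compilers often do exactly the combining described above, emitting one divide instruction for the separate form when the target produces both results at once.

```c
#include <stdlib.h>

/* Separate operations, like q = %quot(x, y); r = %rem(x, y);
   Assumes y != 0. */
void quot_rem_separate(int x, int y, int *q, int *r) {
    *q = x / y;   /* C99: truncates toward zero */
    *r = x % y;   /* C99: sign follows x */
}

/* One standard library call that yields both results. Assumes y != 0. */
void quot_rem_combined(int x, int y, int *q, int *r) {
    div_t d = div(x, y);
    *q = d.quot;
    *r = d.rem;
}
```

Both forms give quotient -3 and remainder -2 for x = -17, y = 5, and in each case q * y + r == x.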
Contact: C-- Webmaster.
URL: https://www.cminusminus.org/.
Last edited: Mon 05 Feb 2007 14:02 EST.