Writing a Rust compiler in C (notgull.net)
393 points by todsacerdoti 75 days ago | 223 comments



If I were to try bootstrapping rust, I think I would write a proto-rust in C that has fewer features than full rust, and then write a full rust compiler in proto-rust.

‘proto-rust’ might, for example, not have a borrow checker, may have limited or no macro support, may never free memory (freeing memory isn’t strictly needed in a compiler whose only goal in life is to compile a better compiler), and definitely need not create good code.

That proto-rust would basically be C with rust syntax, but for rust aficionados, I think that’s better than writing a rust compiler in “C with C syntax”, which is what this project aims for.

Anybody know why this path wasn’t taken?


FWIW mrustc, the existing state of the art non-rust rust compiler, already doesn’t have a borrow checker.

Removing the borrow checker doesn’t break any correct programs — it just makes it so a huge amount of incorrect programs can be compiled. This is fine, since we mainly want to use mrustc to compile rustc, and we already know rustc can compile itself with no borrow checker errors.
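
For example, a program like this is rejected by rustc's borrow checker but would go through a checker-less compiler just fine (a minimal sketch; it is genuinely unsound, which is why you only feed such a compiler code already known to pass rustc):

    fn main() {
        let r;
        {
            let x = 5;
            r = &x; // borrow of `x` escapes its scope: rustc rejects this
        }
        println!("{}", r); // `x` is already dead here
    }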


And once you have yourself bootstrapped, you can presumably turn around and compile the compiler again, now with borrow-checking and optimizations.

In the very special case of proto-rust bootstrapping, the cost of not having borrow-checking can be paid back basically right away.


"Removing the borrow checker doesn't break any correct programs - it just makes it so a huge amount of incorrect programs can be compiled."

Not a user of Rust programs myself but am curious how users determine whether a Rust binary was compiled with mrustc or rustc.


You can assume that unless you have some specific information to the contrary, any Rust binary you encounter in real life was built with rustc. mrustc is not used for any mainstream purpose other than in the bootstrap chain of rustc in distros like Guix that care about reproducibility, and even then, the build of rustc they distribute to users will be re-built from the bootstrapped rustc, it won’t be one compiled by mrustc directly.


Hypothetical: A computer program is distributed in binary form with the description "written in Rust" or some similar indication it is a Rust binary. A computer user who neither compiles their own programs nor reads source code downloads and runs the program under the belief that because the program is "written in Rust" it offers memory safety. Unbeknownst to the computer user, the program has been compiled without memory safety.


That belief would be erroneous regardless of what compiler you used, because rust unsafe lets you do anything, including cause memory unsafety and other UB, even when using rustc.

Rust isn’t any kind of guarantee for end-users that some class of bugs doesn’t exist. It’s just a tool for programmers to make writing a certain class of bugs more difficult. If the programmer chooses to subvert the tool, they can.
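
For example, this compiles cleanly under stock rustc and is undefined behavior:

    fn main() {
        let p: *const i32 = std::ptr::null();
        let v = unsafe { *p }; // null pointer dereference: UB, but it compiles
        println!("{}", v);
    }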

(Also, I think most reasonable people would feel you were lying if you referred to a program that can’t be compiled by rustc as “written in rust”. Maybe “written in a dialect of rust” would be more accurate. Rust isn’t like C where there is a standard that multiple competing implementations target; the definition of rust is "what rustc accepts".)


If you want certainty, you'd use Reproducible Builds and know for sure which compiler generated this binary from the given source code.

This assumes the code is open source, you know which specific source code release was used and the author of the binary is making an effort to document the build environment that was used, e.g. through an attached SBOM.


There's an interesting bit of flexibility here: your compiler doesn't have to be feature complete, it just has to be able to build a more feature-complete binary.


It is interesting. A subset of functionality (in this case) will build a superset of programs.


This is what we did for Mozart/Oz [1]. We have a compiler for "proto-Oz" written in Scala. We use it to compile the real compiler, which is written in Oz. Since the Scala compiler produces inefficient code, we then recompile the real compiler with itself. This way we finally have an efficient real compiler producing good code. This is all part of the standard build of the language.

[1] https://github.com/mozart/mozart2


Is it possible to bootstrap Scala?


why limit yourself to Scala? bootstrap everything! [0]

[0] https://guix.gnu.org/en/blog/2023/the-full-source-bootstrap-...


So now you're writing two compilers.

What did you actually gain from this, outside of more work?


Writing a small compiler in C and a big compiler in Rust is simpler than writing a big compiler in C.


But writing a Rust compiler in Rust is already done.


Sure, but the small compiler that you write in C can't compile rustc. So you write a new Rust compiler that uses much simpler Rust that the small compiler in C can compile. Then that new Rust compiler can compile rustc.

And since that new Rust compiler might not have much of an optimizer (if it even has one at all), then you recompile rustc with your just-compiled rustc.


No that makes no sense to me. Or are we pretending cross-compilation doesn't exist?


Sometimes people like to do things just to do them, because the idea is cool. Sometimes an idea is cool because it has real-world ramifications just for existing (trusting trust, supply chain attacks), though many people like to argue about whether any particular idea is Just Cool TM or Actually Useful TM. I don’t think the article operated under any false pretenses either way - it seemed evident to me that at least some of the motivation was “because how? I want to do the how!” And that’s cool! Also, I think the article pretty clearly preempts your question. Duh, he could just cross-compile rust. But the whole article, literally the entirety of it, is dedicated to exploring the chicken/egg problem of “if rustc is written in rust, what compiles rustc?”, and the rabbithole of similar questions that arise for other languages. The answer to that question being both “rustc, of course”, and also “another compiler written in some other language.” The author wants to explore the process of making the compiler in [the/an] ‘other language’ because it’s Cool.

Just because something has already been done, does not mean there’s no worth in doing it- nor that there is only worth in doing it for educational and/or experiential purposes :)


Cross-compilation is orthogonal to bootstrapping, which is the major motivating factor for having something like what is described earlier in this thread: a small compiler written in C which can compile a subset of Rust, used to compile a larger, feature complete Rust compiler in that Rust subset -- versus what we have right now, which is a Rust compiler written in Rust which requires an existing Rust compiler binary, which means we have an unmitigated supply chain attack vector.

If you change your question to "why does anyone care about bootstrapping?", the answer would revolve around that aforementioned supply chain attack vector.

For details, you could check out:

- Reading the 1984 paper "Reflections on Trusting Trust", for which Ken Thompson was given the ACM Turing award: https://www.cs.cmu.edu/~rdriley/487/papers/Thompson_1984_Ref...

- Or watching this short Computerphile video on the above: https://www.youtube.com/watch?v=SJ7lOus1FzQ

- You can read about the GNU Guix endeavor to achieve a 100% bootstrapped Linux development environment (using zero pre-compiled binaries), starting from a 357-byte HEX file: https://guix.gnu.org/en/blog/2023/the-full-source-bootstrap-...

Perhaps you're comfortable with the lack of assurances that non-bootstrappable builds entail (everyone has a different appetite for risk); some others aren't though, and so they have an interest in efforts to mitigate the security risks inherent in trusting a supply chain of opaque binaries.


The post is about solving a specific same-architecture bootstrapping problem. Cross-compilation is irrelevant to this discussion.


Yes, they are, because they want to target systems which explicitly disallow cross-compilation, like Debian.

Yes, I think it's silly too, but other people disagree and they are free to work on whatever they want. Do I think it's a mostly pointless waste of time? Obviously, I do. Still, I guess there are worse ones.

Note that the Rust project does use cross-compilation for the ports it supports itself, and considering how often rustc's own source uses features only available in the current version of rustc, I guess it's safe to assume they share my opinion on the usefulness of keeping Rust bootstrappable.


There are two kinds of bootstrapping:

* Bootstrapping a language on a new architecture using an existing architecture. With modern compilers this is usually done using cross-compilation.

* Bootstrapping a language on an architecture _without_ using another architecture.

The latter is mostly done for theoretical purposes like reproducibility and guarding against "Reflections on Trusting Trust"-style attacks.


Paring down a rust compiler in rust to only use a subset of rust features might not be a big lift. Then you only need to build a rust compiler (in C) that implements the features used by the pared-down rust compiler rather than the full language.

PyPy, for instance, implements RPython, which is a valid subset of Python. The compiler is written in RPython. The compiler code is limited in the features it can use, but the bootstrap compiler only needs to implement what RPython includes.


How do you compile that on a new platform?


Cross-compilation. There is no requirement of being able to run the compiler on the platform to compile for that platform.

It is much easier to add support for a platform to the compiler backend than to write a new, full compiler with its own, new bootstrapping method.


One way would be to have an intermediate target that is easily recompiled or run on any hardware.

https://ziglang.org/news/goodbye-cpp/


But that doesn't conform to the "Descent Principle" described in the article.

I haven't really been following Zig, but I still felt slightly disappointed when I learnt that they were just replacing a source-based bootstrapping compiler with a binary blob that someone generated and added to the source tree.

The thing that makes me uncomfortable with that approach is that if a certain kind of bug (or virus! [0]) is found in the compiler, it's possible that you have to fix the bug in multiple versions to rebootstrap, in case the bug (or virus!) manages to persist itself into the compilation output. The Dozer article talks about the eventual goal of removing all generated files from the rustc source tree, ie undoing what Zig recently decided to do.

If everything is reliably built from source, you can just fix any bugs by editing the current source files.

[0] https://wiki.c2.com/?TheKenThompsonHack


I think there is too much mysticism here in believing that the bootstrapping phases will offer any particular guarantees. Without essentially a formal proof that the output of the compiler is what you expect, you will have to manually inspect every aspect of every output phase of any bootstrapping process.

OK, so you decide to use CompCert C. You now have a proof that your object code is what your C code asked for. Do you have a formal proof of your C code? Have you proved that you have not allowed any surprises? If not, what is your Rust compiler? Junk piled on top of junk, from this standpoint.

On the other hand, you could have a verified WASM (or other VM) runner. That verified runner could run the output of a formally verified compiler (which Rustc is not). The trusted base is actually quite small if you had a fully specified language with a verified compiler. But you have to start with that trusted base, and something like a compiler written in C is not really enough to get you there.

Oh, and why do we trust QBE?


> Without essentially a formal proof that the output of the compiler is what you expect, you will have to manually inspect every aspect of every output phase of any bootstrapping process.

And why would it be easier to manually inspect (prove correct) the output of every phase than to manually inspect (prove correct) the source code? The compiled code will often lose important information about code structure, how abstractions are used, include optimisations, etc.

I usually trust my ability to understand source code better than my ability to understand the compiled code.


But you cannot trust the compiler, you said.


That's not what I said. I've implied that it's hard to trust the output of some unknown compiler (eg, the "zig1.wasm" blob) and that it's easier to trust source code.

The Dozer article explains, under "The Descent Principle", how rustc will eventually be buildable using only source code [0] (other than a "512-byte binary seed" which implements a trivial hex interpreter). You still need to trust a computer to run everything on, though in theory it should be possible to gain trust by running it on multiple computers and checking that the result is the same (this is why any useful system bootstrapping project should also be reproducible [1]).

[0] https://github.com/fosslinux/live-bootstrap

[1] https://bootstrappable.org/best-practices.html


The even more immediate objection is that a binary blob is the opposite of portable?!


In this case it is portable, because the Zig compiler source tree includes an interpreter for the blob (WASM) in portable C.

It's not objectionable to have non-portable source code anyway. I think it's fine having architecture-specific assembly code, just as long as it's hand-written.

The problems arise when you're storing generated content in the source repository, because it becomes unclear how you're meant to understand and fix the generated content. In this case it seems like the way to fix it is by rerunning the compiler, but if running the compiler involves running this incorrect blob, it's not clear that running the compiler again will produce a correct blob.

I wonder if anyone is monitoring these commits in Zig to ensure that the blobs are actually generated genuinely, since if not it seems like an easy way for someone to inject a KTH (Ken Thompson Hack): https://github.com/ziglang/zig/commits/master/stage1/zig1.wa...


Cross compilation.


Writing programs in Rust is not simpler than writing programs in C.


For compilers specifically, I think plenty of people would disagree.

It's not that it's exceedingly hard in C, but programming languages have evolved in the last millennium, and there are indeed language features that make writing compilers easier than it used to be.

I have the most fun when I write x86 MASM assembly. It's a pretty simple language all in all, even with the macro system. Much simpler than C.

But a simple language doesn't always make it simple to write complex programs like compilers.


It is really remarkably sucky to process trees without algebraic datatypes and full pattern matching. Most of your options for that are ML progeny, and the rest are mostly Lisps with a pattern-matching macro. While it’s definitely possible to implement, say, unification in C, I wouldn’t want to—and I happen to actually like C.
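
For a flavor of what that buys you, here is a minimal sketch of a tree-rewriting pass (constant folding) with an ADT and exhaustive matching:

    enum Expr {
        Num(i64),
        Add(Box<Expr>, Box<Expr>),
    }

    // fold constants bottom-up; the compiler checks we handled every case
    fn fold(e: Expr) -> Expr {
        match e {
            Expr::Add(a, b) => match (fold(*a), fold(*b)) {
                (Expr::Num(x), Expr::Num(y)) => Expr::Num(x + y),
                (a, b) => Expr::Add(Box::new(a), Box::new(b)),
            },
            n => n,
        }
    }

    fn main() {
        let e = Expr::Add(Box::new(Expr::Num(2)), Box::new(Expr::Num(3)));
        if let Expr::Num(n) = fold(e) {
            println!("{}", n); // prints 5
        }
    }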

Given the task is to bootstrap Rust, a Rust subset is a reasonable and pragmatic choice if not literally the only one (Mes, a Lisp, could also work and is already part of the bootstrappable ecosystem).


Sure, for you it isn't. It is for me. Especially if we're talking "working roughly as intended" programs.


Rust feels impossible to use until you "get" it. It eventually changes from fighting the borrow checker to a disbelief how you used to write programs without the assurances it gives.

And once you get past fighting the borrow checker it's a very productive language, with the standard containers and iterators you can get a lot done with high level code that looks more like Python than C.
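
Something like this, for example (a small sketch):

    fn main() {
        let words = ["boot", "strap", "rust"];
        // map/collect/sum chains instead of hand-rolled loops and buffers
        let lengths: Vec<usize> = words.iter().map(|w| w.len()).collect();
        let total: usize = lengths.iter().sum();
        println!("{:?} -> total {}", lengths, total);
    }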


I agree, but it's not different from C with a decent library of data structures. And even when you become more borrow-checker aware and able to anticipate most of the issues, there are still cases where the solution is either non-obvious or requires doing things in indirect ways compared to C or C++.


The quality difference between generics and proc macros vs the hoops C jumps through instead is pretty significant. The way you solve this in C is also unobvious, but doesn't seem like it when you have a lot of C experience.

I've been programming in C for 20 years, and didn't realize how much of using it productively wasn't a skilful craft, but busywork that doesn't need to exist.

This may sound harsh, but sensitivity to definition order, and the fragility of headers combined with a global namespace, is just a waste of time. These aren't problems worth caring about.

Every function having its own idea of error handling is also nuts. Having to be diligent about error checking and cleanup is not a point of pride, but a compiler deficiency.
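
For contrast, a sketch of the Rust side (the config file name here is made up): every fallible call funnels through one Result type and `?`:

    use std::fs;
    use std::io;

    fn read_config() -> io::Result<String> {
        let text = fs::read_to_string("config.toml")?; // any I/O error propagates here
        Ok(text.trim().to_string())
    }

    fn main() {
        match read_config() {
            Ok(cfg) => println!("{}", cfg),
            Err(e) => eprintln!("failed to read config: {}", e), // one error path, handled once
        }
    }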

Maintenance of build scripts is not only an unnecessary effort, but it makes everything downstream of them worse. I can literally not have build scripts at all, and be able to work on projects bigger than ever. I can open a large project, with an outrageous number of dependencies, and have it build on the first try, integrate with IDEs, generate API docs, run unit tests out of the box. Usually works on Windows too, because the POSIX vs Windows schism can be fixed with a good standard library and cross-platform dependency management.

Multi-threading can be the default standard for every function (automatically verified through the entire call graph including 3rd party code), and not an adventurous novelty.


Writing non-trivial programs is easier in Rust than in C, for people that are equally proficient in C as in Rust. Especially if you're allowed to use Cargo and the Rust crates ecosystem.

C isn't even in the same league as Rust when it comes to productivity – again, if you're equally proficient in Rust as in C.


I have 40 years of C muscle memory and it took me many tries and a real investment to get into Rust, but I don’t do any C anymore (even for maintenance- I’d rather rewrite it in Rust first).

Rust isn’t in a different class from C, it’s a different universe!


This does not match my experience.


Try putting everything in Arc<Mutex<>> or allow mutable_transmutes and things get rather comfy.
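
A minimal sketch of that escape hatch - clone the Arc everywhere and lock on every access:

    use std::sync::{Arc, Mutex};
    use std::thread;

    fn main() {
        let counter = Arc::new(Mutex::new(0));
        let handles: Vec<_> = (0..4)
            .map(|_| {
                let c = Arc::clone(&counter);
                thread::spawn(move || {
                    *c.lock().unwrap() += 1; // lock, mutate, unlock on drop
                })
            })
            .collect();
        for h in handles {
            h.join().unwrap();
        }
        println!("{}", *counter.lock().unwrap()); // prints 4
    }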


Doesn't this defeat the point of using Rust a bit?


You have to consider that those who write the Rust compiler are experts in Rust, but not necessarily experts in C. So even if writing programs in C may be simpler than in writing programs in Rust for some developers, the opposite is more likely in this case, even before we compare the merits of the respective languages.


This is 100% the case. All of the honest-to-god Rust experts I know work on the compiler in some way. Same goes for Lean, which bootstraps from C as well.


Writing programs that compile is much easier in C. It lets me accidentally do all sorts of ill-advised things that the Rust compiler will correctly yell at me about.

I don't remember it being any easier to write C that passes through a static analyzer like Coverity etc. than it is to write Rust. Think of rustc like a built-in static analyzer that won't let you ignore it. Sometimes that means it's harder to sneak bad ideas past the compiler.


This is probably true if you assume it doesn't matter whether the program is correct.


Yes it is, why would anyone use it otherwise?


You can now have trustworthy Rust compiler binaries, through the work of the Bootstrappable Builds community, which found a way to build a C compiler without having C compiler binaries yet.

https://bootstrappable.org/ https://github.com/fosslinux/live-bootstrap/


Two simpler pieces of work as opposed to one complex one. Even if the two parts might be more volume, they're both easier to write and debug.


You often write two compilers when trying to bootstrap a C compiler, as GCC used to do. Often, it's a very simple version of the language implemented in the architecture's assembly.


Even if it is a bit more work:

- you can write the bulk of your code in a language you prefer over C

- you end up with a self-hosting rust compiler


Just for the lulz I'm writing a C compiler in Rust as a hobby, and it is humorously called "Small C Compiler", a callback to "Tiny C Compiler", because Rust is obviously more heavyweight than C.

It uses Cranelift as a back end, but the whole compiler architecture is pluggable and hackable, with lots of traits being thrown around. I do not intend to open source it until it reaches a somewhat functional stage, able to handle printf("%s", "Hello World!"), so until then it will never see the light of day.

I've not been able to make too much progress, but I've tried to implement the preprocessor and parser, and I have been involved with rust-peg and HimeCC because of the infamous typedef problem. I know that in industry we just use a symbol table to keep the typedef context, but that has the limitation of not being able to see types declared later. I wonder what the academic solution to that is, and I can only think of transactional memory.
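
For anyone following along, the symbol-table approach mentioned above (the classic "lexer hack") looks roughly like this sketch:

    use std::collections::HashSet;

    #[derive(Debug)]
    enum Tok {
        Ident(String),
        TypeName(String),
    }

    // the parser adds a name to `typedefs` when it reduces a typedef
    // declaration; the lexer consults the table to disambiguate `foo * bar;`
    fn classify(word: &str, typedefs: &HashSet<String>) -> Tok {
        if typedefs.contains(word) {
            Tok::TypeName(word.to_string())
        } else {
            Tok::Ident(word.to_string())
        }
    }

    fn main() {
        let mut typedefs = HashSet::new();
        typedefs.insert("size_t".to_string());
        println!("{:?}", classify("size_t", &typedefs)); // TypeName("size_t")
        println!("{:?}", classify("count", &typedefs)); // Ident("count")
    }

Its limitation is exactly the one described above: the table only knows about typedefs the parser has already seen.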

Anything that helps would eventually make me open source it!


FWIW, (i.e., for some historical fun) Dr. Dobbs Journal published a program called "Small C Compiler" by Ron Cain back in 1980. [1]

Later, it was expanded by James Hendrix into a full book with a more complete implementation. [2] (As a kid, coming across this book in the bargain bin at CompUSA was what led to me learning C. I still have my copy!)

[1] https://archive.org/details/dr_dobbs_journal_vol_05_201803/p...

[2] https://www.amazon.com/Small-Compiler-Language-Theory-Design...


That was my first introduction to C and I hacked a lot on that code. A very enjoyable time was had.

My only regret with Rust is that a “Small Rust Compiler” will be an order of magnitude larger.


Hope you name it not “SCC” but “SmaCC”.


This is super cool but what's interesting is that this same kind of bootstrapping problem exists for hardware as well. What makes computers? Previously built computers and software running on them. The whole thing is really interesting to think about.


The same bootstrapping problem exists for everything. What makes roads? Construction equipment. How do you get that construction equipment to the job site, without a road already being there?

I actually met a person a few months ago who worked for a startup doing delivery/fulfillment of materials for construction projects. They pointed out that this requires special expertise beyond, say, Amazon, not only because these materials tend to have unusual and/or dangerous physical properties, but also because the delivery addresses tend to be... well, they tend not to have addresses yet! This is all solvable (apparently), but only with expertise beyond the usual for delivery companies in the modern age.


fascinating! I suppose our normal modern systems aren't equipped to handle descriptive addresses - "take a right after the foo store and then go to the end of the road and give the equipment to the people at the end of the road so they can make more road"


I used to work at a company that built data centers. They were trying to get their software to a point where you could turn up an entire data center from a laptop. Why? So that you could work with European companies and prove to regulators that there were no backdoors. It was a fascinating problem but very difficult. My team was only tangentially involved, but we did some work to forward our data to a proxy that ensured all our data was auditable and that we weren't sending stuff we shouldn't. I left before it finished, but I heard it was scrapped as too difficult.


Anecdotally, I've used software that was capable of it, provided your hardware could be netbooted, preferably with PXE/iPXE. I used RackN, and there are other vendors like MAAS with purportedly the same abilities.

RackN is good enough it'll let you build virtualization on top of bare metal and then keep going up the stack: building VMs, kubernetes, whatever. You just set up rules for pools, turn on dhcp and let auto discovered equipment take on roles based on the rules you set. Easy to do although I wouldn't envy anyone building a competitor from scratch.


There's a few prerequisites that make this all very realistic if the time is put in

* An LTE remote access box connected to a few switches management ports so you can configure the switches yourself

* Ensuring that the vendors pre-cable the racks and provide port-mapping data

* Ensuring that the vendors set the machines to PXE boot

* Ensuring the vendors double-check the list of MAC addresses of the HW provided for both in-band and oob


This is something like what Oxide Computer is trying to do, but of course you have to use their hardware to benefit.

https://oxide.computer/blog/the-cloud-computer


Then you look at the assembly for the old Cray-1 computers (octal opcodes) and the IBM System/360 computers (word opcodes), and you realize, they made it so amazingly simple you can mostly just write the opcode bytes and assemble by hand if you like.

Then x86 came along, without the giant budgets or the big purchasers, and so they made that assembly as efficient and densely packed as is possible; unfortunately, you lose what you might otherwise conveniently have on other machines.


I've read somewhere that Seymour Cray used to write his entire operating system in absolute octal. ("Absolute" means no relocation; all memory accesses and jumps must be hand-targeted to the correct address, as they would have to be with no assembler involved.)


x86 is the same if you stick to the original 16-bit subset. However, it has been extended so many times that you can't find the nice parts.


This is one of the coolest things about these kinds of bootstrapping projects + reproducible builds IMO. One could imagine creating an incredibly simple computer directly out of discrete components. It would be big, inefficient and slow as molasses, but it could in theory conform to an instruction set architecture, and you could use it to build these bootstrap programs, and you could then assert that you get the same result on your fully-understood bad computer as you get on not-fully-trusted modern hardware.


Interesting to think about even at a human civilization level. What if humans somehow went back to the Stone Age, but in present day. Could we build back to what we have now?

Kind of a bootstrapping problem. For example, current oil reserves are harder to get than they were a century ago. Could we bootstrap our way into getting them?


I’m not sure the actual problem would be bootstrapping. I think the main problems (not sure in which order) would be: discovery (how do you know who has the necessary skill?), logistics (how do we get all the people in the same place to work together, and how do we extract and transport the necessary resources to that place?) and ultimately time (how do we do a minimal technological bootstrap before the people currently holding the knowledge die of old age?).


Would we want to build the same stuff again? Why bootstrap to oil if you can directly go for renewable alternatives?


But we used simpler energy sources like oil to bootstrap renewables. For instance, making solar panels takes lots of energy. Would it be possible to go straight to renewables?


Wind power existed for hundreds of years before we started drilling for oil. I doubt you can make useful solar cells, but you can make useful windmills, rechargeable batteries, light bulbs (incandescent), and motors.

However, just the above list needs a large industrial base to pull off. Can you make a wire? What about a ball bearing - they are made by the millions at insane levels of precision, and they are cheap. All those little details are why you can't pull it off. Sure, given all the parts you can pull off the next step, but there are so many steps that you can't do it.


I've thought a lot about these problems, and you eventually hit the need for stronger-than-natural magnets. Without electricity it's a hard challenge, but without magnets, creating electricity even at a simple bench scale is a lot harder.

I ended up thinking that you'd need to do a chemical battery to bootstrap electricity and then with electricity generate the electromagnet to create stronger magnets and then iterate from there.

Your next stumbling block from there would be optics, as everything else can be made with horrible tolerances. Even lathes and similar machinery can be made with pretty good tolerances without optics. But when you start needing timekeeping or miniaturizing components for improved efficiencies, it becomes a blocking issue.

You also need to discover photo-reactive chemicals to do lithography, but that's a lot easier since it's just silver nitrate, and you'd already have the components while working towards the initial bootstrap battery.


would you need to rediscover the table of elements and atomic theory in your version of things? There's a lot of a scientific learning we take for granted that is actually important when building a new civilization from scratch.


How far can you get? There is a lot I know how to do but won't have time to create before I die.


If you include coppicing for charcoal and building wood, along with modern knowledge, it should be possible to go straight to wind power and rush solar.


Lithography masks for early integrated circuits were drawn by hand iirc.


Not just ICs, microprocessors:

https://en.wikipedia.org/wiki/Rubylith


> Certain digital image editing programs that have masking features may use a red overlay to designate masked areas, mimicking the use of actual Rubylith film.

Oh so that's why masking mode in Photoshop is a red overlay?! TIL.


And who makes people?


Storks


Crass joke time!

Little Timmy came into his parents room one afternoon and said "mommy, daddy, where do babies come from?"

His parents were surprised - he's a little young for that - so they sat him down and explained gently: "when two people love each other very much, sometimes a stork flies in carrying a baby wrapped in blankets in its bill, and it leaves the baby on the new parents' doorstep!"

Little Timmy scrunches up his face, confused, then asks "well then who fucks the stork?"


Which came first, the computers or the code?


Ada Lovelace is often credited as the first computer programmer. She died in 1852. Programmable electronic computers didn't come along until the mid 1900s.

Though it obviously depends a bit on what you are willing to count as computer, or as code.


We all know why the Lovelace myth still persists http://projects.exeter.ac.uk/babbage/ada.html "It is often suggested that Ada was the world's first programmer. This is nonsense: Babbage was, if programmer is the right term. After Babbage came a mathematical assistant of his, Babbage's eldest son, Herschel, and possibly Babbage's two younger sons. Ada was probably the fourth, fifth or six person to write the programmes. Moreover all she did was rework some calculations Babbage had carried out years earlier. Ada's calculations were student exercises. Ada Lovelace figures in the history of the Calculating Engines as Babbage's interpretress"


Can't we let women have this one thing? Like... just this one thing? It's fine. Who knows, a lot of time has passed and I'm sure there's many people who "programmed" and never told anyone.

It's fine, let Ada have this. It's dead anyway and we clearly don't have nearly enough women in Computer Science so we can let this one go. We already have 99% of all stuff, we shouldn't get greedy.


Instead of patronizingly giving women a false hero, instead introduce them to a real one: Klara Von Neumann (yes, that Von Neumann) who was the first coder for what we might recognize as a digital computer in the modern sense. She had to pioneer a lot of stuff!


that's extremely patronizing to women


Says who? You, all women?

Ada Lovelace is a real person. What's patronizing about it? We didn't make her up out of pity for women.


(The code, of course; the code drove music boxes and looms centuries before computers. Same for chicken and egg: eggs are maybe a billion years older.)


so..the code drove computers


Correct. And the chicken was written in COBOL.


Probably off-topic, but the chicken and the egg "paradox" always seemed silly to me in the context of evolution. We know that there were birds long before chickens, so at some point, the first bird that we would consider to be in the species "chicken" had to hatch from an egg from a bird that was _not_ a chicken, so the egg came first. (This assumes that the question is specifically about chicken eggs; it's even simpler if you count non-chicken eggs from the ancestors of the first chicken, but the logic still works even if you don't).


There is no paradox, because there was never a non-chicken parent which was so different that we could consider the newborn chicken a new species. It takes thousands of generations to say such things, not one.


You have to draw the line _somewhere_ though, right? If not biologically, at least linguistically: we don't call other birds chickens, and we don't call other animals with shared common ancestors chickens, and I don't think you can argue that the common ancestor of chickens and, say, primates can be referred to as "human" without being prescriptivist to the point that you'd be dictating rules that essentially zero English speakers actually follow.

To be clear, I don't disagree with you that my argument makes little sense biologically; my point is that the question itself is phrased in a way that doesn't really parse correctly in a scientific sense. To me, it reads more like a semantics question (i.e. it depends on your definition of "chicken" and "egg") because the only way to get a scientifically precise answer is to expand the definition of "chicken" beyond recognition.


Yes, we need to draw a line and the question seems flawed. Sure, it makes you think, it is funny, but it contains an invalid assumption that the line's width is a single generation. If we assume that the question is about species, it takes thousands of generations for an offspring not to be able to reproduce with its ancestors.


And this "it's a chicken" versus "it's not a chicken" distinction is ours, Mother Nature doesn't care whether these are chickens or not, the chickens do not make such a distinction. Same with particle/ wave duality, Mother Nature doesn't care whether light is a particle or not, that's our model and if it doesn't work too good it's our fault.


The chicken is just an example of an egg-laying and egg-borne animal. Substitute it with the first one.


I think that changes the answer by GP's logic though, since then the first egg-layer obviously came before its egg.


Or to take it another direction - how do they gestate? At what point can we call it a chicken and when does the shell (assuming that's what would make us call it an egg) develop?


So that is how it crossed the road


No. It was running on a mainframe. It was JCL that let it cross the road.


It has to be the code, since code is just the information/ideas that you've written down on any kind of medium, such as a whiteboard or paper - better known as "algorithms".

Also keep in mind the older use of "computer" - in the past, real humans, hired in huge batches, computed log and sine lookup tables by hand. (The earliest case of human SIMD, by the way.) Some were even employed to break encryption by reversing code boxes. Hence they were called "computers", and I reckon many of them were women.


Code, unless you count the abacus etc


I heard the first assembler was written in machine code, and that was then used to create a compiler. Machine code you can just chuck straight into the CPU; it's a little less trivial than assembly because it's harder to remember, but if you know assembly you can learn it easily enough :>. I don't feel this is an unrealistic path, so I chose to believe it without any evidence :D


Kind of annoying that I had to follow 4 links just to find a high level justification of the benefits of bootstrapping [0]. I was kinda hoping the "Why" part of this title would address that.

[0] https://bootstrappable.org/benefits.html


It can be difficult to explain why bootstrapping is important. I put a "Why?" section in the README of my own bootstrapping compiler [0] for this reason.

Security is a big reason and it's one the bootstrappable team tend to focus on. In order to avoid the trusting trust problem and other attacks (like the recent xz backdoor), we need to be able to bootstrap everything from pure source code. They go as far as deleting all pre-generated files to ensure that they only rely on things that are hand-written and auditable. So bootstrapping Python for example is pretty complicated because the source contains code generated by Python scripts.

I'm much more interested in the cultural preservation aspect of it. We want to preserve contemporary media for future archaeologists, for example in the Arctic World Archive [1]. Unfortunately it's pointless if they have no way to decode it. So what do we do? We can preserve the specs, but we can't really expect them to implement x265 and everything else they would need from scratch. We can preserve binaries, but then they'd need to either get thousand-year-old hardware running or virtualize a thousand-year-old CPU. We can give them, say, a definition of a simple Lisp, and then give them code that runs on that, but then who's going to implement x265 in a basic Lisp? None of this is really practical.

That's why in my project I made a simple virtual machine, then bootstrapped C on top of it. It's trivially portable, not just to present-day architectures but to future and alien architectures as well. Any future archaeologist or alien civilization could implement the VM in a day, then run the C bootstrap on it, then compile ffmpeg or whatever and decode our media. There are no black boxes here: it's all debuggable, auditable, open, handwritten source code.
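
To give a flavor of why such a VM is easy to reimplement (this is a toy, not Onramp's actual instruction set), the core is just a fetch-decode-execute loop:

    // a two-opcode toy VM: 0x00 = HALT (return accumulator),
    // 0x01 n = add the next byte to the accumulator
    fn run(program: &[u8]) -> i64 {
        let (mut pc, mut acc) = (0usize, 0i64);
        loop {
            match program[pc] {
                0x00 => return acc,
                0x01 => {
                    acc += program[pc + 1] as i64;
                    pc += 2;
                }
                _ => panic!("bad opcode"),
            }
        }
    }

    fn main() {
        println!("{}", run(&[0x01, 5, 0x01, 3, 0x00])); // prints 8
    }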

[0]: https://github.com/ludocode/onramp?tab=readme-ov-file#why-bo...

[1]: https://en.wikipedia.org/wiki/Arctic_World_Archive


Yep, I think this would have been good context in the OP


Say you start with nothing but "pure source code".

With what tool do you process that source code?


The minimum tool that bootstrapping projects tend to start with is a hex monitor. That is, a simple-as-possible tool that converts hexadecimal bytes of input into raw bytes in memory, and then jumps to it.

You need some way of getting this hex tool in memory of course. On traditional computers this could be done on front panel switches, but of course modern computers don't have those anymore. You could also imagine it hand-woven into core rope memory for example, which could then be connected directly to the CPU at its boot address. There are many options here; getting the hex tool running is very platform-specific.

Once you have a hex tool, you can then use that to input the next stage, which is written in commented hexadecimal source code. The next tool then adds a few features, and so does the tool after that, and so on, eventually working your way up to assembly and C.
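
For illustration, here is roughly what that first hex stage does, sketched in Rust (the real thing is a few hundred bytes of hand-written machine code, and it jumps to the bytes instead of returning them):

    // convert commented hexadecimal source into raw bytes:
    // hex digits are accumulated in pairs; everything after ';' is a comment
    fn hex_to_bytes(src: &str) -> Vec<u8> {
        let mut out = Vec::new();
        let mut hi: Option<u8> = None;
        for line in src.lines() {
            let code = line.split(';').next().unwrap_or("");
            for c in code.chars().filter(|c| c.is_ascii_hexdigit()) {
                let d = c.to_digit(16).unwrap() as u8;
                hi = match hi {
                    None => Some(d),
                    Some(h) => {
                        out.push((h << 4) | d);
                        None
                    }
                };
            }
        }
        out
    }

    fn main() {
        let bytes = hex_to_bytes("B8 3C 00 00 00 ; mov eax, 60\n");
        println!("{:02X?}", bytes); // [B8, 3C, 00, 00, 00]
    }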


From the point of view of trust and security, bootstrapping has to be something that's easily repeatable by everyone, in a reasonable amount of time and steps, with the same results.

Not to mention using only the current versions of all the deliverables or at most one version back.


I'm a bit confused.

It's a bit difficult to dissect, but long story short, in the middle of the post the author finally provides the reason for them embarking on the journey mentioned in the title:

> The main issue (...) is that, by the time C++ is introduced into the bootstrap chain, the bootstrap is basically over. So if you wanted to use Rust at any point before C++ is introduced, you’re out of luck. So, for me, it would be really nice if there was a Rust compiler that could be bootstrapped from C. Specifically, a Rust compiler that can be bootstrapped from TinyCC, while assuming that there are no tools on the system yet that could be potentially useful.

However, this contradicts the premise they lay out earlier in the post:

> Every new version of rustc was compiled with the previous version of rustc. So rustc version 1.80.0 was compiled with rustc version 1.79.0. Which was, in turn, compiled with rustc version 1.78.0. And so on and so forth, all the way back to version 0.7 of the compiler. At that point, the compiler was written in OCaml. So all you needed was an OCaml compiler to get a fully functioning rustc program. (...) There is a project that can *successfully* compile the OCaml compiler using Guile, which is one of the many variants of Scheme, which is one of many variants of Lisp. Not to mention, Guile’s interpreter is written in C.

The contradiction, of course, is that there is then a path that avoids C++ like they want, it's just not the one that the rustc team uses day-to-day. The author even claims that it actually works (see the emphasis I placed).

So I'm ultimately not entirely sure about the motivation here. Is the goal to create a nicer C based bootstrapping process? Is the goal to do that and have that eventually become the day-to-day way rustc is bootstrapped? Why does the author want to get rid of the C++ stage? Why does the author prefer to have a C stage?

The only thing that's clear then is that the author just wants to do this period, and that's fine. But otherwise, even after reading through their fairly lengthy post, I'm none the wiser.


While it is technically possible to bootstrap Rust from Guile and the 0.7 Rust compiler, you would need to recompile the Rust compiler about a hundred times. Each step takes hours, and you can't skip any steps because, like he said, 1.80 requires 1.79, 1.79 requires 1.78, and so on all the way back to 0.7. Even if fully automated, this bootstrap would take months.

Moreover, I believe the earlier versions of rustc only output LLVM, so you need to bootstrap a C++ compiler to compile LLVM anyway. If you have a C++ compiler, you might as well compile mrustc. Currently, mrustc only supports rustc 1.54, so you'd still have to compile through some 35 versions of it.

None of this is practical. The goal of Dozer (this project) is to be able to bootstrap a small C compiler, compile Dozer, and use it to directly compile the latest rustc. This gives you Rust right away without having to bootstrap C++ or anything else in between.


This is accurate. I'm an OS/kernel developer and a colleague was given the task of porting rust to our OS. If I remember correctly, it did indeed take months. I don't think mrustc was an option at the time for reasons I don't recall, so he did indeed have to go all the way back to the very early versions and work his way through nearly all the intermediate versions. I had to do a similar thing porting java, although that wasn't quite as annoying as porting rust. I really do wish more language developers would provide a more practical way of bootstrapping their compilers like the article is describing/attempting. I've seen some that do a really good job. Others seem to assume only *nix and Windows exist, which has been pretty frustrating.


I'm curious as to why you need to bootstrap at all? Why not start with adding the OS/kernel as a target for cross-compilation and then cross-compile the compiler?


Nim uses a smaller bootstrap compiler that uses pre-generated C code to then build the compiler proper. It's pretty nifty for porting.


The article mentions that the Bootstrappable Builds folks don't allow pre-generated code in their processes, they always have to build or bootstrap it from the real source.


that's interesting! what kind of os did you write? it sounds like you didn't think supporting the linux system call interface was a good idea, or perhaps even feasible?


It's got a fairly linux like ABI, though we don't aim or care to be 1-1 compatible, and it has/requires our own custom interfaces. Porting most software that was written for linux is usually pretty easy. But we can't just run binaries compiled for linux on our stuff. So for languages that require a compiler written in its own language where they don't supply cross compilers or boot strapping compilers built with the lowest common denominator (usually c or c++), things can get a little trickier.


interesting! what applications are you writing it for?


Building rustc doesn't take hours on a modern machine. Building it 100 times would take on the order of a day, not months.

> Moreover, I believe the earlier versions of rustc only output LLVM, so you need to bootstrap a C++ compiler to compile LLVM anyway.

This is a more legit point.


The current version of rustc may compile itself quickly, but remember, this is after nearly ten years of compiler optimizations. Older versions were much slower.

I seem to recall complaints that old rustc would take many hours to compile itself. Even if it takes on average, say, two hours to compile itself, that's well over a week of pure build time to bootstrap all the way from 0.7 to the present (on the order of 100 builds at 2 hours each, so about 200 hours). You're right that months is probably an exaggeration, but I suspect it might take a fair bit longer than a week. The truth is probably somewhere in the middle, though I suppose there's no way to know without trying it.


> Moreover, I believe the earlier versions of rustc only output LLVM, so you need to bootstrap a C++ compiler to compile LLVM anyway. If you have a C++ compiler, you might as well compile mrustc. Currently, mrustc only supports rustc 1.54, so you'd still have to compile through some 35 versions of it.

Not sure I follow - isn't rustc still only a compiler frontend to LLVM, like clang is for C/C++? So if you have any version of rustc, haven't you at that point kind of "arrived" and started bootstrapping it on itself, meaning mission complete?

Ultimately from what I glean the answer really is just that this would be made nicer with Dozer, but I still wish this was explicitly stated by the author in the post. It's not like the drudgery of the ocaml route escapes me.


> Not sure I follow - isn't rustc still only a compiler frontend to LLVM, like clang is for C/C++?

The rustc source tree currently includes LLVM, GCC, and Cranelift backends: https://github.com/rust-lang/rust/blob/c6db1ca3c93ad69692a4c...

(Cranelift itself is written in Rust.)


> From there they can bootstrap yacc, basic coreutils, Bash, autotools, and eventually GCC ... it’s a fascinating process.

I would say about half of that list could be trimmed off if you managed to separate GCC 4 and binutils from their original build scripts; notice that a sheer number of the items there are just repeatedly rebuilding auto-stuff and its dependencies[1].

[1] https://github.com/fosslinux/live-bootstrap/blob/master/part...


I'm not sure I see the point. To generate functional new binaries on the target machine, rustc will need to support the target. If you add that support to rustc, you can just have it build itself.


It's about having a shorter auditable bootstrap process much more than it is about supporting new architectures.


Not dismissing the usefulness of the project at all, but curious what the concrete benefits of that are -- is it mainly to have a smaller, more auditable bootstrap process to make it easier to avoid "Reflections on Trusting Trust"-type attacks?

It seems like you'd need to trust a C compiler anyway, but I guess the idea is that there are a lot of small C compiler designs that are fairly easy to port?


Let me make a small example that may illustrate the issue.

You can download the NetBSD source tree and compile it with any reasonable c compiler, whether you're running some sort of BSD, macOS or Linux. Some OSes have much older gcc (Red Hat, for instance), some have modern gcc, some have llvm. The source tree first compiles a compiler, which then compiles NetBSD. It's an automatic, easy to understand, easy to audit, two step process that's really nice and clean.

With rust, if you want to compile current rust, you need a pretty modern, up to date rust. You can usually use the last few versions, but you certainly can't use a version of rust that's even a year old. This, to some of us, is ridiculous - the language shouldn't change so much so quickly that something that was brand new a year ago literally can't be used today to compile something current.

If you really want to bootstrap rust from c, you'd have to start with rust from many years ago, compile it, then use it to compile newer rust, then use that to compile even newer rust, perhaps a half a dozen times until you get to today's rust. Again, this is really silly.

There are many of us who'd like to see rust be more directly usable and less dependent on a chain of compilers six levels deep.


> perhaps a half a dozen times until you get to today's rust.

Perhaps? It was already more than that in 2018: https://guix.gnu.org/blog/2018/bootstrapping-rust/

That was back in 2018. Today mrustc can bootstrap rustc 1.54.0, but current rustc version is 1.80.1. So if the amount of steps still scales similarly, then today we're probably looking at ~26 rustc compilations to get to current version.

And please read that while keeping in mind how Rust compilation times are.


> It seems like you'd need to trust a C compiler anyway, but I guess the idea is that there are a lot of small C compiler designs that are fairly easy to port?

Sorry, but TFA explains very well how to go from nothing to TinyCC. The author's effort now is to go from TinyCC to Rust.


Right, but I was trying to understand the author's motivation, and this was me handwaving about if it could be about compiler trust. The article discusses bootstrapping but not explicitly why the author cares—is it just a fun exercise (they do mention fascination)? Are they using an obscure architecture where there is no OCaml compiler and so they need the short bootstrap chain? _Is_ it about compiler trust?

(Again since it can come off wrong in text, this was just pure curiosity about the project, not dismissiveness.)


Regardless, the process is so long that it seems unauditable in practice.

Like, I guess I can see the appeal of starting from nothing as a kind of cool achievement, but I don't think it helps with auditing code.


But with the Rust compiler in C the audit path would be much shorter it sounds like, and therefore be more auditable.

Plus OP also wrote in the post that a goal was to be able to bootstrap to Rust without first having to bootstrap to C++, so that other things can be written in Rust earlier on in the process. That could mean more of the foundation of everything being bootstrapped being written in Rust, instead of in C or C++.


What good is being slightly shorter if it is still nowhere remotely close to practical?

It's kind of like saying 100 years is a lot shorter than 200 years. It might be true, but if all the time you have to dedicate is a few hours, it really doesn't matter.


It will be hex editor -> assembler -> tinycc -> dozer -> latest rust, so it should absolutely be doable - or am I missing something?


It doesn't need to be _perfectly_ auditable to be worthwhile — it just needs to be more auditable than the alternatives available today.


I dunno about that; suppose dozer completes its goal, and 1 year later you want to audit the bootstrap chain. Latest Rust probably won't be able to be compiled by it, so you now need to audit what, 6 months of changes to the Rust language? How many months is short enough to handle?

If dozer _does_ keep getting maintained, the situation isn't exactly better either: you instead have to audit the work dozer did to support those 6 months of Rust changes.


> It's about having a shorter auditable bootstrap process

Yeah, in 2018 the chain looked like this[1]:

    g++ -> rustc@1.19 -> rustc@1.20 -> rustc@1.21 -> rustc@1.22 -> rustc@1.23 -> rustc@1.24 -> rustc@1.25 -> rustc@1.26 -> rustc@1.27 -> rustc@1.28 -> rustc@1.29
Though for me it's less the auditable part, and more that I would be able to build the compiler myself if I wanted, without jumping through so many unnecessary hoops. For the same reason I like having the source code of programs I use, even if most of the time I just use my package manager's signed executable.

And if someone open sources their program, but then the build process is a deliberately convoluted process, then to me that starts to smell like malicious compliance ("it's technically open source"). It's still a gift since I'd get the code either way, so I appreciate that, but my opinion would obviously be different between someone who gives freedoms to users in a seemingly-reluctant way vs someone who gives freedoms to users in an encouraging way.

[1]: https://guix.gnu.org/blog/2018/bootstrapping-rust/


Sometimes I fantasize about writing a C++ interpreter or compiler in scheme: going directly from scheme to current gcc would be a huge shortcut. But common wisdom is that writing a C++ compiler is basically impossible. Still, it’d be instructive!


Looking at the whole stack (starting from a sub-assembler), could this be a way to bypass the issues around "trusting trust"?

https://www.cs.cmu.edu/~rdriley/487/papers/Thompson_1984_Ref...


Only if you audit everything and run the whole process. Even then, there is https://en.m.wikipedia.org/wiki/Underhanded_C_Contest - some entries would have gotten past me in an audit.


I thought that was the whole point?


When I learned C a bit, I was looking up how people did C++-like stuff in C. I found objects, exceptions, concurrency, etc.

If mrustc is written in C++, could it be easier to use such C++-like primitives to port its working C++ to C? And do it a bit at a time using strong interoperability between C++ and C?

Before anyone says it, I know that would be a hard porting effort with many pitfalls. Remember, we're comparing it to writing a Rust compiler in C, though. It might be easier to port the C++ to C.

This also reminds me of the C++ to C compilers that used to exist. I don’t know if they’re still around. I think both Rust to C/C++ and C++ to human-readable C compilers would be useful today. Especially to combine the safety benefits of one with the tooling of the other.


It’s a huge project, I wonder if it wouldn’t be simpler to try to compile cranelift or mrustc to wasm (that’s still quite difficult) then use wasm2c to get a bootstrap compiler.


That's the approach Zig is taking: https://ziglang.org/news/goodbye-cpp/


The resulting C would not be “source code”.

Edit to explain further: the point is for the code to be written (or at least auditable) by humans.


As long both rust-to-wasm (or zig-to-wasm) and wasm2c are auditable, and every step reproducible, why do you need the generated C to be auditable?


The point is to shorten the minimal bootstrap path to Rust.

With your suggestion you can't use rust until after you already have rust-to-wasm transpiler available (which almost certainly itself already requires rust, so you are back where you started).


The article mentions that the Bootstrappable Builds folks don't allow pre-generated code in their processes, they always have to build or bootstrap it from the real source.


The generated C code could contain a backdoor. Generated C is not really auditable so there would be no way to tell that the code is compromised.


Why isn't anyone referring to bootstrapping a rust compiler as "rusting rust"? :D



Do we have a better method of verifying compilation output than just re-executing the compiler with the same source and comparing the outputs? TEE attestation could be a thing (albeit with a "trusted" third party which could occasionally be broken).


Diverse double-compiling (DDC) can help: compile the compiler's source with a second, independent compiler, use each resulting binary to compile the source again, and check that the two final outputs are bit-for-bit identical.


Dude, you are amazing. If the rust people are serious about anything, they should support you as much as they can.

You got it all right. Really all of it. QBE, C without extensions (you should lock the code to C99-ish though, or you will have kiddies introducing ISO C feature creep, then planned obsolescence, into your C code in the long run).

C without extensions is where the linux kernel failed hard (and tons of GNU projects... like the glibc): it should have plain assembly replacements (and I mean not using any C compiler inline assembler), and those should compile out-of-the-box with a mere SDK configuration option.

This will be a nearly perfect binary seed for the rust programing language, but you are using QBE, then you get some optimizations... guess what... I did my benchmarks (very basic) with CPROC/QBE and I get ~70% of the speed of latest gcc (tinyCC is 2 times slower than gcc, but its generated assembly code is "neat"/"clean").

All that to say, maybe this project will much more than a binary seed if it becomes a "real life" rust compiler.

The main issue though is the rust syntax itself: heard it is not that stable/finished, and it is on the way to that abomination of c++ syntax complexity. When I tried to read some of latest real life rust syntax, I did not understand anything, and I code mainly C (c++ when I was young brain-washed fool), assembly (rv64/x86_64), this is bad omens.

Oh, and don't forget to provide statically linked binaries for various platforms, you know: the binary seed.


This is why the aforementioned ABI (of the latter language in the title of this post) won't die for a long time. The name of the game is compatibility, not performance/security. Bell Labs was first.


The C ABI won't die because it has a stranglehold on *NIX. Every new language you make has to conform in some way to C in order to use syscalls.


That is not true on Linux, where you can just make syscalls yourself. They don't even use the C ABI, even though it's pretty similar (for syscalls, the fourth argument is passed in R10 instead of RCX, since RCX holds the return address for sysret).
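For instance, a minimal hypothetical sketch (x86_64 Linux, GCC-style inline asm; the function name is made up for illustration) of invoking write(2) with no libc in sight:

  /* raw write(2); note the clobbers: the syscall instruction itself
     trashes rcx (return RIP) and r11 (saved RFLAGS) */
  long raw_write(int fd, const void *buf, unsigned long len) {
      long ret;
      __asm__ volatile ("syscall"
                        : "=a"(ret)          /* rax: return value */
                        : "a"(1),            /* rax: __NR_write   */
                          "D"((long)fd),     /* rdi: arg 1        */
                          "S"(buf),          /* rsi: arg 2        */
                          "d"(len)           /* rdx: arg 3        */
                        : "rcx", "r11", "memory");
      return ret;
  }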


You can make a syscall with assembly, but the result it gives you follows C's formats. Maybe your language does integers in a different way, so you still have to abide by C's standard to adapt what the OS gives you


I'd argue that it follows the CPU's native integer representation, which C also does. Yes, if your language uses tagged integers or something, you'll have to marshal the syscall arguments/results from/to your own representation, but the same is true if you want to use those integers for arithmetic (beyond trivial additions or multiplying/dividing by a power of two, for which you can use lea).


Mine was not a critique. Of course every OS needs to be programmed with a language and its syscalls will be formatted accordingly. And if you want to program using an OS's features, other than the compilation to assembly, you also have to worry about inter-operating with what the OS provides. I'm simply noting that for the foreseeable future, C's way of doing things will always have to be kept in mind when writing dev tools


Sure, that makes sense. Out of curiosity, do you know of any way to design a syscall ABI that's not C-like that was either used in the past or would have some advantages if a new OS adopted it? I imagine that lisp machines did things differently, but a) I don't know whether they had syscalls as such or simply offered kernel services as regular functions and b) they did have microcode support for tagged integers and objects.

I'm asking since I want to get into (hobbyist) OS development at some point and would love to know if there's a better way to do syscalls.


Yes, C and Unix are closely related.


Yes, I think rust made a big mistake in not going for a stable (or at least mostly stable, like C++) ABI (other than the C one). The "statically link everything" approach is fine for desktops and servers, but not for e.g. embedded Linux applications with limited storage. That's too bad, because things like routers are some of the most security-sensitive devices.


> or at least mostly stable like C++

The C++ ABI doesn't solve generics (templates); C++ templates live in header files, as source code, and get monomorphized into code that's embedded in the binary that includes that header. The resulting monomorphized code is effectively statically linked, and any upgraded version of a shared library has to deal with old versions of template code from header files, or risk weird breakage.

Swift has a more realistic solution to this problem: polymorphic interfaces (the equivalent of Rust "dyn"). That's something we're taking inspiration from in the design of future Rust stable ABIs.
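A rough C analogy of the two shapes (hypothetical, purely illustrative):

  /* "template"/generic style: a copy is stamped out per type and baked
     into whichever binary uses it -- nothing shareable at the ABI level */
  #define DEFINE_MAX(T) static T max_##T(T a, T b) { return a > b ? a : b; }
  DEFINE_MAX(int)    /* monomorphized copy for int    */
  DEFINE_MAX(double) /* monomorphized copy for double */

  /* "dyn" style: one compiled body, dispatched through a function
     pointer -- the shape that can live behind a stable .so boundary */
  const void *max_dyn(const void *a, const void *b,
                      int (*cmp)(const void *, const void *)) {
      return cmp(a, b) > 0 ? a : b;
  }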

> but not for e.g. embedded Linux applications with limited storage

Storage is often not the limiting factor for anything running embedded Linux (as opposed to something much smaller).

The primary point in favor of shared libraries is to aid in the logistics of upgrading dependencies for security. It's possible to solve that in other ways, though.


Very cool project.

I'm not totally sold on the practical justification (though I appreciate that might not be the real driving motive here). This targets Cranelift, so it gives you a Rust compiler targeting the platforms that Rust already supports. You could use it to cross compile Rust code from a non-supported platform to a supported one, but then you'd be using a 'toy' implementation for generating your production builds (rather than just to bootstrap a compiler).


<mischief> Maybe the bootstrap process should use FORTH as part of the toolchain? </mischief>

Not mischief: I'd probably look at that option if I was taking this on.


From one of the guys heavily involved in all this bootstrapping stuff:

https://lobste.rs/s/fybdug/pulling_linux_up_by_its_bootstrap...

> The answer to the question about FORTH is:

> well we bootstrapped multiple FORTHs; no one actually was willing to actually do the bootstrapping steps in FORTH besides Virgil Dupras who did collapseOS and duskOS. (Which unfortunately neither currently have a path to GCC or Linux)


The ultimate answer given later in the above-linked comment is that bootstrapping with FORTH is a great idea but programming in FORTH isn't fun enough to follow up on the notion.


Bootstrapping with forth is a GREAT idea. I think it's one of the best languages to use for bootstrapping.

The reason is simple: forth can be almost the first thing in the chain, and it's so flexible that most of the rest of the bootstrap can be done by simply building up forth definitions.

The way the bootstrap chain generally builds up the level of abstraction is by compiling a somewhat more general language, multiple times, until you reach something usable. If you bootstrap forth you're basically there. You have clean, readable source code that can be run with a ridiculously simple interpreter/compiler. It's a very natural choice.
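To make "ridiculously simple" concrete, here is a hypothetical sketch of a Forth-style inner loop in C (a real Forth adds a dictionary and a compile mode, but the core is about this small):

  #include <stdio.h>
  #include <stdlib.h>
  #include <string.h>

  static long stack[64];
  static int sp;
  static void push(long v) { stack[sp++] = v; }
  static long pop(void)    { return stack[--sp]; }

  /* a word is either a primitive or a number to push */
  static void eval(const char *word) {
      if      (strcmp(word, "+") == 0) push(pop() + pop());
      else if (strcmp(word, "*") == 0) push(pop() * pop());
      else if (strcmp(word, ".") == 0) printf("%ld\n", pop());
      else                             push(strtol(word, NULL, 10));
  }

  int main(void) {
      const char *prog[] = { "2", "3", "4", "*", "+", "." };
      for (size_t i = 0; i < sizeof prog / sizeof *prog; i++)
          eval(prog[i]);
      return 0; /* prints 14 */
  }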

But of course forth is such a different paradigm that most people just don't want to learn how to write in it properly (in such a way that you end up with actually readable code). Which is fine. I guess it really isn't fun enough for most. But it's difficult to ignore just how great of a fit it is.


IIUC, this is frequently suggested but never followed through on by someone who knows enough Forth to do it.


Since the requirements have made a C dependency acceptable, why not make the job easier by just writing a Rust->C source translator?


love the use of QBE for the backend here. will be interesting to follow and see comparisons against rust with llvm! good luck!


It always comes back to C.


> It’s basically code alchemy.

More like archaeology. Alchemy was essentially magic, but there's nothing magic about bootstrapping from hex-punched assembly.


if this works, would this make the rust compiler considerably smaller / faster?


Smaller? Yes. Faster? Almost certainly not.

It really doesn't make sense to optimize anything in a bootstrapping compiler. Usually the only code that will ever be compiled by this compiler will be rustc itself. And rustc doesn't need to run fast - just fast enough to recompile itself. So, the output also probably won't have any optimisations applied either.


if it is smaller, doesn't that mean it has less code to execute, and hence should be faster? Trying to understand better -- this is something completely new for me


Not necessarily. In fact, one of the most important compiler optimizations is inlining (copy-pasting function bodies into call sites), which results in more code being generated (more space) but faster wall-clock times (more speed). Most optimizations trade off size for speed in some way, and compilers have flags to control this (e.g. -Os vs -O3 tells most C compilers to optimize for size instead of speed).

The case where optimizing for size also optimizes for speed is when it's faster (in terms of wall-clock time) for a program to compute data than to read it from memory, disk, or other I/O, because I/O bandwidth is generally much lower than execution bandwidth. The processor then does more work, but it takes less time because it's not waiting for data to come in through the cache or memory.
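As a hypothetical sketch of the inlining trade-off (exact behavior varies by compiler and flags):

  static inline int square(int x) { return x * x; }

  int sum_of_squares(const int *a, int n) {
      int s = 0;
      for (int i = 0; i < n; i++)
          s += square(a[i]); /* under -O3 the call is typically replaced
                                by the body: more code, no call overhead;
                                -Os may keep the call to stay small */
      return s;
  }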


great explanation. thank you!


Why would a program run faster just because it’s smaller?


Example: this is a small program

  int main() { for(;;); }  /* tiny, yet it never terminates */


Oh, I suppose I’m imagining two implementations that both do the same work. (Like two rust compilers).

Eg, quicksort vs bubble sort. Quicksort is usually more code but faster.

Or a linked list vs a btree. The linked list is less code, but the btree will be faster.

Or substring search, with or without simd. The simd version will be longer and more complex, but run faster.

Even in a for loop - if the compiler unrolls the loop, it takes up more code but often runs faster.
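Concretely, a hypothetical hand-written version of what the unrolling transformation does:

  /* same work, before and after 4x unrolling: the unrolled version is
     more code but executes fewer loop branches per element */
  long sum(const int *a, int n) {
      long s = 0;
      for (int i = 0; i < n; i++)
          s += a[i];
      return s;
  }

  long sum_unrolled(const int *a, int n) {
      long s = 0;
      int i = 0;
      for (; i + 4 <= n; i += 4)
          s += a[i] + a[i + 1] + a[i + 2] + a[i + 3];
      for (; i < n; i++) /* leftover tail */
          s += a[i];
      return s;
  }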

If you have two programs that both do the same work, and one is short and simple, I don’t think that tells you much about which will run faster.


No, this won't change rustc at all. The purpose of this project is to be able to bootstrap a current version of rustc without having to do hundreds of intermediate compilations to go from TinyCC -> Guile -> OCaml -> Rust 0.7 -> ...... Rust current. (Or bootstrap a C++ compiler to be able to build mrustc, which can compile Rust 1.56, which will give you Rust current after another 25 or so compilations.)

Ultimately the final rustc you get will be more or less identical to the one built and distributed through rustup.


> will be more or less identical

What could cause differences between the bootstrapped rustc and rustup’s?


In theory there shouldn’t be any. The official Rust builds, I believe, have one level of bootstrapping: the previous official build is used to build an intermediate build of the current version, which is then used to build itself. So the distributed binaries are the current version built with the current version. A longer from-source bootstrapping process should also end by building the current version with the current version, and that should lead to a bit-for-bit identical result.

In practice, you’ll have to make sure the build configuration, the toolchain components not part of rustc itself (e.g. linker), and the target sysroot (e.g. glibc .so file) are close or identical to what the official builds are using. Also, while rustc is supposed to be reproducible, and thus free of the other usual issues with reproducible builds (like paths or timestamps being embedded into the build), there might be bugs. And I’m not sure if reproducibility requires any special options which the official builders might not be passing. Hopefully not.

See also: https://github.com/rust-lang/rust/issues/75362


Why not write the compiler in Rust, compile it to assembly, and then use some disassembler/decompiler to turn that back into portable C?


The article mentions that the Bootstrappable Builds folks don't allow pre-generated code in their processes, they always have to build or bootstrap it from the real source.


Because that wouldn't be reasonable to audit. The program that compiles the new Rust compiler, as well as the programs that disassemble and decompile, could insert backdoors or other nefarious behavior into the generated C, in a way that could be difficult to detect.

The "ethos" (for lack of a better word) of these bootstrapping projects requires that everything be written by hand, and in a way that can be auditable.


Wait, disassemblers will turn assembly into any language you want?



Well, they try. They tend to get lost on x86, where instructions are not fixed-length.


TL;DR his goal is rust, but for bootstrapping a first rust compiler for a new environment, the work is already done for C

the article is interesting, and links to some interesting things, but that's what the article is about

his project is https://codeberg.org/notgull/dozer

he references bootstrappable builds https://bootstrappable.org/, a systematized approach that starts from the ground up with a very simple 512-byte "machine coder" (more basic than an assembler) and builds up rudimentary tools from there: a "small C subset compiler" which compiles a better C compiler, etc., turtles all the way up.


Nice hobby project, but in my opinion it's futile, since rustc moves quite fast and uses the newest rust features.

Having a rust backend emitting C would be an easier way, but at that point, just cross compile.


  metadat@zukrfukr:/src/dozer$ \
  wc -l 
  $(find . -name '*.c')
     280 ./src/item.c
     851 ./src/lex.c
     166 ./src/parser.c
     107 ./src/libdozer.c
     103 ./src/resolve.c
     167 ./src/path.c
     219 ./src/traverse.c
      91 ./src/scope.c
     144 ./src/qbe.c
     134 ./src/map.c
    1045 ./src/expr.c
     266 ./src/nhad.c
     349 ./src/emit.c
      92 ./src/main.c
     231 ./src/type.c
     223 ./src/pattern.c
      97 ./src/typemap.c
     224 ./src/token.c
     148 ./src/util.c
     141 ./src/stmt.c
    5078 total
5kloc is pretty light for a `rustc'; where are the tests showing what aspects of the grammar are supported so far in @notgull's crowning achievement? The article might be longer than the source code, which would be extremely impressive if the thing actually worked :)

I was not able to compile tokio with dozer.

For comparison, turn towards the other major language HN submission today: a Golang interpreter written in PHP; it comes with extensive tests showing what works and what does not. Somehow even the goroutines are working... in PHP.

Golang interpreter written in PHP - https://github.com/tuqqu/go-php - https://news.ycombinator.com/item?id=41339818

Godspeed.


From the article:

> But so far, I have the lexer done, as well as a sizable part of the parser. Macro/module expansion is something I’m putting off as long as possible, typechecking only supports i32, and codegen is a little bit rough. But it’s a start.

So it is currently nowhere near complete (and the author never claims otherwise).


The project isn’t complete. It can only build trivial examples, definitely not something like tokio.


The goroutines are working in the sense that they execute, but they are not concurrent.


It's only a `#include <pthread.h>' away. *grin*


For bootstrapping it still feels weird to target C. You could easily target a higher level language or just invent a better language. You don't care about runtime performance. Feels like you don't really gain that much by forcing yourself to jump through the C hoop, and the cost of having to write an entire compiler in C is huge.

Like, how hard would it be to go via Java instead? I bet you can bootstrap to Java very easily.


I'd expect it to be harder. I used to work on a large embedded device that ran some Java code, and there was a specialist vendor providing Java for the offbeat processor platform.

After a little digging, I found a blog post about it, and it does sound denser than the poster's plans to bootstrap Rust: https://www.chainguard.dev/unchained/fully-bootstrapping-jav...


>feels weird to target C

he's not targeting C, he's targeting rust; he's using C

it's an important distinction, because he's not writing the C compilers involved; he's leveraging them to compile his target rust compiler, which will then be used to compile a rust-on-rust compiler. The C compiler is the compiler he already has available; with any other solution he would have to write that compiler first, but his target is rust.


Targeting C as the language to write his Rust compiler in. You knew that.


i explained the distinction, and why it matters, in great detail, and OP explained it also; sorry you still don't know that.

if he "targeted rust" the way you and GP are using it, he would have to duplicate the entire bootstrappable project as his target (my usage)


Every platform, for better or worse, gets a C compiler first. Targeting C is the most practical option.


Right, but once you have C it's fairly straightforward to use an interpreted language implemented in C (python, perl, guile, lua, whatever).

Obviously such a compiler would likely be unusably slow, but that's not important here.


You overestimate the comprehensiveness of the C standard, with half the things in it being optional. It's not a given that Python will compile on a minimal conforming C compiler.


True, but Lua probably will :)


Lua definitely will compile on an ANSI C compiler, without POSIX or Win32 extensions.
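And once built, driving it from C is only a few calls; a hypothetical minimal host (assuming the Lua headers and library are installed):

  #include <lua.h>
  #include <lauxlib.h>
  #include <lualib.h>

  int main(void) {
      lua_State *L = luaL_newstate(); /* fresh interpreter state   */
      luaL_openlibs(L);               /* load the standard library */
      luaL_dostring(L, "print('hello from the bootstrap')");
      lua_close(L);
      return 0;
  }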


Rewrite it in C. Great idea. Just do not tell the rust community...

I like to see that programmers like you still exist and believe in what they do.

Remembered this article... https://drewdevault.com/2019/03/25/Rust-is-not-a-good-C-repl...


> Remembered this article... https://drewdevault.com/2019/03/25/Rust-is-not-a-good-C-repl...

Remembering that Drew DeVault is the Fox News of programming bloggers. He exhibits the same sort of bad-faith obtuseness, and knee-jerk neckbeard tech conservatism, that makes me/many want to scream.

First, his thesis is risible: "Rust is not a good C replacement". Note that Drew does not mean replacing C code with Rust code, but Rust, the language, literally replacing C, the language. Ignoring, perhaps, that Rust doesn't want to "replace" C, because we already have C!

Next, see the bulleted text. On each topic something interesting might be said re: Rust, but instead they all serve a garbage thesis: that Rust can never be the 50-year-old language the tech world is currently built upon. Well, duh.

My least favorite, though, is the final bullet:

> Safety. Yes, Rust is more safe. I don’t really care. In light of all of these problems, I’ll take my segfaults and buffer overflows.

And everyone wants to be a cowboy and watch things blow up when they are 8 years old.


What does that article have to do with this article? The author of the latter article even says that they don’t enjoy writing C, which is kind of the opposite of what your article says


The community has been very supportive of the gccrs (https://github.com/Rust-GCC/gccrs) project, which is the main project writing a Rust compiler in C.


It's in C++, not C


I wouldn't say very supportive at all. It often gets bashed whenever some news about it is posted on r/rust for example.


Given that we're this far along, bootstrapping is purely an aesthetic exercise (and a cool one, to be sure -- I love aesthetic exercises). If it were an actual practical concern, presumably it would be much easier to use the current rustc toolchain to compile rustc to RISC-V and write a RISC-V emulator in C suitable for TinyC. Unless it's a trust exercise and you don't trust any rustc version.
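The emulator half of that really is small, too. A hypothetical sketch of the fetch/decode/execute core (only addi decoded, everything else elided):

  #include <stdint.h>
  #include <string.h>

  static uint32_t regs[32], pc;
  static uint8_t mem[1 << 20];

  static void step(void) {
      uint32_t insn;
      memcpy(&insn, &mem[pc], 4);             /* fetch (little-endian host) */
      uint32_t opcode = insn & 0x7f;
      uint32_t funct3 = (insn >> 12) & 7;
      if (opcode == 0x13 && funct3 == 0) {    /* addi rd, rs1, imm */
          uint32_t rd  = (insn >> 7)  & 0x1f;
          uint32_t rs1 = (insn >> 15) & 0x1f;
          int32_t  imm = (int32_t)insn >> 20; /* sign-extended I-immediate */
          if (rd) regs[rd] = regs[rs1] + imm; /* x0 stays hardwired to 0 */
      }
      pc += 4;
  }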


The practical concern for my colleagues and me is that we're OS/kernel developers for an operating system that isn't currently supported. I had to fight these kinds of problems to get Java ported to our OS, and a coworker had to do it for rust, which was much, much harder. And he did end up having to start from one of the earliest versions and compile nearly every blasted version between then and now to get the latest. It's a royal pain and a major time sink. If there had been a viable rustc written in C or even C++ at the time, we could have been done in a few days. Instead it took months.


As in your other comment there seems to be some confusion between bootstrapping and porting here? If you want to port Rust to a new OS then you ‘just’ need to add a new build target to the compiler and then cross-compile the compiler for your OS (from an OS with existing Rust support). That may indeed be a lot of work, but it doesn’t require bootstrapping the compiler, so this project wouldn’t be of any help in that scenario.



