Bootstrapping Specific Languages

Go
The Go compiler is implemented in Go. It was originally written in C, but it was semi-automatically translated into Go (see the GopherCon 2014 talk "Go from C to Go" by Russ Cox). GCC also has its own implementation of Go: gccgo can compile the Go implementation of Go, so Go can be bootstrapped from GCC.

This is not done yet in Guix.


 * According to https://golang.org/doc/install/gccgo, gccgo-4.8.2 includes a
 * complete go-1.1.2 implementation, gccgo-4.9 includes a complete go-1.2
 * implementation, and gccgo-5 a complete implementation of go-1.4. Ultimately
 * we hope to build go-1.5+ with a bootstrap process using gccgo-5. As of
 * go-1.5, go cannot be bootstrapped without go-1.4, so we need to use go-1.4 or
 * gccgo-5. Mips is not officially supported, but it should work if it is
 * bootstrapped.
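
The version constraints in the comment above can be written down as a small lookup. This is only a sketch: the names `GCCGO_IMPLEMENTS` and `can_bootstrap_go15` are made up for illustration and do not come from Guix, and the version data is limited to what the comment states.

```python
# Go version implemented by each gccgo release, as stated in the Guix
# comment quoted above. All names here are illustrative, not from Guix.
GCCGO_IMPLEMENTS = {
    "4.8.2": (1, 1, 2),
    "4.9": (1, 2),
    "5": (1, 4),
}

# Per the comment, go-1.5+ cannot be bootstrapped without go-1.4.
REQUIRED_FOR_GO15 = (1, 4)


def can_bootstrap_go15(gccgo_version: str) -> bool:
    """Return True if this gccgo release implements at least go-1.4."""
    implemented = GCCGO_IMPLEMENTS.get(gccgo_version)
    if implemented is None:
        return False  # release not covered by the comment: make no claim
    return implemented >= REQUIRED_FOR_GO15
```

Under these assumptions only gccgo-5 qualifies, which matches the comment's conclusion that go-1.4 or gccgo-5 is needed.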

http://git.savannah.gnu.org/cgit/guix.git/tree/gnu/packages/golang.scm?id=0f377aad75314df6c6296f4172801519315d81cd#n50

Rust
The Rust compiler is implemented in Rust, and each version is used to compile the next one (**the train model**). The Rust source tree describes the process as follows:

This file describes the stage0 compiler that's used to then bootstrap the Rust compiler itself. For the rustbuild build system, this also describes the relevant Cargo revision that we're using.

Currently Rust always bootstraps from the previous stable release, and in our train model this means that the master branch bootstraps from beta, beta bootstraps from current stable, and stable bootstraps from the previous stable release.

If you're looking at this file on the master branch, you'll likely see that rustc and cargo are configured to `beta`, whereas if you're looking at a source tarball for a stable release you'll likely see `1.x.0` for rustc and `0.x.0` for Cargo where they were released on `date`.
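
The train model described above amounts to a simple mapping from release channel to stage0 compiler. A minimal sketch, assuming the current stable release is `1.N.0`; the function name and its parameters are hypothetical and are not part of rustbuild.

```python
def bootstrap_compiler(channel: str, stable_minor: int) -> str:
    """Return the stage0 compiler for a release channel.

    `stable_minor` is N in the current stable release 1.N.0
    (an illustrative parameter, not something rustbuild exposes).
    """
    if channel == "master":
        return "beta"                      # master bootstraps from beta
    if channel == "beta":
        return f"1.{stable_minor}.0"       # beta bootstraps from current stable
    if channel == "stable":
        return f"1.{stable_minor - 1}.0"   # stable bootstraps from previous stable
    raise ValueError(f"unknown channel: {channel}")
```

Following the chain backwards from any release therefore ends at an earlier binary release, which is why a compiler not written in Rust is needed to break the cycle.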

Fortunately there is a rust compiler implemented in C++ that is able to bootstrap rust.

This project is an attempt at creating a simple rust compiler in C++, with the ultimate goal of being a separate re-implementation.

mrustc works by compiling assumed-valid rust code (i.e. without borrow checking) into a high-level assembly (currently using C, but LLVM/cretonne or even direct machine code could work) and getting an external code generator to turn that into optimised machine code. This works because the borrow checker doesn't have any impact on the generated code, just in checking that the code would be valid.


 * https://github.com/thepowersgang/mrustc

mrustc is packaged inside guix but building rustc is still done off an initial binary seed and from then on using the train model.

http://git.savannah.gnu.org/cgit/guix.git/tree/gnu/packages/rust.scm?id=c1cdadc6bafe6b902ffe580febc3365358b9014b#n72

UPDATE: rustc has been bootstrapped off of mrustc in guix!

https://www.gnu.org/software/guix/blog/2018/bootstrapping-rust/

Java
In guix, Java is bootstrapped using Jikes: https://git.savannah.gnu.org/cgit/guix.git/tree/gnu/packages/java.scm#n82


 * The Java bootstrap begins with Jikes, a Java compiler written in C++. We
 * use it to build a simple version of GNU Classpath, the Java standard
 * library. We chose version 0.93 because it is the last version that can be
 * built with Jikes. With Jikes and this version of GNU Classpath we can
 * build JamVM, a Java Virtual Machine. We build version 1.5.1 because it is
 * the last version of JamVM that works with a version of GNU classpath that
 * does not require ECJ. These three packages make up the bootstrap JDK.


 * This is sufficient to build an older version of Ant, which is needed to
 * build an older version of ECJ, an incremental Java compiler, both of which
 * are written in Java.
 * ECJ is needed to build the latest release (0.99) and the development
 * version of GNU Classpath. The development version of GNU Classpath has
 * much more support for Java 1.6 than the latest release, but we need to
 * build 0.99 first to get a working version of javah. ECJ, the development
 * version of GNU Classpath, and the latest version of JamVM make up the
 * second stage JDK with which we can build the OpenJDK with the Icedtea 1.x
 * build framework. We then build the more recent JDKs Icedtea 2.x and
 * Icedtea 3.x.
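
The chain described in the comment can be read as a dependency graph, and a topological sort then gives a build order. The sketch below is one reading of that comment, not the Guix definition: the labels are illustrative rather than real package names, and the edges between the IcedTea versions are an assumption.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Each key is built using the packages in its value set.
# Labels are illustrative, not the actual Guix package names.
BOOTSTRAP_DEPS = {
    "jikes": set(),
    "classpath-0.93": {"jikes"},
    "jamvm-1.5.1": {"jikes", "classpath-0.93"},
    # The three packages above make up the bootstrap JDK.
    "ant-old": {"jikes", "classpath-0.93", "jamvm-1.5.1"},
    "ecj-old": {"ant-old"},
    "classpath-0.99": {"ecj-old"},
    "classpath-devel": {"ecj-old", "classpath-0.99"},
    "jamvm-latest": {"classpath-devel"},
    "icedtea-1.x": {"ecj-old", "classpath-devel", "jamvm-latest"},
    # Assumption: each later IcedTea is built with its predecessor.
    "icedtea-2.x": {"icedtea-1.x"},
    "icedtea-3.x": {"icedtea-2.x"},
}


def build_order(deps):
    """Return a build order in which every package follows its dependencies."""
    return list(TopologicalSorter(deps).static_order())
```

Calling `build_order(BOOTSTRAP_DEPS)` starts at Jikes, the only package in this graph that needs nothing written in Java, and ends at the newest IcedTea.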

C#
While the runtime of Mono is written in C, the C# compiler of Mono is written in C# itself, and needs a recent version of Mono to bootstrap.

Turns out Mono's C# compiler from the very beginning was always written in C# and they used the Microsoft C# compiler to build their release binaries. https://www.mono-project.com/docs/about-mono/history/

Fortunately there is a C# compiler written in C (https://www.gnu.org/software/dotgnu/pnet.html) that supports the ECMA C# Language Specification, ECMA-334. That should be good enough to compile at least one version of Mono's C# compiler and allow a free bootstrap to be created.

We just need someone to do that work.

Scala
"Julien started writing a bootstrap compiler for Scala in Java, which is already able to produce an AST for Scala files and produce JVM bytecode"


 * https://www.gnu.org/software/guix/blog/2018/reproducible-builds-summit-4th-edition/
 * https://framagit.org/tyreunom/scabo

Lisp and Scheme
Lisp interpreters are relatively easy to write in assembly or C; the primary issue is macros. To properly implement macros in Lisp, you have two options: 1) implement lazy evaluation on top of eager evaluation (which can get very ugly), or 2) implement a lazy Lisp (a boatload of work) and simply use straight lambdas to provide macro functionality.


 * bootstrappable lisp/scheme compilers: guile scheme, mes, tarot, tinyscheme, single_cream.
 * non bootstrappable lisp/scheme compilers: chez scheme, racket scheme, MIT/GNU Scheme (almost all).

Note that Guix has an MIT/GNU Scheme package but notes that it is not bootstrappable.

https://git.savannah.gnu.org/cgit/guix.git/tree/gnu/packages/scheme.scm

FORTH
FORTH interpreters are far easier to write in assembly than they are to write in any other (non-functional) language. The biggest issue ends up being what FORTH standard to follow and finding programmers willing to use/improve your FORTH.


 * Jonesforth
 * Okami
 * Retro

Assembly
Contrary to what you would expect, it actually ends up being far easier to implement a macro assembler than it is to implement an assembler without macros. This is especially true when it comes to the easiest form of macros, line macros.
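
To give a feel for how little a line-macro expander needs, here is a minimal sketch. It assumes a hypothetical `DEFINE name value` line syntax in which every later occurrence of `name` is replaced by `value`; the function name is made up for illustration. With macros like this, instruction mnemonics can be plain definitions that expand to hex bytes, so the assembler itself does not need an instruction table.

```python
def expand_line_macros(source: str) -> str:
    """Expand `DEFINE name value` line macros by plain token substitution."""
    macros = {}
    output = []
    for line in source.splitlines():
        tokens = line.split()
        if len(tokens) == 3 and tokens[0] == "DEFINE":
            macros[tokens[1]] = tokens[2]  # remember the macro, emit nothing
            continue
        # Replace each token that names a macro; leave other tokens alone.
        output.append(" ".join(macros.get(tok, tok) for tok in tokens))
    return "\n".join(output)


program = """DEFINE NOP 90
DEFINE RET C3
NOP NOP
RET"""
print(expand_line_macros(program))
# prints:
# 90 90
# C3
```

The whole expander is one dictionary and one loop, with no expression parsing, no arguments and no nesting.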

Bison
At what point did Bison self host? Looking through source code archives I found this:

and in the changelog:

To bootstrap Bison, it is likely that one can just build an early version and then the latest version, with perhaps an extra version in the middle if that fails. **Task: Find a working sequence of Bison versions so you can build the latest without having Bison to start off with.**

Note: Bison ships the generated files parse-gram.{c,h,output} along with the source code. These generated files contain lookup tables, which are fairly impenetrable; the rest is very simple generated C code. Overall I think treating these files as source code is OK, since it would not be possible to hide a virus in them. On the other hand, the lexer has much bigger lookup tables, and it might be possible to hide a malicious swap of one symbol for another inside them, so I think bootstrapping the lexer is worth doing.

gio found an interesting commit, https://git.savannah.gnu.org/cgit/bison.git/commit/?id=cd3684cfa8e5b6faa2ce00330a6d84bd04d165d4, which introduces a feature in Bison (%initial-action) and uses it in the very same commit to build Bison itself.

UPDATE: gio managed to work out a bootstrap sequence.

Ada and SPARK
GNAT Ada needs a binary blob from adacore.com to bootstrap the GCC frontend. Guix and Nix do NOT currently have packages for GCC GNAT or GNAT Studio. There are two free versions of GNAT: GNAT CE (GPLv3) and FSF GNAT (GPLv3 with GCC runtime library linking exception), and the FSF version is generally lagging behind (StackOverflow answer).

You can see the Gentoo Portage ebuild for clues: GNAT GPL 2019 and note the bootstrap USE flag. The LinuxFromScratch guide notes the circular dependency.

Many compilers exist for subsets of Ada on GitHub (usually as university course projects). The Ada/Ed translator/interpreter is written in C and licensed GPLv2, available on GitHub, and already packaged in Guix. It was originally used to bootstrap GNAT but is now unmaintained and does not pass all the recent Ada tests.

GNAT was initially released separately from the main GCC sources. On October 2, 2001 the GNAT sources were contributed to the GCC CVS repository. The last version to be released separately was GNAT 3.15p, based on GCC 2.8.1, on October 2, 2002. Starting with GCC 3.4, on major platforms the official GCC release is able to pass 100% of the ACATS Ada tests included in the GCC testsuite. By GCC 4.0, more exotic platforms were also able to pass 100% of the ACATS tests. (Wikipedia)

Pascal and Nim
Nim was once written in Pascal. There are bootstrappable Pascal compilers, but none of them get anywhere near the feature set of the monolithic FPC (and many of those features are used extensively in the FPC source). See Aesop for the Pascal bootstrapping project's status.