Bootstrapping Specific Languages

Go

The Go compiler is implemented in Go. It was originally written in C, but was semi-automatically translated into Go (see the GopherCon 2014 talk "Go from C to Go" by Russ Cox). GCC has its own implementation of Go too: gccgo can compile the Go implementation of Go, so Go can be bootstrapped from the GCC platform.

This is not done yet in Guix.

According to https://golang.org/doc/install/gccgo, gccgo-4.8.2 includes a
complete go-1.1.2 implementation, gccgo-4.9 includes a complete go-1.2
implementation, and gccgo-5 a complete implementation of go-1.4. Ultimately
we hope to build go-1.5+ with a bootstrap process using gccgo-5. As of
go-1.5, go cannot be bootstrapped without go-1.4, so we need to use go-1.4 or
gccgo-5. Mips is not officially supported, but it should work if it is
bootstrapped.
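The version constraints quoted above can be written down as a small lookup. This is only a sketch: the table contains just the three gccgo releases named in the text, the requirement of go-1.4 is taken from the same passage, and `usable_gccgo` is a hypothetical helper name, not part of any Go or Guix tooling.

```python
# Go version implemented by each gccgo release, as quoted in the text above.
GCCGO_IMPLEMENTS = {
    "gccgo-4.8.2": (1, 1, 2),
    "gccgo-4.9": (1, 2),
    "gccgo-5": (1, 4),
}

# Per the text, go-1.5 and later need go-1.4 (or an equivalent) to bootstrap.
REQUIRED_BOOTSTRAP_VERSION = (1, 4)


def usable_gccgo(required=REQUIRED_BOOTSTRAP_VERSION):
    """Return the gccgo releases that implement at least `required`.

    Hypothetical helper: it only compares version tuples and knows
    nothing about platforms or later changes to Go's requirements.
    """
    return sorted(
        name for name, version in GCCGO_IMPLEMENTS.items() if version >= required
    )


if __name__ == "__main__":
    print(usable_gccgo())  # prints ['gccgo-5']
```

Of the releases listed, only gccgo-5 reaches the go-1.4 level, which matches the plan stated above to build go-1.5+ using gccgo-5.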

http://git.savannah.gnu.org/cgit/guix.git/tree/gnu/packages/golang.scm?id=0f377aad75314df6c6296f4172801519315d81cd#n50

Rust

The Rust compiler is implemented in Rust, and each version is used to compile the next one (**the train model**). The stage0 configuration file in the Rust source tree explains:

This file describes the stage0 compiler that's used to then bootstrap the Rust compiler itself. For the rustbuild build system, this also describes the relevant Cargo revision that we're using.

Currently Rust always bootstraps from the previous stable release, and in our train model this means that the master branch bootstraps from beta, beta bootstraps from current stable, and stable bootstraps from the previous stable release.

If you're looking at this file on the master branch, you'll likely see that rustc and cargo are configured to `beta`, whereas if you're looking at a source tarball for a stable release you'll likely see `1.x.0` for rustc and `0.x.0` for Cargo where they were released on `date`.
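The train model described above can be sketched as a simple mapping. The helper names `bootstrap_chain` and `releases_to_build` are hypothetical, and the second function assumes that every stable release `1.x.0` is built with exactly the preceding `1.(x-1).0`, ignoring point releases; the minor version numbers used with it are purely illustrative.

```python
# Which compiler each branch bootstraps from, per the train model quoted above.
BOOTSTRAPS_FROM = {
    "master": "beta",
    "beta": "stable",
    "stable": "previous stable",
}


def bootstrap_chain(channel):
    """Follow the train model from a branch down to the oldest compiler it names."""
    chain = [channel]
    while chain[-1] in BOOTSTRAPS_FROM:
        chain.append(BOOTSTRAPS_FROM[chain[-1]])
    return chain


def releases_to_build(seed_minor, target_minor):
    """List, in build order, the 1.x.0 releases needed to reach the target.

    Assumption: each stable release is built by the one immediately before it.
    """
    if seed_minor > target_minor:
        raise ValueError("seed release is newer than the target release")
    return [f"1.{minor}.0" for minor in range(seed_minor + 1, target_minor + 1)]


if __name__ == "__main__":
    print(bootstrap_chain("master"))
    print(releases_to_build(19, 22))
```

The second function shows why a trusted seed matters: under this assumption, every release between the seed and the target has to be built in turn.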

Fortunately, there is a Rust compiler implemented in C++, mrustc, that is able to bootstrap Rust. Its README describes the project as follows:

This project is an attempt at creating a simple rust compiler in C++, with the ultimate goal of being a separate re-implementation.

mrustc works by compiling assumed-valid rust code (i.e. without borrow checking) into a high-level assembly (currently using C, but LLVM/cretonne or even direct machine code could work) and getting an external code generator to turn that into optimised machine code. This works because the borrow checker doesn't have any impact on the generated code, just in checking that the code would be valid.

mrustc is packaged in Guix, but building rustc is still done from an initial binary seed, and from then on using the train model.

http://git.savannah.gnu.org/cgit/guix.git/tree/gnu/packages/rust.scm?id=c1cdadc6bafe6b902ffe580febc3365358b9014b#n72

UPDATE: rustc has since been bootstrapped from mrustc in Guix!

https://www.gnu.org/software/guix/blog/2018/bootstrapping-rust/

Java

In Guix, Java is bootstrapped starting from Jikes (https://git.savannah.gnu.org/cgit/guix.git/tree/gnu/packages/java.scm#n82):

The Java bootstrap begins with Jikes, a Java compiler written in C++. We
use it to build a simple version of GNU Classpath, the Java standard
library. We chose version 0.93 because it is the last version that can be
built with Jikes. With Jikes and this version of GNU Classpath we can
build JamVM, a Java Virtual Machine. We build version 1.5.1 because it is
the last version of JamVM that works with a version of GNU Classpath that
does not require ECJ. These three packages make up the bootstrap JDK.
This is sufficient to build an older version of Ant, which is needed to
build an older version of ECJ, an incremental Java compiler, both of which
are written in Java.
ECJ is needed to build the latest release (0.99) and the development
version of GNU Classpath. The development version of GNU Classpath has
much more support for Java 1.6 than the latest release, but we need to
build 0.99 first to get a working version of javah. ECJ, the development
version of GNU Classpath, and the latest version of JamVM make up the
second stage JDK with which we can build the OpenJDK with the Icedtea 1.x
build framework. We then build the more recent JDKs Icedtea 2.x and
Icedtea 3.x.
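The chain described above is a dependency graph, and a valid build order is any topological ordering of it. The sketch below encodes one reading of the text; the node names are informal labels rather than Guix package names, and the dependencies of the latest JamVM are an assumption, since the text does not spell them out.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Each key is built using the items in its set (one reading of the text above).
JAVA_BOOTSTRAP = {
    "jikes": set(),  # written in C++, needs no Java
    "classpath-0.93": {"jikes"},
    "jamvm-1.5.1": {"jikes", "classpath-0.93"},
    "ant-old": {"jikes", "classpath-0.93", "jamvm-1.5.1"},
    "ecj-old": {"ant-old"},
    "classpath-0.99": {"ecj-old"},
    "classpath-devel": {"ecj-old", "classpath-0.99"},  # 0.99 provides javah
    "jamvm-latest": {"classpath-devel"},  # assumption, not stated in the text
    "icedtea-1.x": {"ecj-old", "classpath-devel", "jamvm-latest"},
    "icedtea-2.x": {"icedtea-1.x"},
    "icedtea-3.x": {"icedtea-2.x"},
}


def build_order(graph):
    """Return one order in which every item comes after its dependencies."""
    return list(TopologicalSorter(graph).static_order())


if __name__ == "__main__":
    for step, name in enumerate(build_order(JAVA_BOOTSTRAP), start=1):
        print(step, name)
```

Because every other node depends on Jikes, directly or indirectly, it always comes first; `TopologicalSorter` would also raise `CycleError` if the graph contained a circular dependency, which is exactly the situation a bootstrap has to avoid.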

C#

While the runtime of Mono is written in C, the C# compiler of Mono is written in C# itself, and needs a recent version of Mono to bootstrap.

It turns out that Mono's C# compiler was written in C# from the very beginning, and the Microsoft C# compiler was used to build the release binaries: https://www.mono-project.com/docs/about-mono/history/

Fortunately, there is a C# compiler written in C, DotGNU Portable.NET (https://www.gnu.org/software/dotgnu/pnet.html), that supports the ECMA-334 C# Language Specification. That should be good enough to compile at least one version of Mono's C# compiler and allow a free bootstrap to be created.

Someone just needs to do that work.

Scala

Julien started writing a bootstrap compiler for Scala in Java. It is already able to produce an AST for Scala files and to produce JVM bytecode.

Lisp and Scheme

Lisp interpreters are relatively easy to write in assembly or C; the primary issue is macros. To implement macros properly in Lisp, there are two options: 1) implement lazy evaluation on top of eager evaluation (which can get very ugly), or 2) implement a lazy Lisp (a boatload of work) and simply use plain lambdas to provide macro functionality.

  • Bootstrappable Lisp/Scheme compilers: Guile Scheme, Mes, Tarot, TinyScheme, single_cream.
  • Non-bootstrappable Lisp/Scheme compilers: Chez Scheme, Racket, MIT/GNU Scheme (almost all).

Note that Guix has an MIT/GNU Scheme package, but mentions that it is not bootstrappable.

https://git.savannah.gnu.org/cgit/guix.git/tree/gnu/packages/scheme.scm

FORTH

FORTH interpreters are far easier to write in assembly than in any other (non-functional) language. The biggest issues end up being which FORTH standard to follow and finding programmers willing to use and improve your FORTH.

  • Jonesforth
  • Okami
  • Retro

The preForth project by Ulrich Hoffmann aims to synthesize features of various Forths into a minimal kernel, then extend itself into seedForth (see the FOSDEM 2020 talk). Currently preForth must be bootstrapped with gforth or swiftForth.

There is a minimal Forth implemented in 1359 SLOC of assembly in the stage0 repository. Either modifying this Forth to conform to preForth specs or bootstrapping preForth with the stage0 Forth are possible bootstrapping paths.

Assembly

Contrary to what you would expect, it actually ends up being far easier to implement a macro assembler than an assembler without macros. This is especially true for the easiest form of macros, line macros.
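To give a feel for how little machinery line macros need, here is a minimal sketch of an expander. It takes "line macro" to mean a macro defined on a single line and expanded by plain token substitution; the `DEFINE name replacement` syntax and the function name are assumptions made for this illustration, not a description of any particular assembler.

```python
def expand_line_macros(source):
    """Expand single-line macros by token substitution.

    Assumed syntax: a line `DEFINE <name> <replacement...>` defines a macro;
    on every other line each whitespace-separated token that names a macro
    is replaced by its replacement text. Macros must be defined before use.
    """
    macros = {}
    output = []
    for line in source.splitlines():
        tokens = line.split()
        if not tokens:
            continue  # skip blank lines
        if tokens[0] == "DEFINE":
            if len(tokens) < 3:
                raise ValueError(f"malformed macro definition: {line!r}")
            macros[tokens[1]] = " ".join(tokens[2:])
            continue
        output.append(" ".join(macros.get(token, token) for token in tokens))
    return "\n".join(output)


if __name__ == "__main__":
    program = """
    DEFINE NOP 90
    DEFINE RET C3
    NOP NOP
    RET
    """
    print(expand_line_macros(program))
```

With such macros in place, the "assembler" proper can shrink to a hex loader: mnemonics are just names for byte sequences (here the x86 opcodes 90 and C3), so no instruction tables or parsing of operands are needed in the core tool.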

Bison

At what point did Bison become self-hosting? Looking through source code archives I found this:

$ find bison-1.28 -iname '*.y'

$ find bison-1.29 -iname '*.y'
bison-1.29/intl/plural.y

$ find bison-1.50 -iname '*.y'
bison-1.50/src/parse-gram.y

and in the changelog:

2002-06-11 Akim Demaille <akim@epita.fr>

Have Bison grammars parsed by a Bison grammar.

In git this happens at revision e9955c83734d0a545d7822a1feb9c4a8038a62cb.

To bootstrap Bison, it is likely that one can just build an early version and then the latest version, with perhaps an extra version in the middle if that fails. **Task: find a working sequence of Bison versions so that the latest can be built without having Bison to start off with.**
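The task above is a shortest-path search: starting with no Bison at all, find the fewest intermediate versions needed to reach the latest one. The sketch below assumes a hypothetical predicate `can_build(builder, target)` that reports whether `target` builds when only `builder` is available (with `None` meaning no Bison is installed); in practice that predicate would be an actual build attempt. The version labels in the example are placeholders, not claims about real Bison releases.

```python
from collections import deque


def find_bootstrap_sequence(versions, can_build):
    """Breadth-first search for the shortest build chain to versions[-1].

    `can_build(builder, target)` is a hypothetical predicate; `builder` is
    None when no Bison is available yet. Returns the list of versions to
    build in order, or None if no chain exists.
    """
    target = versions[-1]
    built_by = {None: None}  # version -> the version that built it
    queue = deque([None])
    while queue:
        builder = queue.popleft()
        if builder == target:
            path = []
            while builder is not None:
                path.append(builder)
                builder = built_by[builder]
            return path[::-1]
        for version in versions:
            if version not in built_by and can_build(builder, version):
                built_by[version] = builder
                queue.append(version)
    return None


if __name__ == "__main__":
    # Made-up rules: "old" needs no Bison, and each version builds the next.
    rules = {(None, "old"), ("old", "mid"), ("mid", "new")}
    print(find_bootstrap_sequence(
        ["old", "mid", "new"], lambda builder, target: (builder, target) in rules
    ))
```

Breadth-first search guarantees the shortest chain, which matters when each probe of `can_build` is a full build.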

Note: Bison ships the generated files parse-gram.{c,h,output} along with the source code. These generated files contain lookup tables, which are fairly impenetrable; the rest is very simple generated C code. Overall I think treating these files as source code is OK, since it would not be possible to hide a virus in them. The lexer, on the other hand, has much bigger lookup tables, and it might be possible to hide a malicious swap of one symbol for another inside them, so I think bootstrapping the lexer is worth doing.

gio found an interesting commit, https://git.savannah.gnu.org/cgit/bison.git/commit/?id=cd3684cfa8e5b6faa2ce00330a6d84bd04d165d4, which introduces a feature in Bison (%initial-action) and uses it in the very same commit to build Bison itself.

UPDATE: gio managed to work out a bootstrap sequence.

Ada and SPARK

GNAT Ada needs a binary blob from adacore.com to bootstrap the GCC frontend. Guix and Nix do NOT currently have packages for GCC GNAT or GNAT Studio. AdaCore has dropped support for GNAT Community Edition, so there is now only GNAT FSF (GPLv3 with GCC runtime library linking exception).

For clues, see the Gentoo Portage ebuild for GNAT GPL 2021 and note its bootstrap USE flag. The Linux From Scratch guide also notes the circular dependency.

Many compilers for subsets of Ada exist on GitHub (usually as university course projects). The Ada/Ed translator/interpreter is written in C, licensed under GPLv2, available on GitHub, and already packaged in Guix. It implements Ada 83 and was (plausibly) privately extended to support Ada 95 in order to bootstrap GNAT, but it is now unmaintained. One possible bootstrap path is therefore to extend Ada/Ed to support Ada 95 and then bootstrap GNAT, possibly starting with the earliest available sources.

GNAT was initially released separately from the main GCC sources. On October 2, 2001 the GNAT sources were contributed to the GCC CVS repository.[5] The last version to be released separately was GNAT 3.15p, based on GCC 2.8.1, on October 2, 2002. Starting with GCC 3.4, on major platforms the official GCC release is able to pass 100% of the ACATS Ada tests included in the GCC testsuite. By GCC 4.0, more exotic platforms were also able to pass 100% of the ACATS tests. (Wikipedia)

Pascal and Nim

Nim was once written in Pascal. There are bootstrappable Pascal compilers, but none of them gets anywhere near the feature set of the monolithic FPC (and many of those features are used extensively in the FPC source...). See Aesop for the Pascal bootstrapping project's status.