Bootstrapping Specific Languages


Go

The Go compiler is implemented in Go. It was originally written in C and was then semi-automatically translated into Go (see Russ Cox's GopherCon 2014 talk "Go from C to Go"). GCC has its own implementation of Go, gccgo, which can compile the Go implementation of Go, so Go can be bootstrapped from GCC.

This is not done yet in Guix.

According to https://golang.org/doc/install/gccgo, gccgo-4.8.2 includes a
complete go-1.1.2 implementation, gccgo-4.9 includes a complete go-1.2
implementation, and gccgo-5 a complete implementation of go-1.4. Ultimately
we hope to build go-1.5+ with a bootstrap process using gccgo-5. As of
go-1.5, go cannot be bootstrapped without go-1.4, so we need to use go-1.4 or
gccgo-5. Mips is not officially supported, but it should work if it is
bootstrapped.

http://git.savannah.gnu.org/cgit/guix.git/tree/gnu/packages/golang.scm?id=0f377aad75314df6c6296f4172801519315d81cd#n50

Rust

The Rust compiler is implemented in Rust; each version is used to compile the next version (**the train model**).

This file describes the stage0 compiler that's used to then bootstrap the Rust
compiler itself. For the rustbuild build system, this also describes the
relevant Cargo revision that we're using.

Currently Rust always bootstraps from the previous stable release, and in our
train model this means that the master branch bootstraps from beta, beta
bootstraps from current stable, and stable bootstraps from the previous stable
release.

If you're looking at this file on the master branch, you'll likely see that
rustc and cargo are configured to `beta`, whereas if you're looking at a
source tarball for a stable release you'll likely see `1.x.0` for rustc and
`0.x.0` for Cargo where they were released on `date`.
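The train model described above can be sketched as a lookup table plus a loop. This is only an illustration: the function names and the version numbers in the usage line are placeholders following the `1.x.0` pattern, not taken from any Rust tooling.

```python
# Which compiler builds which channel, as stated in the text above.
BOOTSTRAPS_FROM = {
    "master": "beta",
    "beta": "stable",
    "stable": "previous stable",
}


def channel_chain(channel):
    """Follow the train model from `channel` back to the oldest compiler needed."""
    chain = [channel]
    while chain[-1] in BOOTSTRAPS_FROM:
        chain.append(BOOTSTRAPS_FROM[chain[-1]])
    return chain


def stable_builds(seed_minor, target_minor):
    """List the stable releases to build, in order, starting from a trusted 1.<seed_minor>.0.

    Assumes every stable release is built by exactly the previous one.
    """
    if target_minor < seed_minor:
        raise ValueError("target release is older than the seed")
    return [f"1.{minor}.0" for minor in range(seed_minor + 1, target_minor + 1)]


print(channel_chain("master"))
print(stable_builds(19, 22))  # placeholder version numbers
```

The second function shows why the starting point matters: the further back the trusted seed is, the longer the chain of full compiler builds.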

Fortunately, there is a Rust compiler implemented in C++, mrustc, that is able to bootstrap Rust. Its authors describe it as follows:

This project is an attempt at creating a simple rust compiler in C++, with the ultimate goal of being a separate re-implementation.

mrustc works by compiling assumed-valid rust code (i.e. without borrow checking) into a high-level assembly (currently using C, but LLVM/cretonne or even direct machine code could work) and getting an external code generator to turn that into optimised machine code. This works because the borrow checker doesn't have any impact on the generated code, just in checking that the code would be valid.

mrustc is packaged in Guix, but building rustc is still done from an initial binary seed and, from then on, using the train model.

http://git.savannah.gnu.org/cgit/guix.git/tree/gnu/packages/rust.scm?id=c1cdadc6bafe6b902ffe580febc3365358b9014b#n72

UPDATE: rustc has since been bootstrapped from mrustc in Guix!

https://www.gnu.org/software/guix/blog/2018/bootstrapping-rust/

Java

In Guix, Java is bootstrapped using Jikes: https://git.savannah.gnu.org/cgit/guix.git/tree/gnu/packages/java.scm#n82

The Java bootstrap begins with Jikes, a Java compiler written in C++. We
use it to build a simple version of GNU Classpath, the Java standard
library. We chose version 0.93 because it is the last version that can be
built with Jikes. With Jikes and this version of GNU Classpath we can
build JamVM, a Java Virtual Machine. We build version 1.5.1 because it is
the last version of JamVM that works with a version of GNU classpath that
does not require ECJ. These three packages make up the bootstrap JDK.
This is sufficient to build an older version of Ant, which is needed to
build an older version of ECJ, an incremental Java compiler, both of which
are written in Java.
ECJ is needed to build the latest release (0.99) and the development
version of GNU Classpath. The development version of GNU Classpath has
much more support for Java 1.6 than the latest release, but we need to
build 0.99 first to get a working version of javah. ECJ, the development
version of GNU Classpath, and the latest version of JamVM make up the
second stage JDK with which we can build the OpenJDK with the Icedtea 1.x
build framework. We then build the more recent JDKs Icedtea 2.x and
Icedtea 3.x.
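The chain described above is a dependency graph, and a valid build order is a topological sort of it. The sketch below encodes the description by hand. The node names are informal labels, not Guix package names, and the edges marked as assumptions are not spelled out in the text. It relies on `graphlib`, which is in the Python standard library from 3.9 onward.

```python
from graphlib import TopologicalSorter

# package label -> labels that must already be built
NEEDS = {
    "jikes": [],
    "classpath-0.93": ["jikes"],
    "jamvm-1.5.1": ["jikes", "classpath-0.93"],
    # The three packages above form the bootstrap JDK.
    "ant-old": ["jikes", "classpath-0.93", "jamvm-1.5.1"],
    "ecj-old": ["ant-old"],
    "classpath-0.99": ["ecj-old"],
    "classpath-devel": ["ecj-old", "classpath-0.99"],  # 0.99 provides javah
    "jamvm-latest": ["classpath-devel"],  # assumption: not stated in the text
    "icedtea-1.x": ["ecj-old", "classpath-devel", "jamvm-latest"],
    "icedtea-2.x": ["icedtea-1.x"],  # assumption: each IcedTea builds the next
    "icedtea-3.x": ["icedtea-2.x"],  # assumption
}


def build_order(needs=NEEDS):
    """Return the labels in an order where every prerequisite comes first."""
    return list(TopologicalSorter(needs).static_order())


for step, package in enumerate(build_order(), start=1):
    print(step, package)
```

Only Jikes has no prerequisites in this graph, which is the point of the exercise: everything else is reachable from a compiler written in C++.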

Scala

Julien started writing a bootstrap compiler for Scala in Java, which is already able to produce an AST for Scala files and to emit JVM bytecode.

Lisp and Scheme

Lisp interpreters are relatively easy to write in assembly or C; the primary issue is macros. To implement macros properly in Lisp, you have two options: 1) implement lazy evaluation on top of eager evaluation (which can get very ugly), or 2) implement a lazy Lisp (a boatload of work) and simply use plain lambdas to provide macro functionality.
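One possible reading of option 1 is sketched below: the evaluator stays eager for ordinary calls, but hands a macro its argument forms unevaluated and then evaluates whatever form the macro returns. This is a toy, not how any of the interpreters listed here work. It represents S-expressions as nested Python lists to avoid needing a parser, and the `macro` special form and the `unless` example are invented for the illustration.

```python
class Macro:
    """Wraps a function that maps argument forms to a replacement form."""

    def __init__(self, expander):
        self.expander = expander


def evaluate(form, env):
    if isinstance(form, str):  # a symbol: look it up
        return env[form]
    if not isinstance(form, list):  # a number: self-evaluating
        return form
    head, *args = form
    if head == "quote":
        return args[0]
    if head == "if":
        test, then, otherwise = args
        return evaluate(then if evaluate(test, env) else otherwise, env)
    if head in ("lambda", "macro"):
        params, body = args

        def call(*values):
            return evaluate(body, {**env, **dict(zip(params, values))})

        return Macro(call) if head == "macro" else call
    target = evaluate(head, env)
    if isinstance(target, Macro):
        # Lazy path: pass the raw forms, then evaluate the expansion.
        return evaluate(target.expander(*args), env)
    # Eager path: evaluate every argument before the call.
    return target(*[evaluate(arg, env) for arg in args])


env = {"list": lambda *items: list(items)}
# (unless cond body) expands to (if cond 0 body)
env["unless"] = evaluate(
    ["macro", ["cond", "body"], ["list", ["quote", "if"], "cond", 0, "body"]], env
)

print(evaluate(["unless", 0, 42], env))
print(evaluate(["unless", 1, ["launch"]], env))  # "launch" is never looked up
```

The second call only works because `unless` is a macro: as an ordinary function, its arguments would be evaluated first and the undefined `launch` would raise an error.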

  • bootstrappable lisp/scheme compilers: guile scheme, mes, tarot, tinyscheme, single_cream.
  • non bootstrappable lisp/scheme compilers: chez scheme, racket scheme, (almost all).

FORTH

FORTH interpreters are far easier to write in assembly than in any other (non-functional) language. The biggest issues end up being which FORTH standard to follow and finding programmers willing to use and improve your FORTH.

  • Jonesforth
  • Okami
  • Retro

Assembly

Contrary to what you would expect, it actually ends up being far easier to implement a macro assembler than an assembler without macros. This is especially true of the easiest form of macros, line macros.
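To give a sense of how little a line-macro pass needs, here is a minimal sketch. It assumes one reading of "line macros": a `DEFINE name value` line registers a replacement, and every later occurrence of `name` as a whole token is substituted. The `DEFINE` keyword, the `#` comment syntax, and the mnemonics and hex values in the example are all made up for the illustration.

```python
def expand_line_macros(source):
    """Expand DEFINE-style line macros and return the remaining lines."""
    macros = {}
    output = []
    for raw in source.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and padding
        if not line:
            continue
        tokens = line.split()
        if tokens[0] == "DEFINE":
            if len(tokens) != 3:
                raise ValueError(f"malformed macro definition: {raw!r}")
            macros[tokens[1]] = tokens[2]
            continue
        # Replace each token that names a macro; leave the rest untouched.
        output.append(" ".join(macros.get(token, token) for token in tokens))
    return output


program = """
DEFINE LOAD_IMM 2D   # hypothetical opcode bytes
DEFINE HALT FF
LOAD_IMM 2A
HALT
"""
print(expand_line_macros(program))
```

With mnemonics handled as macros, the assembler proper never needs a built-in instruction table; what is left after expansion is close to plain hex.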

Bison

At what point did Bison self host? Looking through source code archives I found this:

$ find bison-1.28 -iname '*.y'

$ find bison-1.29 -iname '*.y'
bison-1.29/intl/plural.y

$ find bison-1.50 -iname '*.y'
bison-1.50/src/parse-gram.y

and in the changelog:

2002-06-11 Akim Demaille <akim@epita.fr>

Have Bison grammars parsed by a Bison grammar.

In git this happens at revision e9955c83734d0a545d7822a1feb9c4a8038a62cb.
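The `find` survey above can be repeated over any set of unpacked release trees with a few lines of Python. This is a sketch: the function names are invented here, and it treats the presence of a `parse-gram.y` file as the sign of self-hosting, which is an assumption based on the 1.50 result above.

```python
from pathlib import Path


def grammar_files(tree):
    """Return every file under `tree` whose name ends in .y, ignoring case."""
    return sorted(
        path
        for path in Path(tree).rglob("*")
        if path.is_file() and path.name.lower().endswith(".y")
    )


def first_self_hosting(trees):
    """Return the first tree that contains a parse-gram.y, or None."""
    for tree in trees:
        if any(path.name == "parse-gram.y" for path in grammar_files(tree)):
            return tree
    return None
```

Called with the trees in release order, for example `first_self_hosting(["bison-1.28", "bison-1.29", "bison-1.50"])`, it reports the earliest of those releases that ships its own grammar as a Bison grammar.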

To bootstrap Bison, it is likely enough to build an early version and then the latest version, with perhaps an extra version in the middle if that fails. **Task: find a working sequence of Bison versions so that you can build the latest one without having Bison to start off with.**
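The task amounts to a shortest-path search over versions. The sketch below assumes a hypothetical `can_build(builder, version)` predicate that in practice would be an actual build attempt, with `builder` set to `None` when no Bison is installed. The version labels and the compatibility table in the example are entirely made up and say nothing about which real releases build each other.

```python
from collections import deque


def find_build_chain(versions, can_build, target):
    """Return the shortest list of versions to build, in order, ending at `target`.

    Returns None if `target` cannot be reached starting without any bison.
    """
    queue = deque([(None, [])])  # (currently installed bison, versions built so far)
    seen = {None}
    while queue:
        builder, chain = queue.popleft()
        if builder == target:
            return chain
        for version in versions:
            if version not in seen and can_build(builder, version):
                seen.add(version)
                queue.append((version, chain + [version]))
    return None


# Made-up results standing in for real build attempts.
WORKS = {(None, "early"), ("early", "middle"), ("middle", "latest")}
chain = find_build_chain(
    ["early", "middle", "latest"],
    lambda builder, version: (builder, version) in WORKS,
    "latest",
)
print(chain)
```

Breadth-first search is used so that the first chain found is also the one with the fewest intermediate builds.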

Note: Bison ships the generated files parse-gram.{c,h,output} along with the source code. These generated files contain lookup tables, which are more or less impenetrable; the rest is very simple generated C code. Overall I think treating these files as source code is OK, since it would not be possible to hide a virus in them. The lexer, on the other hand, has much bigger lookup tables, and it might be possible to hide a malicious swap of one symbol for another inside them, so I think bootstrapping the lexer is worth doing.