Simple explanation: bootstrapping is about building a compiler using tools smaller than itself, as opposed to building a compiler using an already built version of itself. The problem with the second is: Where did that prebuilt binary come from?
This paper discusses methods of long-term software preservation. It briefly covers hardware that will not degrade over time, but the majority of the paper is about how to design a software stack that can be executed in the far future. To achieve this, the authors recommend building everything in terms of a machine with a short, simple specification.
An in-depth literate program describing a complete implementation of Forth, bootstrapped from 32-bit Intel assembly with lots of assembler macros into a fully self-extensible Forth. This is a really illuminating read, teaching a lot of details about Forth as well as showing just how minimal the runtime of a programming language can be.
These slides outline the development of rowl and amber. This is a programming language bootstrapped up from assembly: rowl is implemented directly in assembly, then parts of the amber VM and compiler are implemented in rowl, and the rest of amber is implemented by self-hosting.
A designed-to-be-safe statement-oriented programming language that bootstraps up from x86 machine code, using just a handful of Linux syscalls (no libc). Implemented in 60k lines of a notation for x86 machine code, 40k of which are automated tests. Safety checks for the compiler are still in progress.
This project builds a SICP-style Scheme interpreter with a REPL in Go. The blog post describes each phase, and each one looks simple. The GitHub repository integrates them into a total of 240 lines of code. Since the language is simple, the Go implementation could be ported to anything else in our collection or hand-assembled directly. More complex things could then be built on top of it, as nineties and other Lispers do.
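To give a feel for how little code a SICP-style evaluator needs, here is a minimal sketch in Python. It is not the project's Go code; the names `tokenize`, `parse`, `evaluate` and `GLOBAL_ENV` are made up for illustration, and only integers, `if`, `define` and `lambda` are supported.

```python
import operator as op

def tokenize(text):
    """Split source text into parentheses and atoms."""
    return text.replace("(", " ( ").replace(")", " ) ").split()

def parse(tokens):
    """Read one expression from the front of the token list (consumes tokens)."""
    token = tokens.pop(0)
    if token == "(":
        items = []
        while tokens[0] != ")":
            items.append(parse(tokens))
        tokens.pop(0)  # drop the closing ")"
        return items
    try:
        return int(token)
    except ValueError:
        return token  # anything that is not an integer is a symbol

# Hypothetical starting environment: a handful of primitive procedures.
GLOBAL_ENV = {"+": op.add, "-": op.sub, "*": op.mul, "<": op.lt, "=": op.eq}

def evaluate(x, env):
    """Evaluate a parsed expression in the given environment (a dict)."""
    if isinstance(x, str):        # symbol: look it up
        return env[x]
    if not isinstance(x, list):   # number literal
        return x
    head = x[0]
    if head == "if":
        _, test, then, alt = x
        return evaluate(then if evaluate(test, env) else alt, env)
    if head == "define":
        _, name, expr = x
        env[name] = evaluate(expr, env)
        return None
    if head == "lambda":
        _, params, body = x
        # Parameters shadow whatever the defining environment holds at call time.
        return lambda *args: evaluate(body, {**env, **dict(zip(params, args))})
    fn = evaluate(head, env)      # procedure application
    return fn(*[evaluate(arg, env) for arg in x[1:]])

env = dict(GLOBAL_ENV)
evaluate(parse(tokenize(
    "(define fact (lambda (n) (if (< n 2) 1 (* n (fact (- n 1))))))")), env)
result = evaluate(parse(tokenize("(fact 5)")), env)  # 120
```

An evaluator of this shape uses only a list, a dictionary and recursion, which is the property that makes it a plausible candidate for porting or hand-assembly.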
A big concern in dealing with trust in hardware is whether it's subverted or not. Intel, AMD, and many other big names have backdoors in their chips for management purposes. Among other things... ;) One cheat to get a trustworthy image is to just use a computer you have no reason to believe is subverted. Have a boring buyer acquire it, make sure it is itself boring tech, do your bootstrapping on it air-gapped, and use what it produces. It will likely *not* be subverted *by default*, since the interdictors and TAO folks have limited resources and no reason to target the system. Use several different machines for best results. To help with that, I (Nick P.) put together a list of all kinds of CPUs and execution strategies on Schneier's blog. Things I left off the list include old TI-82 calculators, Palm Pilots, etc. There is lots of old stuff lying around that you can get in person with cash and that is probably unsubverted.
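One way to make use of several different machines is to run the same bootstrap on each and accept the result only if every machine produced a byte-identical artifact. The Python sketch below assumes the build is deterministic and that the outputs have already been copied off each machine; the function name `agreed_digest` and the machine labels are hypothetical.

```python
import hashlib

def agreed_digest(artifacts):
    """Return the shared SHA-256 hex digest if every artifact is identical, else None.

    `artifacts` maps a machine label to the bytes that machine produced.
    """
    digests = {hashlib.sha256(data).hexdigest() for data in artifacts.values()}
    return digests.pop() if len(digests) == 1 else None

# Hypothetical outputs from three unrelated machines.
outputs = {
    "old-laptop": b"\x7fELF...compiler image...",
    "palm-pilot": b"\x7fELF...compiler image...",
    "calculator": b"\x7fELF...compiler image...",
}
trusted = agreed_digest(outputs)  # a hex digest here, since all three agree
```

A mismatch does not say which machine is wrong, only that at least one of them disagrees and the result should not be trusted yet.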
"It's time for the Go compilers to be written in Go, not in C. I'll talk about the unusual process the Go team has adopted to make that happen: mechanical conversion of the existing C compilers into idiomatic Go code". They wrote the compiler in C then translated the source code from C into Go almost automatically (had to do some manual fixing up). This is an interesting approach. Let's name it the transpile approach to self hosting.
asmutils: a Linux distro/userland implemented in assembly
This is a Linux distribution implemented entirely in assembly. It doesn't depend on libc or anything else.
This is a teaching document that explains how to make an assembler in Forth! It shows a very Forth-idiomatic style of programming, and how easy it is to make an advanced assembler once you have a working Forth.
This is a Rust compiler written in C++ that translates Rust to C. It makes the normally self-hosted rustc compiler bootstrappable! It omits the borrow checker but is still able to compile valid input source correctly.
CakeML is really, really fascinating. They have created a theory of SML programs inside HOL, allowing them to prove properties of SML programs embedded in HOL. They have created a (serious) compiler from SML down to assembly and proved that it preserves semantics all the way. They are then able to compile the compiler, simultaneously bootstrapping the proof, to create a verified compiler binary that is proven to compile input programs while preserving their semantics. To my knowledge this is the first such development.
This is an incredibly well-developed bootstrapping project. It builds up in stages: a hex assembler, an ELF maker, an x86 assembler, a linker, a B compiler, and a C compiler. It includes implementations of various POSIX-style libc functions along the way. It is extremely well written and worth studying!
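The first stage of a chain like this, the hex assembler, does very little: it reads text made of hex digit pairs and writes out the corresponding bytes. A rough sketch in Python follows. The real project works at a much lower level, and both the `#` comment syntax and the function name `hex_assemble` are assumptions made for this illustration.

```python
def hex_assemble(source: str) -> bytes:
    """Turn hex-digit text into raw bytes, ignoring whitespace and comments."""
    out = bytearray()
    for line in source.splitlines():
        code = line.split("#", 1)[0]   # assumed comment syntax: '#' to end of line
        digits = "".join(code.split()) # drop all whitespace between digit pairs
        if len(digits) % 2:
            raise ValueError(f"odd number of hex digits in line: {line!r}")
        out += bytes.fromhex(digits)   # raises ValueError on non-hex characters
    return bytes(out)

# The first four bytes of every ELF file: 0x7f 'E' 'L' 'F'.
elf_magic = hex_assemble("""
7f 45 4c 46   # ELF magic number
""")
```

Because the tool is this small, its binary form can be audited by hand, and every later stage is then written as input for the stage before it.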
The asmc project is a small bootable kernel that loads up a payload. Payloads exist for assembly compilers and "G language" compilers. The G language is a low-level language below C which was invented to ease bootstrapping. An assembler (which can build the kernel) has been implemented in G.
cmeta - Using ideas from the META compiler-compiler, Pim builds the meta language up from raw hex. blc - A binary lambda calculus implementation, capable of computing Matt Might's factorial program, built using the cmeta system. Incredibly terse. It is surprising that metacompiler techniques can be applied at such a low level. The amount of leverage may be the highest in this project.
"It compiles and runs a subset of the Revised Pascal language. That subset was designed to be the minimum language required to self compile for a new machine implementation. It was part of a "bootstrapping" kit designed to facilitate porting Pascal to new machines." Pascal was implemented with bootstrapping in mind. It has a simple "p-code" bytecode language that eases the process.
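The reason a bytecode eases porting is that only a small interpreter has to be rewritten for each new machine, while the compiler itself ships as bytecode. The Python sketch below shows a stack machine of that general kind. The opcodes (`PUSH`, `ADD`, `MUL`, `PRINT`, `HALT`) are invented for this example and are not the actual Pascal p-code instruction set.

```python
def run(program):
    """Execute a list of (opcode, operand) pairs on a simple stack machine.

    Returns the list of values emitted by PRINT instructions.
    """
    stack, output, pc = [], [], 0
    while pc < len(program):
        opcode, operand = program[pc]
        pc += 1
        if opcode == "PUSH":
            stack.append(operand)
        elif opcode == "ADD":
            right, left = stack.pop(), stack.pop()
            stack.append(left + right)
        elif opcode == "MUL":
            right, left = stack.pop(), stack.pop()
            stack.append(left * right)
        elif opcode == "PRINT":
            output.append(stack.pop())
        elif opcode == "HALT":
            break
        else:
            raise ValueError(f"unknown opcode: {opcode}")
    return output

# Bytecode for the expression (2 + 3) * 4.
program = [
    ("PUSH", 2), ("PUSH", 3), ("ADD", None),
    ("PUSH", 4), ("MUL", None),
    ("PRINT", None), ("HALT", None),
]
result = run(program)  # [20]
```

Porting to a new machine then means rewriting only a loop like `run` in that machine's assembly, after which the existing compiler bytecode runs unchanged.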
This is a Forth operating system with an Emacs-like editor and a Lisp interpreter built into it. It's a 1700-line assembly script for the bootable Forth compiler/interpreter, and the whole rest of the system is implemented in Forth. I have not tried it, but apparently it can build itself with the assembler. This is very impressive work.
Past Research / intray
important: try to summarize lessons learned from each.
Pascal-S by Wirth (Small, self-contained subset w/ great error reporting)