Main Page

Welcome to the bootstrapping wiki!
This wiki is about bootstrapping, i.e., building up compilers and interpreters and tools from nothing.

"Recipe for yogurt: Add yogurt to milk." - Anon.

Short sci-fi story: Coding Machines by Lawrence Kesteloot, January 2009

Also see http://bootstrappable.org, which has pointers to a mailing list and IRC channel.

Simple explanation: bootstrapping is about building a compiler using tools smaller than itself, as opposed to building a compiler using an already built version of itself. The problem with the second approach is: where did that prebuilt binary come from?
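The distinction can be sketched as a toy dependency check. This is only an illustration under assumed names: the tools in `BUILD_DEPS` are hypothetical and do not describe any real project's build recipe. A tool is bootstrappable here if its build chain bottoms out in a seed that needs no tools; a tool that lists itself among its build tools can only be built from a prebuilt binary.

```python
# Toy model of a bootstrap chain. All tool names are hypothetical.
# Each entry maps a tool to the tools required to build it.
BUILD_DEPS = {
    "seed": [],                         # tiny hand-auditable starting point
    "assembler": ["seed"],
    "small_c": ["assembler"],
    "full_cc": ["small_c"],             # built from tools smaller than itself
    "selfhosted_cc": ["selfhosted_cc"], # needs an existing copy of itself
}


def needs_prebuilt_binary(tool, deps=BUILD_DEPS, _path=()):
    """Return True if building `tool` from nothing runs into a cycle."""
    if tool in _path:
        return True
    return any(
        needs_prebuilt_binary(dep, deps, _path + (tool,)) for dep in deps[tool]
    )


def build_order(tool, deps=BUILD_DEPS):
    """Return the tools to build, seed first; raise ValueError on a cycle."""
    order = []

    def visit(name, path):
        if name in path:
            raise ValueError(f"{name} requires a prebuilt copy of itself")
        if name in order:
            return
        for dep in deps[name]:
            visit(dep, path + (name,))
        order.append(name)

    visit(tool, ())
    return order


if __name__ == "__main__":
    print(build_order("full_cc"))
    print(needs_prebuilt_binary("selfhosted_cc"))
```

In this model the cycle is exactly the "where did that prebuilt binary come from?" question: breaking it means adding an alternative, acyclic path from the seed to the compiler.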

Current Topics

 * mes by janneke, mes
 * stage0 by Jeremiah Orians stage0
 * Coquillage by bms_
 * Descent principle
 * The Semantics Assignment Problem
 * Self-Extension
 * Self-Hosting
 * Build Systems
 * Build Inputs
 * C compilers
 * Below C Level
 * Bootstrapping Specific Languages
 * discarded options and why
 * Investigate
 * Projects List
 * Documents
 * Forth

Past Research / intray
Important: try to summarize the lessons learned from each.


 * Pascal-S by Wirth (Small, self-contained subset w/ great error reporting)
 * Compiler Construction by Wirth (Oberon-0 language in book is well-suited to bootstrapping)
 * Edison by Hansen (Language w/ 5 statements & small OS on PDP-11)
 * Project Oberon by Wirth et al (Simple language, compiler, OS, and RISC CPU w/ source laid out like a book.)
 * ML/I and Sal by Tanenbaum (Macro system used to bootstrap a low-level language, Sal, with which they built an OS)
 * COLA whitepaper by Ian Piumarta
 * PreScheme, using a low-level s-exp IL to implement Scheme.
 * Incremental Scheme Compiler by Ghuloum (Build a Scheme-to-ASM compiler in "24 small steps"; GitHub implementations available)
 * Red Language by Rakocevic et al (LISP-like power/DSL's, can do low-level, batteries included, 1MB standalone)
 * MinCaml by IPA (Efficient compiler for minimal, functional language in 2000 lines & 14-week segments)
 * Spry by Krampe (Combines traits of LISP, Rebol, Smalltalk, and Forth; hosted on Nim; 2300loc)
 * LCC by Hanson and Fraser (A 20Kloc compiler w/ book describing its workings; literate code; non-FOSS, but free non-commercial)
 * Axiomatic Bootstrapping: A Guide for Compiler Hackers by Andrew Appel (bootstrapping SML)
 * Merlin: Just Add Reflection (bootstrapping object oriented merlin)
 * booting BCPL (bootstrapping BCPL using intcode)
 * High-level Assembly by Hyde (Assembly w/ high-level data types, control flow & a stdlib; use/check just what you need)
 * Linoleum by Ghignola (Cross-platform, lean, fast, assembly-like language)
 * wingolog about the guile compiler (all brilliant posts!)
 * Partcl by Zaitsev (Tiny TCL; TCL's parse & interpret easily; also references Picol etc)
 * neatld linker by ali grudi (and also neatas neatcc)
 * SchemeRepo by Univ. of Indiana (Pile of source for Scheme lexers, parsers, compilers, etc.)
 * https://www.youtube.com/watch?v=Sk9TatW9ino Tutorial: Building the Simplest Possible Linux System - Rob Landley
 * Om Language by sparist (Prefix, typeless language with three operators; concatenative like Forth)
 * by Laurence Tratt
 * SBCL: a Sanely-Bootstrappable Common Lisp by Christophe Rhodes
 * prescheme to c compiler - https://github.com/nineties-retro/sps
 * Ur-Scheme by Kragen Sitaker
 * qhasm by Daniel Bernstein (portable form of Assembly language that standardizes machine instruction syntax across CPUs)
 * debian rebootstrap, a project built on the idea that bootstrapping Debian should be a repeatable process, not a hacky one-off
 * http://t3x.org/t3x/ - minimal procedural language with self hosted tiny compiler
 * - bootstrapping a linux system from source
 * bootstrapping trust in compilers blog post by Owl's portfolio
 * programming thought experiment kragen comment on reddit
 * scheme from scratch
 * http://interim-os.com/
 * https://github.com/m4tx/uefi-jitfuck UEFI JIT brainfuck
 * https://miyuki.github.io/2017/10/04/gcc-archaeology-1.html gcc archaeology
 * https://github.com/murisi/L2
 * https://tinygo.org/faq/why-a-new-compiler/
 * https://github.com/siraben/meta-II

Karger-Thompson Attack
Anything related to the Karger-Thompson attack: proof-of-concept demos, mitigations, theory.


 * Multics: the original paper explaining the attack (before Thompson!)
 * SCM Security by Wheeler (Secure distribution & compilation of source fundamentals; Karger advised mastering it)
 * rotten by rntz (Thompson attack demo)
 * rust infection by manishearth (Thompson attack demo in the Rust compiler)
 * tcc ACSAC by David Wheeler
 * CompCert by Leroy et al (Mathematically verified C compiler whose specs and proofs are checked with a tiny, verified checker)
 * CakeML by Myreen et al (Mathematically verified SML compiler whose specs and proofs are checked with a different tiny, verified checker)
 * VLISP by Oliva and Wand (Article has links to VLISP which mathematically verified PreScheme and Scheme48)
 * KCC by Rosu et al (Executable, formal semantics for C in rewrite logic; could do that w/ simpler engine)
 * TALC by Cornell (Typed, assembly language to verify safety w/out compiler; checker can be simple; C subset + verified compiler to TALC)
 * CoqASM by Microsoft Research (Bootstrap in verifiably-safe assembly in prover checked by tiny, verified checker)

Ubiquitous Implementations
These are tools written in ubiquitous languages, so they can be used in a wide variety of contexts.


 * shasm by Hohensee (x86 assembler written in BASH)
 * AWKLisp by Bacon (LISP written in Awk; includes Perl version from Perl Avenger)
 * Gherkin by Dipert (LISP written in Bash)
 * BASH Infinity by Brzoska (BASH framework/routines that might help write compilers in it)
 * mal "make a lisp" implementing a very basic lisp interpreter in hundreds of languages
 * A new bootstrapping project that builds up from a minimal DOS platform to a self-hosted language above assembly.

Small C Compilers

 * c4 by rswier (incredibly short C compiler)
 * cc500 by Edmund Grimley-Evans (tiny C compiler)
 * CUCU by Zaitsev (Small, C compiler designed for easy understanding)
 * SmallerC by Frunze (Small, single-pass, C compiler for several ISA's)
 * picoc interpreter.
 * C Interpreter by Dr Dobbs (Describes building a C interpreter with source)
 * Small C for I386 (IA-32)
 * Selfie, a tiny self-compiling compiler for a subset of C, a tiny self-executing MIPS emulator, and a tiny self-hosting MIPS hypervisor, all in a single 7kLoC file. HN discussion. Paper.
 * Tiny C expression compiler, written in Forth, based on tinyc.c by Marc Feeley.
 * C compilers by Rui Ueyama blog
 * 10-hour self-hosting C compiler

Grammars, Parsing, and Term Rewriting

 * Grammar Executing Machine by McKeeman and He (Incrementally extend languages from simple to complex grammars in interpreter(s))
 * peg by kragen (parsing)
 * PEG-based simple compiler by Ian Piumarta
 * META II by Bayfront Tech (Original meta-compiler w/ live code and detailed tutorial; OMeta was successor)
 * META II implementation by Lugon (Looks like a small implementation of META II; also bootstrapped in META II)
 * OMeta# Intro by Moser (OMeta intro that nicely illustrates the meta approach/advantages)

Virtual Machines, Instruction Sets

 * P-code by Wirth (High-level language & libraries target ultra-simple, portable interpreter)
 * sweet16 by Steve Wozniak
 * Tiny BASIC by Allison (Small BASIC whose original VM took 120, virtual opcodes to implement using 3KB RAM)
 * Klip by Cutting (Compiler & runtime for simple language for students; done in C#; runtime is very readable)

CPU's for Bootstrapping: The Simple, The Verified, and The Necessarily Complex

 * NAND2Tetris by Nisan and Schocken (Guide that teaches hardware step-by-step in fun way with simple CPU emerging)
 * J1 by Bowman (16-bit Forth CPU in 200 lines of Verilog that does 100MIPS on FPGA's)
 * H2 by Howe (Modified, VHDL version of J1 with detailed description and Howe's code MIT-licensed)
 * RISC-0 by Wirth (Simple, RISC CPU & SOC designed for Oberon language with detailed docs and source online)
 * JOP by Schoeberl et al (Embedded Java processor that takes up 1830 slices on FPGA)
 * Scheme Machine by Burger (Scheme interpreter implemented as CPU using formal methods)
 * ZPU by Zylin AS (Tiny, 32-bit CPU for deep embedded apps in 440 LUT's)
 * J2 by Landley et al (Clone of cost-efficient, SuperH-2 CPU in open-source)
 * VAMP by Beyer et al (Formally-verified, DLX-style processor in 18,000 slices on Xilinx)
 * Leon3 by Gaisler (Industry-grade, 32-bit SPARC w/ auto-configuration of core and GPL license)
 * Rocket by Univ of CA (1.4GHz RISC-V CPU and generator for customization)
 * OpenPITON by Princeton (25-core, shared-memory, SPARC CPU open-sourced and very scalable)

Minimal Operating Systems

 * KolibriOS - lightweight assembly OS.
 * MikeOS - same.
 * Sortix - modern reimplementation of POSIX in C. (Note: no Perl port yet, and GCC does not yet build natively on it.)
 * ASMLINUX - Linux kernel, but the userspace is implemented entirely in assembly.
 * LFS - Guide on building Linux and the GNU userspace.
 * buildroot
 * NetBSD build.sh - Cross-build a complete NetBSD ISO from a foreign OS. There's also a guide in the official NetBSD docs.
 * lh-bootstrap - alternative linux distro, using musl instead of glibc.
 * xv6 - UNIX teaching OS from MIT
 * OS/161 - UNIX teaching OS from Harvard
 * https://landley.net/toybox/about.html - Toybox, an alternative to Busybox by Robert Landley; see also Aboriginal Linux and mkroot by the same author, all of which are geared toward a minimal bootstrappable system
 * https://github.com/pikhq/bootstrap-linux - Another take at a bootstrappable Linux system
 * Project Oberon - Operating system and programming language Oberon (from the creator of Pascal - Niklaus Wirth)
 * TinyGo Operating system in programming language Go
 * TamaGo - Another operating system in programming language Go
 * seL4 Microkernel - OS kernel in fewer than 10,000 lines of source code and less than 150 KB of binary

Biology/Other?

 * https://ds9a.nl/amazing-dna/#bootstrapping - DNA seen through the eyes of a coder

Helpful Links

 * AIM-039.pdf The first self-hosted Lisp
 * lambda-the-ultimate thread asking for info on bootstrapping
 * awesome-compilers github list with a lot of information (copy the relevant parts to this wiki)
 * Tombstone diagram
 * bootstrappable a community hub for bootstrapping, with mailing list.
 * bootstrappable mailing list
 * yabfc - Generating-executable-files-from-scratch
 * ELF visualization
 * Cfront - converts C++ to C; developed by Bjarne Stroustrup.
 * https://sourceware.org/glibc/wiki/FAQ#How_do_I_install_all_of_the_GNU_C_Library_project_libraries_that_I_just_built.3F
 * Formal Compiler Verification with ACL2 - proving a compiler correct with ACL2 and discussion about correctness and self compiling.