Main Page

Welcome to !
This wiki is about bootstrapping: building up compilers, interpreters, and tools from nothing.

"'Recipe for yogurt: Add yogurt to milk.' - Anon."

Short sci-fi story: Coding Machines by Lawrence Kesteloot, January 2009

Current Topics

 * mes by janneke
 * stage0 by Jeremiah Orians
 * Coquillage by bms_
 * Descent principle
 * Self-Extension
 * Self-Hosting
 * Build Systems
 * C compilers
 * Bootstrapping Specific Languages
 * Discarded options and why

Past Research / In-tray
Important: try to summarize the lessons learned from each.


 * Pascal-S by Wirth (Small, self-contained subset w/ great error reporting)
 * Compiler Construction by Wirth (Oberon-0 language in book is well-suited to bootstrapping)
 * Edison by Hansen (Language w/ 5 statements & small OS on PDP-11)
 * Project Oberon by Wirth et al (Simple language, compiler, OS, and RISC CPU w/ source laid out like a book.)
 * maru by Ian Piumarta
 * COLA whitepaper by Ian Piumarta
 * PreScheme (uses a low-level s-exp IL to implement Scheme)
 * Incremental Scheme Compiler by Ghuloum (Build a Scheme-to-ASM compiler in "24 small steps"; GitHub implementations available)
 * Red Language by Rakocevic et al (LISP-like power/DSLs, can do low-level, batteries included, 1MB standalone)
 * MinCaml by IPA (Efficient compiler for minimal, functional language in 2000 lines & 14-week segments)
 * LCC by Hanson and Fraser (A 20Kloc compiler w/ book describing its workings; literate code; non-FOSS, but free non-commercial)
 * Axiomatic Bootstrapping: A Guide for Compiler Hackers by Andrew Appel (bootstrapping SML)
 * Merlin: Just Add Reflection (bootstrapping object-oriented Merlin)
 * booting BCPL (bootstrapping BCPL using intcode)
 * High-level Assembly by Hyde (Assembly w/ high-level data types, control flow & a stdlib; use/check just what you need)
 * Linoleum by Ghignola (Cross-platform, lean, fast, assembly-like language)
 * wingolog posts about the Guile compiler (all brilliant!)
 * Partcl by Zaitsev (Tiny Tcl; Tcl is easy to parse & interpret; also references Picol, etc.)
 * neatld linker by ali grudi (see also neatas and neatcc)
 * SchemeRepo by Univ. of Indiana (Pile of source for Scheme lexers, parsers, compilers, etc.)
 * Tutorial: Building the Simplest Possible Linux System by Rob Landley (https://www.youtube.com/watch?v=Sk9TatW9ino)
 * Om Language by sparist (Prefix, typeless language with three operators; concatenative like Forth)
 * by Laurence Tratt
 * SBCL: a Sanely-Bootstrappable Common Lisp by Christophe Rhodes
 * PreScheme-to-C compiler (https://github.com/nineties-retro/sps)
 * Ur-Scheme by Kragen Sitaker
 * qhasm by Daniel Bernstein (portable form of Assembly language that standardizes machine instruction syntax across CPUs)

Karger-Thompson Attack
Anything related to the Karger-Thompson attack: proof-of-concept demos, mitigations, theory.


 * Multics: the original paper explaining the attack (before Thompson!)
 * SCM Security by Wheeler (Secure distribution & compilation of source fundamentals; Karger advised mastering it)
 * rotten by rntz (Thompson attack demo)
 * rust infection by manishearth (Thompson attack demo in the Rust compiler)
 * tcc ACSAC paper by David Wheeler
 * CompCert by Leroy et al (Mathematically verified C compiler whose specs and proofs are checked with a tiny, verified checker)
 * CakeML by Myreen et al (Mathematically verified SML compiler whose specs and proofs are checked with a different tiny, verified checker)
 * VLISP by Oliva and Wand (Article has links to VLISP, which mathematically verified PreScheme and Scheme48)
 * KCC by Rosu et al (Executable formal semantics for C in rewrite logic; could be done w/ a simpler engine)
 * TALC by Cornell (Typed assembly language to verify safety w/out trusting the compiler; checker can be simple; C subset + verified compiler to TALC)
 * CoqASM by Microsoft Research (Bootstrap in verifiably safe assembly in a prover checked by a tiny, verified checker)

Ubiquitous Implementations
These tools are written in ubiquitous languages, so they can be used in a wide variety of contexts.


 * shasm by Hohensee (x86 assembler written in Bash)
 * AWKLisp by Bacon (LISP written in Awk; includes Perl version from Perl Avenger)
 * Gherkin by Dipert (LISP written in Bash)
 * mal ("Make a Lisp"): a very basic Lisp interpreter implemented in dozens of languages

Small C Compilers

 * c4 by rswier (incredibly short C compiler)
 * cc500 by Edmund Grimley-Evans (tiny C compiler)
 * CUCU by Zaitsev (A small C compiler designed to be easy to understand)
 * SmallerC by Frunze (Small, single-pass C compiler for several ISAs)
 * picoc (C interpreter)
 * C Interpreter by Dr. Dobb's (Describes building a C interpreter, with source)
 * Small C for I386 (IA-32)
 * Selfie, a tiny self-compiling compiler for a subset of C, a tiny self-executing MIPS emulator, and a tiny self-hosting MIPS hypervisor, all in a single 7kLoC file. HN discussion. Paper.

Grammars, Parsing, and Term Rewriting

 * Grammar Executing Machine by McKeeman and He (Incrementally extend languages from simple to complex grammars in interpreter(s))
 * peg by kragen (parsing)
 * PEG-based simple compiler by Ian Piumarta
 * META II by Bayfront Tech (Original meta-compiler w/ live code and detailed tutorial; OMeta was successor)
 * META II implementation by Lugon (Looks like a small implementation of META II; also bootstrapped in META II)
 * OMeta# Intro by Moser (OMeta intro that nicely illustrates the meta approach/advantages)

Virtual Machines, Instruction Sets

 * P-code by Wirth (High-level language & libraries target ultra-simple, portable interpreter)
 * sweet16 by Steve Wozniak
 * Tiny BASIC by Allison (Small BASIC whose original VM took 120 virtual opcodes to implement using 3KB RAM)
 * Klip by Cutting (Compiler & runtime for simple language for students; done in C#; runtime is very readable)

CPUs for Bootstrapping: The Simple, The Verified, and The Necessarily Complex

 * NAND2Tetris by Nisan and Schocken (Guide that teaches hardware step-by-step in fun way with simple CPU emerging)
 * J1 by Bowman (16-bit Forth CPU in 200 lines of Verilog that does 100 MIPS on FPGAs)
 * H2 by Howe (Modified VHDL version of J1 with detailed description; Howe's code is MIT-licensed)
 * RISC-0 by Wirth (Simple, RISC CPU & SOC designed for Oberon language with detailed docs and source online)
 * JOP by Schoeberl et al (Embedded Java processor that takes up 1830 slices on FPGA)
 * Scheme Machine by Burger (Scheme interpreter implemented as CPU using formal methods)
 * ZPU by Zylin AS (Tiny, 32-bit CPU for deep embedded apps in 440 LUTs)
 * J2 by Landley et al (Clone of cost-efficient, SuperH-2 CPU in open-source)
 * VAMP by Beyer et al (Formally-verified, DLX-style processor in 18,000 slices on Xilinx)
 * Leon3 by Gaisler (Industry-grade, 32-bit SPARC w/ auto-configuration of core and GPL license)
 * Rocket by Univ of CA (1.4GHz RISC-V CPU and generator for customization)
 * OpenPITON by Princeton (25-core, shared-memory, SPARC CPU open-sourced and very scalable)

Helpful Links

 * The first self hosted lisp
 * lambda-the-ultimate thread asking for info on bootstrapping
 * awesome-compilers GitHub list with a lot of information (copy the relevant parts to this wiki)
 * Tombstone diagram
 * bootstrappable: a community hub for bootstrapping, with a mailing list
 * bootstrappable mailing list
 * ELF visualization