Stage0

Purpose

Bootstrapping is hard, like insanely hard. So hard, in fact, that everyone who has ever done it never wants to do it again. The problem, of course, is that all of the technical infrastructure we have today depends on binaries that we can't actually trust: there is no way to reproduce them from trusted sources, because we have no absolutely trusted sources.

Stage0 aims to make those absolutely trusted sources easier to create: less than 400 hours of work in total.

Design

Stage0 starts with only 2 things:

1) A trusted binary that implements the VM Spec [1]

2) A sub-300-byte hex monitor (how you create it is up to you; I like toggling it in manually myself)

From that starting point, I have provided, in easy-to-audit form (a direct mapping between hex, its effective assembly and a C implementation), a series of hex utilities that are required for basic development work. These files are in the stage1 folder.

What is with the weird file extensions?

File extensions are very important in stage0: they directly indicate the level of infrastructure required to build each file.

* HEX0 - indicates that the file can be built using the stage0 hex monitor or any other tool that supports the minimal commented hex syntax.
* HEX1 - indicates that the file also requires support for 1-character labels and 16-bit relative displacements.
* HEX2 - indicates that the file also requires support for long labels, 16-bit absolute displacements and 32-bit pointers for manual object creation.
* S - indicates that the file can be built using the M0 macro assembler.

hex0

Hex0 is trivial to implement [2]. It just needs to read 2 hex nybbles and output a byte. You can ignore all non-hex characters, but you need to support 2 types of line comments (# and ;).

; This is a line comment
# So it is
;; And this
## And this
;; but to be polite please don't mix in non-hex characters in the hex stream,
## it doesn't make you clever, it just makes your code harder to read

# Done
48 c7 c7 00 00 00 00 # mov $0x0,%rdi
48 c7 c0 3c 00 00 00 # mov $0x3c,%rax
0f 05                # syscall

Example of .hex code from hex0.hex. This maps out an ELF file for Linux which implements a compiler for hex (!).
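
The rules above (pair up hex nybbles, skip everything else, honour # and ; line comments) can be sketched in a few lines of Python. This is only an illustration of the format, not the stage0 implementation; the function name hex0 and the constant HEX_DIGITS are made up for this example.

```python
HEX_DIGITS = "0123456789abcdefABCDEF"

def hex0(source: str) -> bytes:
    """Convert commented hex text into raw bytes (illustrative sketch)."""
    out = bytearray()
    high = None          # first nybble of the pending byte, if any
    in_comment = False
    for ch in source:
        if in_comment:
            # A line comment runs until the end of the line.
            if ch == "\n":
                in_comment = False
        elif ch in "#;":
            in_comment = True
        elif ch in HEX_DIGITS:
            value = int(ch, 16)
            if high is None:
                high = value                      # remember the high nybble
            else:
                out.append((high << 4) | value)   # second nybble completes a byte
                high = None
        # Any other character (spaces, tabs, newlines, ...) is ignored.
    return bytes(out)
```

Feeding it the three instruction lines from the example above yields the 16 raw bytes of those instructions, with the comments dropped.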

hex2

hex2 extends the hex0 language with labels and pointers. (hex1 is a simpler version of this, where labels are limited to 1 character and only 16-bit relative addressing is supported. It is used to build hex2.)

  • @ - 16 bit relative address
  • $ - 16 bit absolute address
  • & - 32 bit absolute address (for pointers)

# ;; Set p->Next = p->Next->Next->Next
18020000	# LOAD32 R0 R2 0 ; Get Next->Next->Next
23010000	# STORE32 R0 R1 0 ; Set Next = Next->Next->Next
:Identify_Macros_1
18010000	# LOAD32 R0 R1 0 ; Get node->next
A0300000	# CMPSKIPI.NE R0 0 ; If node->next is NULL
3C00 @Identify_Macros_Done	# JUMP @Identify_Macros_Done ; Be done
# ;; Otherwise keep looping
3C00 @Identify_Macros_0	# JUMP @Identify_Macros_0
:Identify_Macros_Done
# ;; Restore registers
0902803F	# POPR R3 R15
0902802F	# POPR R2 R15
0902801F	# POPR R1 R15
0902800F	# POPR R0 R15
0D01001F	# RET R15
:Identify_Macros_string
444546494E450000	# "DEFINE"

Example of .hex2 code from M0-macro.hex2
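
To show how labels and the three pointer sigils fit together, here is a rough two-pass sketch in Python. It is not the stage0 hex2 (which is itself written in hex1), and it rests on several assumptions that the text above does not state: tokens are separated by whitespace, each hex token holds an even number of digits, relative displacements are measured from the end of the pointer, and output is big-endian. The names hex2, tokens and POINTER_WIDTH are hypothetical.

```python
import re

POINTER_WIDTH = {"@": 2, "$": 2, "&": 4}   # sigil -> pointer size in bytes

def tokens(source):
    """Yield whitespace-separated tokens with # and ; line comments removed."""
    for line in source.splitlines():
        line = re.split(r"[#;]", line, maxsplit=1)[0]
        yield from line.split()

def hex2(source, base=0):
    toks = list(tokens(source))

    # Pass 1: record the address of every :label.
    labels, ip = {}, base
    for tok in toks:
        if tok[0] == ":":
            labels[tok[1:]] = ip
        elif tok[0] in POINTER_WIDTH:
            ip += POINTER_WIDTH[tok[0]]
        else:
            ip += len(bytes.fromhex(tok))

    # Pass 2: emit bytes, replacing each pointer with its value.
    out, ip = bytearray(), base
    for tok in toks:
        sigil = tok[0]
        if sigil == ":":
            continue
        if sigil in POINTER_WIDTH:
            width = POINTER_WIDTH[sigil]
            ip += width
            target = labels[tok[1:]]
            # ASSUMPTION: relative pointers count from the end of the pointer.
            value = target - ip if sigil == "@" else target
            # ASSUMPTION: big-endian output, two's complement for negatives.
            out += (value % (1 << (8 * width))).to_bytes(width, "big")
        else:
            data = bytes.fromhex(tok)
            ip += len(data)
            out += data
    return bytes(out)
```

Two passes are needed because a jump may refer to a label that is defined later in the file, as @Identify_Macros_Done does in the sample above.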

line macros

The M0 macro assembler is implemented in .hex2 [3], so that with a defs file like this:

DEFINE LOADR 2E0
DEFINE LOADR8 2E1
DEFINE LOADRU8 2E2

you can now program with the mnemonics instead of raw hexadecimal codes. This creates a new ".s" assembly language which looks like this:

# We still support these comments
;; We also added support for hex inserts like so
:My_Global
'00440044'
;; And we also support strings, that we null pad to 4byte boundaries to make disassembly easier.
:My_String
"Hello world!"

:Prompt_Loop
	LOADXU8 R0 R3 R4            ; Get a char
	CMPSKIPI.NE R0 0            ; If NULL
	JUMP @Prompt_Done           ; We reached the end
	FPUTC                       ; Write it to TTY
	ADDUI R3 R3 1               ; Move to next char
	JUMP @Prompt_Loop           ; And loop again

M0 also supports all of the syntax of hex2. (Sample taken from CAT.s.)
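
The core of the line-macro idea is plain text substitution, which the following Python sketch tries to capture. It is a simplification, not M0 itself: it assumes one statement per line, assumes a string always receives at least one terminating null before padding, and does not convert numeric immediates. The helper names load_defines, pad_string and expand are invented for this example, and the "DEFINE R0 0" entry used in the usage note is an assumed register definition.

```python
import re

def load_defines(defs):
    """Parse 'DEFINE NAME VALUE' lines into a dictionary."""
    table = {}
    for line in defs.splitlines():
        parts = line.split()
        if len(parts) == 3 and parts[0] == "DEFINE":
            table[parts[1]] = parts[2]
    return table

def pad_string(text):
    """Hex-encode a string and null pad it to a 4-byte boundary."""
    # ASSUMPTION: at least one terminating null is always added.
    raw = text.encode("ascii") + b"\x00"
    raw += b"\x00" * (-len(raw) % 4)
    return raw.hex().upper()

def expand(source, table):
    """Replace mnemonics with their DEFINEd hex; labels pass through as-is."""
    out = []
    for line in source.splitlines():
        line = line.strip()
        if line.startswith('"'):
            out.append(pad_string(line.strip('"')))      # "string"
        elif line.startswith("'"):
            out.append(line.strip("'"))                  # 'raw hex insert'
        else:
            code = re.split(r"[#;]", line, maxsplit=1)[0]
            words = [table.get(word, word) for word in code.split()]
            if words:
                out.append(" ".join(words))
    return "\n".join(out)
```

With the defs shown earlier plus the assumed "DEFINE R0 0", the line "LOADR R0 @My_String" expands to "2E0 0 @My_String", which a hex2 tool can then turn into bytes.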

Variations

The most common variation is to extend hex2 with additional functionality, such as extending the standard set to include:

  • ! - 8 bit relative address (short jumps for 8086 or small immediate values)
  • @ - 16 bit relative address (ironically not really used in x86)
  • $ - 16 bit absolute address (rare use in x86)
  • % - 32 bit relative address (long jumps for x86)
  • & - 32 bit absolute address (for pointers)

More exotic mixes may replace hex with octal (for x86 but not AMD64) because it is a better match for the underlying opcode space.
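
One way to see why octal lines up with x86 encodings: the ModRM byte is split into fields of 2, 3 and 3 bits (mod, reg, rm), which is exactly how one byte splits into three octal digits. The small Python sketch below decodes the ModRM bytes from the hex0 example earlier; the function name modrm_fields is made up for illustration.

```python
def modrm_fields(byte):
    """Split an x86 ModRM byte into its (mod, reg, rm) fields."""
    mod = (byte >> 6) & 0b11     # top 2 bits: addressing mode
    reg = (byte >> 3) & 0b111    # middle 3 bits: register or opcode extension
    rm = byte & 0b111            # low 3 bits: register or memory operand
    return mod, reg, rm

# ModRM bytes from "48 c7 c7 ..." and "48 c7 c0 ..." in the hex0 example.
for byte in (0xC7, 0xC0):
    print(f"hex {byte:02X} = octal {byte:03o} -> mod, reg, rm = {modrm_fields(byte)}")
```

In octal, C7 is 307 and C0 is 300, so the three digits read off directly as mod, reg and rm, while the hex digits straddle the field boundaries.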

Common mistakes

Trying to bootstrap a bigger language than M0/M1 assembly tends to devolve into a growing cycle of more and more work with little return. Simply bootstrap the stage0 VM and be done with that madness.