Aesop/Notes and Research

Random notes about bootstrapping Pascal with Aesop, An Extensive Subset Of Pascal.

Compiler backends

Conclusion: Assembly is the best choice. (QBE was the original pick; see the update in its entry below.)

  • C: Apparently compiling to C loses you some flexibility.
  • GCC: Far too bloated, apparently not nice to work with, hard dependency on C++.
  • LLVM: Far too bloated, hard dependency on C++.
  • Cranelift: Very immature and buggy, supports even fewer architectures than QBE.
  • QBE: Supports fewer architectures, but that's not a problem thanks to QEMU. Only dependency is a C compiler. Update: It has come to my attention that, due to its youth, QBE does not support some important features such as inline asm.
  • Assembly: The only choice after eliminating all the others. Unfortunately.

Language

Conclusion: Probably C.

  • C: Bootstrapping it is annoying, but there are plenty of solid tools like Yacc and Lex for it. It also doesn't have a complexity problem like many of the others do.
  • Go: I don't know it very well.
  • Rust: Bootstrapping it is annoying, it suffers from complexity syndrome, and the ecosystem has an annoying tendency to pull in dozens of dependencies with each crate.
  • D: Bootstrapping it is annoying, and it has too many features for its own good.
  • Scheme: No static typing :(
  • Lisp: No static typing :(
  • Nim: The whole point of Aesop is to reach Nim in the first place!
  • OCaml: Older version is bootstrapped; not all libraries will work with this version.
  • Idris: Requires Haskell.
  • Haskell: Attempting to bootstrap it is more scarring than programming in COBOL.
  • One of the stage0 languages: Ha ha, you're very funny.

Possible alternative solutions to the FPC problem

Conclusion: Diverse double-compilation with a new compiler (Aesop) is the best solution.

  • Compiling it with GNU Pascal: Nobody could figure out how to make GNU Pascal itself build with modern toolchains, and the two compilers follow different standards: FPC follows (and was written using) Turbo Pascal and Delphi features, whereas GNU Pascal follows the ISO Extended Pascal standard. However, GNU Pascal has significant support for the Borland Pascal (Turbo Pascal 7) dialect, which should be enough to compile Free Pascal 1.0.10 and start the bootstrap chain.
  • Using an older version and doing a chain: Unix-like support was only added in Free Pascal 1.0.10, so anything earlier would have to run under DOSBox, FreeDOS-in-QEMU or similar. The first versions, written in response to Borland dropping support for DOS, were compiled with the proprietary Turbo Pascal compiler, and they probably still used Turbo Pascal extensions like the preprocessor and units. GNU Pascal supports Turbo Pascal extensions, but not the newer features implemented and used in recent Free Pascal versions.
  • Using one of the Pascal to C tools: They have the same problem as GNU Pascal.
  • Extending someone else's interpreter or compiler: Interpreters aren't viable, since I'd need to write an assembler and JIT for inline asm blocks. I couldn't find a toy Pascal compiler.
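
For reference, here is a toy model of what the diverse double-compilation check in the conclusion amounts to. It is only a sketch under loud assumptions: real compilers are replaced by a made-up `Binary` record, `run_compiler` is a hypothetical stand-in for actually invoking a compiler, builds are assumed to be deterministic, and the names `aesop`, `honest_fpc` and `trojaned_fpc` are purely illustrative. The idea is that the trusted compiler builds the FPC source once (stage 1), the result rebuilds the same source (stage 2), and stage 2 should be bit-identical to an honest FPC binary.

```python
from typing import NamedTuple


class Binary(NamedTuple):
    behaves_as: str  # what the program does when it is run
    emitted_by: str  # behaviour of the compiler that generated its code


def run_compiler(compiler: Binary, source: str) -> Binary:
    """Toy stand-in for running `compiler` on a source tree named `source`."""
    if compiler.behaves_as.endswith("+trojan"):
        # A trojaned compiler re-inserts its payload into whatever it builds.
        return Binary(source + "+trojan", compiler.behaves_as)
    # An honest compiler produces exactly what the source describes.
    return Binary(source, compiler.behaves_as)


def ddc_matches(trusted: Binary, suspect: Binary, source: str) -> bool:
    """Return True if `suspect` is what `source` yields via the trusted path."""
    stage1 = run_compiler(trusted, source)  # same behaviour, different code
    stage2 = run_compiler(stage1, source)   # should equal an honest suspect
    return stage2 == suspect


aesop = Binary("aesop", "hand-audited")            # hypothetical trusted compiler
honest_fpc = Binary("fpc", "fpc")                  # self-built, untampered
trojaned_fpc = Binary("fpc+trojan", "fpc+trojan")  # self-propagating backdoor

print(ddc_matches(aesop, honest_fpc, "fpc"))    # True
print(ddc_matches(aesop, trojaned_fpc, "fpc"))  # False
```

In practice the comparison would be a byte-for-byte diff (or hash) of real executables, and it only works if Aesop accepts enough of the dialect to compile the FPC source in the first place.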

A brief venture into the FPC source code, or: How non-standard is FPC exactly?

We'll clone the git repo at [1], then cd into it. The directory layout looks reasonable enough:

compiler         fpmake.pp         installer  Makefile      nohup.out  README.md  tests
fpmake_add1.inc  fpmake_proc1.inc  LICENSE    Makefile.fpc  packages   rtl        utils

The only worrying thing there is that 'fpmake.pp' file. Looks like a custom build system... written in Pascal :( Since we probably don't need to worry about the other directories, let's cd into compiler/. I'm sure this directory layout will be fi-

aarch64         cstreams.pas     mips          objcdef.pas             pinline.pas      README.txt
aasmbase.pas    cutils.pas       MPWMake       objcgutl.pas            pkgutil.pas      rescmn.pas
aasmcfi.pas     dbgbase.pas      msg           objcutil.pas            pmodules.pas     rgbase.pas
aasmcnst.pas    dbgcodeview.pas  msgidx.inc    ogbase.pas              powerpc          rgobj.pas
aasmdata.pas    dbgdwarf.pas     msgtxt.inc    ogcoff.pas              powerpc64        riscv
aasmdef.pas     dbgstabs.pas     nadd.pas      ogelf.pas               pparautl.pas     riscv32
aasmsym.pas     dbgstabx.pas     nbas.pas      oglx.pas                ppc68k.lpi       riscv64
aasmtai.pas     defcmp.pas       ncal.pas      ogmacho.pas             ppc8086.lpi      scandir.pas
aggas.pas       defutil.pas      ncgadd.pas    ogmap.pas               ppcaarch64.lpi   scanner.pas
aoptbase.pas    dirparse.pas     ncgbas.pas    ognlm.pas               ppcarm.lpi       sparc
aoptda.pas      dwarfbase.pas    ncgcal.pas    ogomf.pas               ppcavr.lpi       sparc64
aoptobj.pas     elfbase.pas      ncgcnv.pas    ogrel.pas               ppcgen           sparcgen
aopt.pas        entfile.pas      ncgcon.pas    ogwasm.pas              ppcjvm.lpi       switches.pas
aoptutils.pas   export.pas       ncgflw.pas    omfbase.pas             ppcmips64el.lpi  symbase.pas
arm             expunix.pas      ncghlmat.pas  optbase.pas             ppcmipsel.lpi    symconst.pas
armgen          finput.pas       ncginl.pas    optconstprop.pas        ppcmips.lpi      symcreat.pas
assemble.pas    fmodule.pas      ncgld.pas     optcse.pas              ppcppc64le.lpi   symdef.pas
avr             fpcdefs.inc      ncgmat.pas    optdead.pas             ppcppc64.lpi     symsym.pas
blockutl.pas    fpchash.pas      ncgmem.pas    optdeadstore.pas        ppcppc.lpi       symtable.pas
browcol.pas     fpcp.pas         ncgnstfl.pas  optdfa.pas              ppcriscv32.lpi   symtype.pas
catch.pas       fpkg.pas         ncgnstld.pas  options.pas             ppcriscv64.lpi   symutil.pas
ccharset.pas    fppu.pas         ncgnstmm.pas  optloadmodifystore.pas  ppcsparc64.lpi   syscinfo.pas
cclasses.pas    gendef.pas       ncgobjc.pas   optloop.pas             ppcsparc.lpi     systems
cepiktimer.pas  generic          ncgopt.pas    opttail.pas             ppcwasm32.lpi    systems.inc
cfidwarf.pas    globals.pas      ncgrtti.pas   optutils.pas            ppcx64llvm.lpi   systems.pas
cfileutl.pas    globstat.pas     ncgset.pas    optvirt.pas             ppcx64.lpi       tgobj.pas
cg64f32.pas     globtype.pas     ncgutil.pas   owar.pas                ppcxtensa.lpi    tokens.pas
cgbase.pas      hlcg2ll.pas      ncgvmt.pas    owbase.pas              ppcz80.lpi       triplet.pas
cgexcept.pas    hlcgobj.pas      ncnv.pas      owomflib.pas            ppheap.pas       utils
cghlcpu.pas     html             ncon.pas      parabase.pas            pp.lpi           verbose.pas
cgobj.pas       htypechk.pas     nflw.pas      paramgr.pas             pp.pas           version.pas
cgutils.pas     i386             ngenutil.pas  parser.pas              ppu.pas          wasm32
cmsgs.pas       i8086            ngtcon.pas    pass_1.pas              procdefutil.pas  wasmbase.pas
comphook.pas    impdef.pas       ninl.pas      pass_2.pas              procinfo.pas     widestr.pas
compiler.pas    import.pas       nld.pas       pbase.pas               psabiehpi.pas    wpobase.pas
compinnr.pas    jvm              nmat.pas      pcp.pas                 pstatmnt.pas     wpoinfo.pas
comprsrc.pas    ldscript.pas     nmem.pas      pdecl.pas               psub.pas         wpo.pas
comptty.pas     link.pas         nobjc.pas     pdecobj.pas             psystem.pas      x86
constexp.pas    llvm             nobj.pas      pdecsub.pas             ptconst.pas      x86_64
COPYING.txt     m68k             node.pas      pdecvar.pas             ptype.pas        xtensa
cprofile.pas    macho.pas        nopt.pas      pexports.pas            raatt.pas        z80
crefs.pas       machoutils.pas   nset.pas      pexpr.pas               rabase.pas
cresstr.pas     Makefile         nutils.pas    pgentype.pas            rasm.pas
cscript.pas     Makefile.fpc     objcasm.pas   pgenutil.pas            rautils.pas

...oh. Let's see approximately how non-standard this code is. First thing: in every file we have:

unit (...);
...
interface
...
implementation
...

What's this, then? Hmm... Apparently this is Pascal's module system. Except even that isn't standard. So, we'd need a compiler/interpreter with at least a module system. GNU Pascal has one... but it's the completely different and incompatible Extended Pascal module system. I can also see loads of things that look like this, and nvim highlights them differently to comments:

{$i (...)}

These are, apparently, preprocessor directives. Of course, these aren't standardized either; they're a Turbo Pascal extension if I remember right. Let's have a look at, say, llvm/aasmllvm.pas now. As soon as you open it up, you can see a type declaration... and after the equals sign:

class(tai_cpu_abstract_sym)
    ...
end

The FPC codebase uses advanced Delphi object-oriented features extensively. Of course it does. I don't think there's any hope of using another compiler at this point, so let's just stop. (There are probably many other more subtle portability problems in there, like non-standard built-in types, but I'm not experienced enough with Pascal to tell.)
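
A rough way to put numbers on the three things spotted above (unit headers, `{$i ...}` directives and `class` types) would be a small scanner like the sketch below. The regular expressions are my own approximations, not a real Pascal lexer: they will miscount anything inside strings or comments. The `SAMPLE` text is a made-up unit for illustration, not an excerpt from the FPC tree.

```python
import re

# Approximate patterns for the non-standard constructs noted above.
PATTERNS = {
    "unit header": re.compile(r"^\s*unit\s+[\w.]+\s*;", re.IGNORECASE | re.MULTILINE),
    "include directive": re.compile(r"\{\$i(?:nclude)?\s+[^}]+\}", re.IGNORECASE),
    "class type": re.compile(r"=\s*class\b", re.IGNORECASE),
}


def count_dialect_features(text: str) -> dict:
    """Count how often each pattern occurs in one source file's text."""
    return {name: len(pattern.findall(text)) for name, pattern in PATTERNS.items()}


# Hypothetical unit, written only to exercise the patterns.
SAMPLE = """\
unit demo;
{$i fpcdefs.inc}
interface
type
  tdemo = class(tobject)
  end;
implementation
end.
"""

print(count_dialect_features(SAMPLE))
# {'unit header': 1, 'include directive': 1, 'class type': 1}
```

Running something like this over every `.pas` file in compiler/ would give a crude lower bound on how much of the Turbo Pascal and Delphi dialect a replacement compiler has to accept.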