- The goal is to remove the dependency on precompiled binaries, or at least to build everything in terms of some small core (something a human could audit).
- Each package has a set of inputs required to build it (for example, many projects require 'make', 'gcc', and binutils' 'ld').
- If one of these inputs is, directly or indirectly, the output of building the package itself, then the package is self-hosted. (For example, gcc is usually built using gcc.)
In a set of packages, if only one package (the core) is self-hosted, then the build of every other package can be traced back down to that core.
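The self-hosted test above amounts to a reachability check: a package is self-hosted exactly when its own output is reachable from its build inputs. A minimal sketch, with invented package names and simplified input sets (not real Debian data):

```python
# Each package maps to the set of packages needed to build it.
# These names and edges are illustrative only.
deps = {
    "gcc":   {"make", "gcc", "ld"},  # gcc is usually built with gcc
    "make":  {"gcc", "ld"},
    "ld":    {"gcc", "make"},
    "hello": {"gcc", "make"},
}

def reachable(pkg, graph):
    """All packages needed, directly or indirectly, to build pkg."""
    seen, stack = set(), list(graph.get(pkg, ()))
    while stack:
        p = stack.pop()
        if p not in seen:
            seen.add(p)
            stack.extend(graph.get(p, ()))
    return seen

def self_hosted(pkg, graph):
    """A package is self-hosted if it is among its own transitive inputs."""
    return pkg in reachable(pkg, graph)

# gcc, make and ld sit on build loops and are self-hosted; hello is not.
print([p for p in deps if self_hosted(p, deps)])
```

Note that `make` and `ld` come out self-hosted too, even though neither directly depends on itself: the indirect loop through gcc is enough.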
- Identify self hosted packages.
- Break their loop by creating a new, "smaller" set of inputs that can build them and is not itself self-hosted.
- Identify and create useful packages that are built in terms of smaller inputs (e.g. stage0, mes, guile, amber)
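Breaking a loop can be pictured as rewiring the graph: replace a self-hosted package's input set with a chain that bottoms out at the auditable core. A hedged sketch, reusing the reachability check from above (the edges are simplified stand-ins, not the real bootstrap path):

```python
# After breaking the loop: gcc is built from tcc, tcc from mes,
# mes from the stage0 core. All edges here are illustrative.
deps = {
    "stage0": set(),       # hand-auditable core, no build inputs
    "mes":    {"stage0"},
    "tcc":    {"mes"},
    "gcc":    {"tcc"},     # loop broken: gcc no longer needs gcc
}

def reachable(pkg, graph):
    """All packages needed, directly or indirectly, to build pkg."""
    seen, stack = set(), list(graph.get(pkg, ()))
    while stack:
        p = stack.pop()
        if p not in seen:
            seen.add(p)
            stack.extend(graph.get(p, ()))
    return seen

# No package is self-hosted any more, so every build traces back to stage0.
assert all(p not in reachable(p, deps) for p in deps)
print(sorted(reachable("gcc", deps)))  # → ['mes', 'stage0', 'tcc']
```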
This work can proceed in parallel; there is little need to coordinate on a specific bootstrap path, because one will simply exist once each package can be built in terms of smaller parts. The big problem is that most of the essential smaller parts are so useful that they are reused all over the place, so basic cooperation is recommended to avoid the massive duplication of effort such work could otherwise entail.
Note that the problem for gcc is not completely solved just by compiling gcc with tcc. To build gcc (with tcc) you need to run ./configure, which depends on tr, diff, mktemp, etc., all of which need to be built using a C compiler on a Unix. So at some point we need to build these tools without ./configure and without a working Unix.
About the dependency graph
There is a great page with an analysis of Debian's dependency graph, sorted into strongly connected components:
Any directed graph can be organized into a DAG (directed acyclic graph) of SCCs (strongly connected components). A DAG of packages can be built in sequence, but an SCC contains loops. The aim is to break all loops. Knowing the SCCs is valuable because it shows you exactly where that work needs to be done.
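The SCC analysis described above can be sketched with Tarjan's algorithm, which conveniently emits the components of the condensation in dependency-first order, i.e. a valid build sequence once each loop is broken. The toy graph below is invented for illustration:

```python
def tarjan_scc(graph):
    """Tarjan's algorithm: returns SCCs, sink components first."""
    index, low, on_stack = {}, {}, set()
    stack, sccs, counter = [], [], [0]

    def strongconnect(v):
        index[v] = low[v] = counter[0]; counter[0] += 1
        stack.append(v); on_stack.add(v)
        for w in graph.get(v, ()):
            if w not in index:
                strongconnect(w)
                low[v] = min(low[v], low[w])
            elif w in on_stack:
                low[v] = min(low[v], index[w])
        if low[v] == index[v]:           # v is the root of an SCC
            scc = set()
            while True:
                w = stack.pop(); on_stack.discard(w); scc.add(w)
                if w == v:
                    break
            sccs.append(scc)

    for v in graph:
        if v not in index:
            strongconnect(v)
    return sccs

# Toy graph: gcc/make/ld form one build loop; hello sits outside it.
deps = {
    "gcc":   {"make", "ld"},
    "make":  {"gcc"},
    "ld":    {"gcc"},
    "hello": {"gcc", "make"},
}

sccs = tarjan_scc(deps)
print(sccs)  # the {gcc, make, ld} loop shows up as one multi-package SCC
```

Every multi-package SCC in the output marks a loop that must be broken; singleton SCCs can already be built in sequence.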