tl;dr Premature optimization in the rootfs of my emu

Last week:

There is of course a bunch of cleanup to do, and some serious “what happened to all my storage space?” work

This week I’ve been playing with reducing the amount of storage used, by rewriting shell scripts in the initramfs as C programs. Nothing in the initramfs is accessible in the second stage environment, so once the system is booted it’s just dead weight. Given that a statically linked minimal busybox is around 200k, if we can get rid of it, that’s a decent-sized chunk of a 16MB flash.

It turns out that not only can we get rid of busybox, we can get rid of the entire C library. preinit.c uses the Nolibc minimal C-library replacement - a header file defining inline functions that implement common syscalls - instead of linking against Musl, and this results in an initramfs approximately 4k in size. So I’m pretty happy about that.

I’m less happy about having had to insert inline MIPS assembly at the top of main() to get it to work, mostly because of the strong possibility I’ve got it wrong…

    asm("la $gp, _gp\nsw $gp,16($sp)");

Some slightly handwavey I-don’t-fully-understand-this context here: $gp is the “global pointer” register on MIPS, which is used to make references to global variables use one instruction instead of two (this Nintendo 64 programming blog is the clearest explanation I’ve found). Why do we need this magic?

  • the nolibc definition of __start doesn’t include anything to set up $gp (compare with the equivalent musl code)

  • gcc is (sometimes) generating code to restore $gp from the stack after a function call returns - but nothing at function entry to save it on the stack in the first place. Which is odd, because it does save the register when I link against the regular C library.

  • no combination of -G or -mgpopt flags to gcc seemed to change its behaviour here.

So first we load $gp from the _gp symbol using the la pseudoinstruction, and then we stick it into the offset on the stack where the compiler will expect to find it about twelve instructions later. If we disassemble the file we can see this in action: first the instructions we added, then it sets up the registers for a call to write (using gp-relative addresses), then when write returns it reloads $gp from an offset from $sp.

004000f0 <main>:
[...]
  40011c:       3c1c0042        lui     gp,0x42        # we added this
  400120:       279c8970        addiu   gp,gp,-30352   # and this
  400124:       afbc0010        sw      gp,16(sp)      # and this
  400128:       8f838018        lw      v1,-32744(gp)
  40012c:       00003025        move    a2,zero
  400130:       00661021        addu    v0,v1,a2
  400134:       80420000        lb      v0,0(v0)
  400138:       14400040        bnez    v0,40023c <main+0x14c>
  40013c:       8f90801c        lw      s0,-32740(gp)
  400140:       8f858018        lw      a1,-32744(gp)
  400144:       26100620        addiu   s0,s0,1568
  400148:       0200c825        move    t9,s0
  40014c:       04110134        bal     400620 <write>
  400150:       24040001        li      a0,1
  400154:       8fbc0010        lw      gp,16(sp)      # gcc wrote this
  400158:       00003825        move    a3,zero
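
As a sanity check on the listing above, here is a minimal sketch of the address arithmetic those instructions perform. The function names are made up for illustration and are not part of preinit.c or nolibc. It only models the arithmetic: lui puts its 16-bit immediate in the top half of the register, addiu adds a sign-extended 16-bit immediate, and lw reads from the base register plus a sign-extended offset. It says nothing about what actually lives at the resulting addresses.

```c
#include <assert.h>
#include <stdint.h>

/* Value left in a register by `lui reg, upper` followed by
 * `addiu reg, reg, lower`. lui places its 16-bit immediate in the top
 * half of the register; addiu then adds a sign-extended 16-bit immediate.
 * (Hypothetical helper, for illustration only.) */
uint32_t lui_addiu(uint32_t upper, int32_t lower)
{
    assert(upper <= 0xFFFF);
    assert(lower >= -32768 && lower <= 32767);
    /* Converting a negative value to uint32_t wraps modulo 2^32,
     * which matches two's-complement addition on the CPU. */
    return (upper << 16) + (uint32_t)lower;
}

/* Address read by `lw reg, offset(base)`.
 * (Hypothetical helper, for illustration only.) */
uint32_t effective_address(uint32_t base, int32_t offset)
{
    assert(offset >= -32768 && offset <= 32767);
    return base + (uint32_t)offset;
}
```

Plugging in the numbers from the disassembly, lui_addiu(0x42, -30352) gives 0x418970, which is the value the la pseudoinstruction loads into $gp. The two gp-relative loads at offsets -32744 and -32740 then read from 0x410988 and 0x41098c respectively.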

It was a fun and entertaining voyage of learning new things, but also one for the unwritten Risk Register as I’m not happy I fully understand it. I suppose at least if it does fail then the mode of failure will be a quite obvious “doesn’t boot”.

Removing the shell from the initramfs also meant having to rewrite the activate script. This script lives in the actual root filesystem but is run in the initramfs context (no nix store, no shared libraries, minimal /dev, etc, and now no shell interpreter). For now this is just linked statically against musl and weighs in at around 70k, but really the same argument for using nolibc would apply just as much here - even though it’s not part of the initramfs proper, that’s still 70k that’s not shared with anything else and we’ll never use again after boot.

I have to be honest and admit that there are probably other parts of the system where I could make savings bigger than 200k. At least I hope there are - the jffs2 compression doesn’t seem to be nearly as effective as squashfs, so we’re paying quite a price for a writable filesystem. But there was still a certain amount of pride in making it almost-cost-neutral to add an initramfs.

Next week: adding multi-output derivations in the overlay. Looking at the generated filesystem I see that various packages contain man pages, static libraries, random Python files etc. which are probably not necessary to the running of the system, and removing them could save us kB or even MB.