Some assembly required
tl;dr Premature optimization in the rootfs of my emu
Last week:
There is of course a bunch of cleanup to do, and some serious “what happened to all my storage space?” work
This week I’ve been playing with reducing the amount of storage used by rewriting the shell scripts in the initramfs as C programs. Nothing in the initramfs is accessible in the second-stage environment, so once the system is booted it’s just dead weight. Given that a statically linked minimal busybox is around 200k, if we can get rid of it, that’s a decent-sized chunk of a 16MB flash.
It turns out that not only can we get rid of busybox, we can get rid of the entire C library. preinit.c uses the Nolibc minimal C-library replacement - a header file defining inline functions that implement common syscalls - instead of linking against Musl, and this results in an initramfs approximately 4k in size. So I’m pretty happy about that.
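To give a flavour of the nolibc approach, here is a rough sketch of a nolibc-style syscall wrapper. Each syscall gets one small static inline function that issues the trap instruction directly and returns the raw kernel result (a negative errno on failure) instead of setting errno. This is not nolibc’s actual code, and the names sketch_sys_write and sketch_puts are made up for illustration. The example needs to be testable on an ordinary PC, so it uses x86-64 inline assembly and falls back to libc’s syscall(2) on other architectures. A real nolibc build has a per-architecture version, MIPS included, and no libc at all.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical nolibc-style wrapper: no buffering, no errno, just the
 * raw kernel return value (>= 0 on success, -errno on failure). */
static inline long sketch_sys_write(int fd, const void *buf, unsigned long len)
{
#if defined(__x86_64__)
    long ret;
    /* x86-64 Linux ABI: syscall number in rax (1 = write), arguments in
     * rdi, rsi, rdx; the syscall instruction clobbers rcx and r11. */
    __asm__ volatile ("syscall"
                      : "=a"(ret)
                      : "a"(1L), "D"((long)fd), "S"(buf), "d"(len)
                      : "rcx", "r11", "memory");
    return ret;
#else
    /* Fallback for this sketch only: go through libc's syscall(2) and
     * convert its -1/errno convention into the raw -errno form. */
    long ret = syscall(SYS_write, fd, buf, len);
    return ret < 0 ? -errno : ret;
#endif
}

/* Write a NUL-terminated string without relying on strlen() from a C library. */
static long sketch_puts(int fd, const char *s)
{
    unsigned long n = 0;
    while (s[n])
        n++;
    return sketch_sys_write(fd, s, n);
}
```

For example, sketch_puts(1, "hello\n") writes six bytes straight to stdout with no stdio buffering, and a bad file descriptor comes back as -EBADF rather than -1.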
I’m less happy about having had to insert inline MIPS assembly at the top of main() to get it to work, mostly because of the strong possibility I’ve got it wrong…
asm("la $gp, _gp\nsw $gp,16($sp)");
Some slightly handwavey I-don’t-fully-understand-this context here: $gp is the “global pointer” register on MIPS, which is used to make references to global variables use one instruction instead of two (this Nintendo 64 programming blog is the clearest explanation I’ve found).
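Here is a rough illustration of the “one instruction instead of two” point. MIPS load and store instructions carry a signed 16-bit offset. A global within about 32 KiB either side of $gp can therefore be read with a single lw off($gp), whereas an arbitrary 32-bit address needs a lui to build the upper half first. The sketch below encodes that rule. The helper names are hypothetical, and the model deliberately ignores position-independent code, where globals are reached through the GOT instead.

```c
#include <stdbool.h>
#include <stdint.h>

/* A MIPS load/store offset is a signed 16-bit immediate, so
 * "lw reg, off($gp)" can only reach addresses in [gp-32768, gp+32767]. */
static bool gp_reachable(uint32_t gp, uint32_t addr)
{
    int64_t off = (int64_t)addr - (int64_t)gp;
    return off >= INT16_MIN && off <= INT16_MAX;
}

/* Instructions needed to load a word from addr under this simplified
 * model: one gp-relative lw if in range, otherwise lui + lw. */
static int load_cost(uint32_t gp, uint32_t addr)
{
    return gp_reachable(gp, addr) ? 1 : 2;
}
```

The compiler can only use the cheap form for globals it knows will land inside that 64 KiB window. That is why the linker needs to know where _gp is and why the register has to hold the right value before any such load runs.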
Why do we need this magic?
- the nolibc definition of __start doesn’t include anything to set up $gp (compare with the equivalent musl code)
- gcc is (sometimes) generating code to restore $gp from the stack after a function call returns, but nothing at function entry to save it on the stack in the first place. Which is odd, because it does save the register when I link against the regular C library.
- no combination of -G or -mgpopt flags to gcc seemed to change its behaviour here.
So first we load $gp from the _gp symbol using the la pseudoinstruction, and then we store it at the stack offset where the compiler will expect to find it twelve instructions later. If we disassemble the file we can see this in action: first the instructions we added, then the compiler sets up the registers for a call to write (using gp-relative addresses), then when write returns it reloads $gp from 16($sp).
004000f0 <main>:
[...]
40011c: 3c1c0042 lui gp,0x42 # we added this
400120: 279c8970 addiu gp,gp,-30352 # and this
400124: afbc0010 sw gp,16(sp) # and this
400128: 8f838018 lw v1,-32744(gp)
40012c: 00003025 move a2,zero
400130: 00661021 addu v0,v1,a2
400134: 80420000 lb v0,0(v0)
400138: 14400040 bnez v0,40023c <main+0x14c>
40013c: 8f90801c lw s0,-32740(gp)
400140: 8f858018 lw a1,-32744(gp)
400144: 26100620 addiu s0,s0,1568
400148: 0200c825 move t9,s0
40014c: 04110134 bal 400620 <write>
400150: 24040001 li a0,1
400154: 8fbc0010 lw gp,16(sp) # gcc wrote this
400158: 00003825 move a3,zero
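The lui/addiu pair at the top is what the la pseudoinstruction expanded to, and it can be decoded by hand. addiu sign-extends its 16-bit immediate, so when bit 15 of the target address is set, the assembler has to round the upper half up by one to compensate. Working backwards from the listing, _gp in this particular build must be 0x420000 - 30352 = 0x418970. That value is an inference from the disassembly, not something stated elsewhere. The hypothetical helpers below perform the same split and recombination.

```c
#include <stdint.h>

/* Upper immediate for "lui reg, %hi(addr)". Adding 0x8000 first rounds
 * up when bit 15 is set, compensating for addiu's sign extension. */
static uint16_t mips_hi(uint32_t addr)
{
    return (uint16_t)((addr + 0x8000u) >> 16);
}

/* Lower immediate for "addiu reg, reg, %lo(addr)": the low 16 bits,
 * interpreted as a signed value. */
static int16_t mips_lo(uint32_t addr)
{
    int32_t lo = (int32_t)(addr & 0xffffu);
    if (lo >= 0x8000)
        lo -= 0x10000;
    return (int16_t)lo;
}

/* Recombine the way the CPU does: lui loads hi << 16, then addiu adds the
 * sign-extended lo (wrapping modulo 2^32). */
static uint32_t mips_lui_addiu(uint16_t hi, int16_t lo)
{
    return ((uint32_t)hi << 16) + (uint32_t)(int32_t)lo;
}
```

For example, mips_hi(0x418970) is 0x42 and mips_lo(0x418970) is -30352, matching the immediates in the listing. The same kind of arithmetic confirms the instruction count: the sw at 0x400124 and the lw at 0x400154 are (0x400154 - 0x400124) / 4 = 12 instructions apart.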
It was a fun and entertaining voyage of learning new things, but also one for the unwritten Risk Register, as I’m not confident I fully understand it. At least if it does fail, the failure mode will be a fairly obvious “doesn’t boot”.
Removing the shell from the initramfs also meant having to rewrite the activate script. This script lives in the actual root filesystem but is run in the initramfs context (no nix store, no shared libraries, minimal /dev, etc., and now no shell interpreter). For now this is just linked statically against musl and weighs in at around 70k, but really the same argument for using nolibc would apply just as much here: even though it’s not part of the initramfs proper, that’s still 70k that isn’t shared with anything else and that we’ll never use again after boot.
I have to be honest and admit that there are probably other parts of the system where I could make savings bigger than 200k. At least I hope there are - the jffs2 compression doesn’t seem to be nearly as effective as squashfs, so we’re paying quite a price for a writable filesystem. But there was still a certain amount of pride in making it almost-cost-neutral to add an initramfs.
Next week: adding multi-output derivations in the overlay. Looking at the generated filesystem, I see that various packages contain man pages, static libraries, random python files etc. that are probably not needed to run the system, and removing them could save us kB or even MB.