Arms length

Since the last update I have added UBIFS support for the Belkin RT3200 (and other devices with larger flash chips), and started a 32-bit Arm port which runs on QEMU and boots (though doesn’t yet do much after booting) on the Turris Omnia. But then I got a little bit sidetracked into improving (read: replacing) the story for upgrading a router once it has Liminix running. (Blatting a new image onto the flash while the system is running from that same flash device is not a good idea. The best case scenario is that the flash write succeeds and then the system wedges solid, but I can’t see any guarantee it wouldn’t crash earlier.)

Why is this important? (Why is this important now?) For squashfs systems (which can’t be written to at all after imaging) and jffs2 systems (which can be written to somewhat, but with caveats), we’d rather not take the case apart and stick serial wires on it every time we re-image.

For UBIFS systems we usually have more space to play with, so we can expect to be able to do nixpkgs major version updates in place - but if the end-user does want to reinstall for any reason then we’d like to preserve the erase counters, which a simple flashcp (the moral equivalent of dd but for flash chips) won’t do.

We do already have something that should address this: the kexecboot output creates an image almost identical to the running system, but with the fillip that instead of booting with u-boot and running from flash, it boots from a running Linux system and runs from an emulated flash device that’s actually just a contiguous section of physical RAM. But …

it only works for systems where the filesystem image is significantly smaller than the available RAM;
as I found when I tried to write a CI test for it, kexec can be precarious;
it’s also a tediously manual process.

So I had a new idea. Instead of rebooting into the new system in order to write the new system to flash, we can switch root of a running system to use a ram-based filesystem, and then it will be safe to do anything we want to the flash.

We need to stop all processes that have open files. Given that this might include processes that map the text pages of their own executables, it seems easiest and most prudent to stop all processes that might have open files, which is to say - all processes.
We most likely need some network services to run in the ram-based system, because the likelihood is that we’ll be doing the upgrade over the network. Which services depends on the local network configuration and the system’s rôle in it. For example, your home router has a static IP address and usually runs a DHCP server, so in maintenance mode it probably wants to keep the address but stop the DHCP server and any routing - conversely, a standalone wireless AP might usually get its address with a DHCP request, so it probably should do the same in maintenance mode. So it’s up to the system builder to define which services they need.
s6 makes half of this easy: the s6-svscan process (our pid 1) runs a script before it reboots, which we can hook to check for the existence of the maintenance system, switch root to it, and then re-exec init to start the maintenance services.
Nix makes the other half easy: we populate the maintenance system chroot simply by (1) copying the /nix/store paths for the closure of the needed services, and then (2) running activate to create the base files (/dev, /bin, /etc and so on) from a pseudofiles attrset.
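
To make step (1) a bit more concrete, here is a minimal sketch of gathering a closure with the closureInfo helper from nixpkgs and copying it under a fresh root. It is not how levitate itself is implemented, and it may not match it at all: the two packages are hypothetical stand-ins for whatever the chosen maintenance services depend on, and this version runs at build time on the build machine rather than on the router.

```nix
{ pkgs ? import <nixpkgs> { } }:

let
  # Hypothetical stand-ins for the packages the maintenance services need.
  roots = [ pkgs.busybox pkgs.dropbear ];

  # closureInfo computes every store path reachable from the roots and
  # lists them, one per line, in a file called store-paths.
  closure = pkgs.closureInfo { rootPaths = roots; };
in
pkgs.runCommand "maintenance-root" { } ''
  mkdir -p $out/nix/store
  # Copy each path in the closure into the new root's /nix/store.
  while read -r path; do
    cp -a "$path" $out/nix/store/
  done < ${closure}/store-paths
''
```

The point is only that the closure is something Nix can enumerate for us, so nothing the services need at runtime gets left behind on the flash filesystem.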

I’ve called it levitate because it allows you to hover above the filesystem without setting foot on it. The essence of it is that you just add the package to your profile:

  defaultProfile.packages =
    with pkgs; [
      tcpdump
      strace
      (levitate.override {
        services = {
          inherit (config.services) dhcpc sshd;
        };
      })
    ];

Then you can log in and run levitate to populate the maintenance mode filesystem, and reboot to exec into it. The UI is still a bit WIP and there’s no documentation yet, but I’m quite pleased with it so far.
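
For the home router case mentioned earlier (keep the static address, stop DHCP and routing), the override might look something like this instead. Treat it as a sketch under assumptions: lanAddress is a made-up name standing in for whichever service in your configuration assigns the static address, and sshd is assumed to be defined under config.services.

```nix
  defaultProfile.packages =
    with pkgs; [
      (levitate.override {
        services = {
          # lanAddress is a hypothetical name: substitute the service
          # that configures the router's static address.
          # The DHCP server and routing services are deliberately left out.
          inherit (config.services) lanAddress sshd;
        };
      })
    ];
```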