The Turris Omnia has been running “the internet” for my family for most of a month (uptime says 27 days) and after some initial teething trouble I have had no complaints.

However, it doesn’t reboot cleanly, because there isn’t a cable plugged into each port of the LAN switch. Does that sound weird? This is why:

To add an interface to a bridge, the interface must be in “running” condition. For a wireless interface that means hostapd is configured and working; for a wired interface it means that link carrier is present. To detect this, the bridge member service runs

ifwait $dev running && \
 ip link set dev $dev master $(output ${primary} ifname)

But it’s a oneshot, not a longrun, which means that s6-rc doesn’t regard the service as “up” until the script has finished. Because this service is part of the default target of services to bring up on boot, unless network cables are plugged in everywhere we will never reach the point where s6-rc considers the system successfully booted.

[ Previous discussion of ifwait and its earlier iteration waitup ]

Clearly a different approach is required. It’s not an error if the device operator unplugs a network cable any more than it’s an error if they add a USB stick, or if they click the toggle button on the side of the device. Perhaps it’s not even an error (at least from the service manager’s point of view; it’s probably undesirable for the users) if the internet connection goes down, if there’s a backup LTE modem that should be activated in that scenario.

You can see where this is going. The tl;dr is that the services that require these conditions should run only under the said conditions.

If you’re thinking “udev” here, that’s part of the picture. But there is a small caveat and a big caveat:

  • not all scenarios are detectable with udev. Maybe we want the backup LTE modem to be brought up when the metrics on the primary WAN link are below some threshold: udev isn’t going to help there.

  • launching services in response to events is fundamentally the wrong model. A service is required when a particular state of affairs obtains, not when an event has happened. An event may be a signal that the state has changed in some way - indeed it is much preferable to be notified of state changes instead of having to poll for them - but in philosophical terms the service is coupled to the state not directly to the signal.
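To make the state-versus-event distinction concrete, here is a minimal shell sketch. It is not Liminix code: the function name reconcile and its two arguments are made up for illustration. The idea is that whatever event arrives, the handler only compares the current condition with the current service state, so a duplicated event does nothing and a missed one gets corrected the next time anything prompts a re-check.

```shell
#!/bin/sh
# Level-triggered sketch: decide what to do from state, not from the event.
# reconcile CONDITION SERVICE_STATE   (hypothetical helper)
#   CONDITION:     "true" if the triggering condition holds right now
#   SERVICE_STATE: "up" or "down", as the service manager reports it
# Prints the action to take: "start", "stop" or "none".
reconcile() {
  condition=$1
  service_state=$2
  case "$condition:$service_state" in
    true:down) echo start ;;  # condition holds but service is not running
    false:up)  echo stop ;;   # condition no longer holds, service still up
    *)         echo none ;;   # state already matches, nothing to do
  esac
}
```

Calling reconcile true up twice in a row prints none both times, which is the behaviour you want when the same “link up” notification is delivered more than once.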

That’s the thinking, anyway. The plan, which is tentative and has not yet experienced any real contact with the enemy, is to create a new class of services called “triggers”, which accept (1) some expression matching a particular arrangement of state, and (2) a service which should be running when that expression is true (and stopped when it is not). The state could be contents of sysfs, or outputs of some other service, or maybe even some kind of metrics (from SNMP or a time series database or whatever), and the state source will probably drive the syntax of the expression. For example we could have something like this

triggers.sysfs.build {
  match = {
    SUBSYSTEM="net";
    ID_PATH="pci-0000:04:00.0";
    ATTR.operstate = "up";
  };

  service = oneshot {
    up = "ip link set dev $(output ${member} ifname) master $(output ${primary} ifname)";
    down = "ip link set dev $(output ${member} ifname) nomaster";
  };
}

or this

watcher.build {
  watching = services.wan;
  match = {
    # an expression matching the outputs of the service
    # to be watched. this syntax is merely handwaving,
    # don't read anything into it
    tx.dropped.60 = 100;
  };
  service = oneshot {
    run = "start_lte_blah";
  };
}

Where we are today is a single baby step along this road, to find out whether s6-rc is amenable to having its services flapped like this. On the ifwait-test branch we have redesigned the bridge member services: instead of blocking in the up script, they are now longruns, and the run script invokes s6-rc -u change ... when the interface is running/not running.

    svc.ifwait.build {
      state = "running";
      interface = member;
      dependencies = [ primary member ];
      service = oneshot {
        name = "${primary.name}.member.${member.name}";
        up = ''
          ip link set dev $(output ${member} ifname) master $(output ${primary} ifname)
        '';
        down = "ip link set dev $(output ${member} ifname) nomaster";
      };
    };
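For a rough idea of what the run script of such a longrun has to do, here is a shell sketch. It is an approximation under loud assumptions, not the actual ifwait implementation: it polls sysfs operstate rather than listening for notifications, the helper names (link_is_running, sync_service, watch_link), the SYSFS_ROOT and S6_RC variables, and the two-second interval are all invented here, and using s6-rc -d change for the "not running" case is my guess at the obvious counterpart to -u.

```shell
#!/bin/sh
# Hypothetical helpers, for illustration only.
# SYSFS_ROOT and S6_RC can be overridden so the logic can be tried
# without root or a live s6-rc database.
SYSFS_ROOT=${SYSFS_ROOT:-/sys}
S6_RC=${S6_RC:-s6-rc}

# Succeeds only when the kernel reports operstate "up" for interface $1.
# A missing interface, or any other state, counts as not running.
link_is_running() {
  state=$(cat "$SYSFS_ROOT/class/net/$1/operstate" 2>/dev/null) || return 1
  [ "$state" = "up" ]
}

# Ask s6-rc to bring service $2 up if interface $1 is running,
# and down otherwise.
sync_service() {
  if link_is_running "$1"; then
    $S6_RC -u change "$2"
  else
    $S6_RC -d change "$2"
  fi
}

# Loop a run script could end with: re-check every 2 seconds (assumed).
watch_link() {
  while :; do
    sync_service "$1" "$2"
    sleep 2
  done
}
```

With made-up names, watch_link lan1 br0.member.lan1 would keep the controlled oneshot in step with the cable. Repeating the request on every iteration relies on s6-rc change doing nothing when the service is already in the requested state. Some drivers report operstate as "unknown" even when the link works, so a real implementation may need a different test.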

It works in CI, and it appears to work on a test device. Next step is to roll it out to some real users.