Log off

I missed doing a blog update in September, but I did make a video demonstrating using an external source for secrets management. With a new, better microphone, too, so the audio is less awful.

But now it’s October, and this update is all about getting logs off your device(s) and into some other system where you could index them or search them or do alerts or whatever else based on them. There were a few things I had in mind as goals for this work:

- It shouldn’t rely on a particular downstream service, so it’s not coupled to Graylog or ELK or Prometheus or even rsyslog.
- I’m not going to rewrite all the applications that Liminix devices use to make them produce “structured” logs. As Avery Pennarun said: “the debate about ‘events’ vs ‘logs’ was kind of moot. We didn’t control all the parts in our system, so telling us to forget logs and use only structured events doesn’t help. udhcpd produces messages the way it wants to produce messages, and that’s life. Sometimes the kernel panics and prints whatever it wants to print, and that’s life. Move on.”
- For the purposes of interactive diagnostics, I want to see the logs arrive on the log collator machine in more-or-less real time, instead of waiting for 60 seconds or 4MB or something.
- If the log collator service is unreachable, everything else should carry on working, i.e. the device will continue to write to /run/log/current. (Side note: yes, I’ve changed the log location from /run/uncaught-logs to /run/log.)

So how does it work? There are two parts:

If you set logging.shipping.enable = true, we insert a call to logtap in the pipeline that runs the default logger. This is a small program (with, I now notice, really weird indentation?) which copies its input to its output and also to a unix socket, but only if there’s something listening on that socket.

Then if you set logging.shipping.service to a longrun of some kind, that service is run with its standard input connected to that socket, and can do anything it likes to get the logs it’s receiving off the box and sent somewhere else.
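
To make the first part concrete, here is a rough Python sketch of the "copy to output, and to a socket only if someone is listening" idea. It is not the real logtap: the function names `tap` and `try_connect`, and the policy of dropping lines when the listener is slow, are assumptions made purely for illustration.

```python
import socket


def try_connect(sock_path):
    """Return a connected non-blocking socket, or None if nobody is listening."""
    s = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    s.setblocking(False)
    try:
        s.connect(sock_path)
        return s
    except OSError:
        # Missing socket file, nothing listening, or listener backlog full.
        s.close()
        return None


def tap(lines, out, sock_path):
    """Copy each bytes line to `out`, and also to the socket when possible."""
    conn = None
    for line in lines:
        # The local log always gets the line first, whatever happens next.
        out.write(line)
        out.flush()
        if conn is None:
            conn = try_connect(sock_path)
        if conn is None:
            continue
        try:
            conn.send(line)
        except BlockingIOError:
            # Listener is too slow: drop this line rather than stall.
            pass
        except OSError:
            # Listener went away: forget it and retry on a later line.
            conn.close()
            conn = None
    if conn is not None:
        conn.close()
```

Used as a filter, this would be something like `tap(sys.stdin.buffer, sys.stdout.buffer, "/run/example-tap.sock")` (after `import sys`), where the socket path is made up for this example. The sketch ignores partial writes; the point is only that a missing or broken listener never stops lines reaching the local log.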

So here’s the example I’m working on:

  logging.shipping = {
    enable = true;
    service = longrun {
      name = "ship-logs";
      dependencies = [ config.services.client-cert ];
      run =
        let path = lib.makeBinPath (with pkgs; [ s6-networking s6 ]);
        in ''
          PATH=${path}:$PATH \
          CAFILE=${/var/lib/certifix/certs/ca.crt} \
          KEYFILE=$(output_path ${services.client-cert} key) \
          CERTFILE=$(output_path ${services.client-cert} cert) \
          s6-tlsclient -k loghost.example.net -h -y loghost.example.net 19612 \
          fdmove -c 1 7 cat
        '';
    };
  };

This connects to a service on loghost which accepts TLS connections and pipes what they send to s6-log, which spits it out to /var/log/remote:

  systemd.services."s6-log-collector" = {
    after = [ "network.target" ];
    wantedBy = [ "multi-user.target" ];
    serviceConfig = {
      Type = "exec";
      WorkingDirectory = "/var/log";
      ExecStart = (pkgs.writeScript "start" ''
        #!${pkgs.runtimeShell}
        ${pkgs.socat}/bin/socat \
          openssl-listen:19612,reuseaddr,fork,cert=/var/lib/certifix/certs/server.crt,key=/var/lib/certifix/private/server.key,cafile=/var/lib/certifix/certs/ca.crt \
          stdout \
          | ${pkgs.s6}/bin/s6-log -b /var/log/remote
      '');
    };
  };

Tada!

Oh, one more thing: what’s this “Certifix”? I said there were two parts? I’d better explain the third part. The issue is that I want to accept logs from my devices but I don’t want to accept logs from the entire internet, and as I trust zero, I wish to accomplish this using some kind of authentication rather than by using firewall rules.

Certifix is a small network service which accepts ~~SSL~~ TLS certificate signing requests (CSRs) and generates signed certificates, but only if the CSR contains a magic word (which is baked into the Liminix image). Here’s the bit where I call it on the device:

  services.client-cert = svc.tls-certificate.certifix-client.build {
    caCertificate = builtins.readFile /var/lib/certifix/certs/ca.crt;
    subject = "C=GB,ST=London,O=Telent,OU=devices,CN=${config.hostname}";
    secret = builtins.readFile ../challengePassword;
    serviceUrl = "https://loghost.example.net:19613/sign";
  };

and now I have a private key (which is generated on the device itself, btw) and a certificate for the device, which it can use to authenticate to any service elsewhere that can check it against the CA cert.

How done is this?

Not entirely, but it’s getting there:

- it works in my QEMU VM, but
- I haven’t installed it on any real device yet, and
- it doesn’t deal with kernel messages,
- nor does it “backfill” messages to the remote service that were produced before the shipper started.

And you should read the Avery Pennarun article if you haven’t seen it before.