Monitoring systemd user manager with node exporter

I use systemd and its timers to run various tasks on my machines on a regular basis, and since that's all happening in the background, I want to know if those jobs fail. Now, it's better to have a job-specific alert rule that checks that the state of the world is desirable (for example - instead of checking "did the e-mail backup job succeed?", check "is the most recent file in the e-mail backup destination less than 24 hours old?"), but it's nice to have the general "did the service succeed?" check as a backstop for when a specific alert is hard to set up, or when I just haven't gotten around to it yet.

Prometheus' node exporter has a suite of collectors to gather metrics on various aspects of your system - and among them is a collector to gather information on systemd! It's disabled by default, but you can easily enable it via --collector.systemd.
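
As a quick sanity check, you can enable the collector by hand and grep the metrics output - roughly like this, assuming node exporter is listening on its default :9100:

# run node exporter with the systemd collector enabled
prometheus-node-exporter --collector.systemd

# in another terminal: systemd unit state metrics should now show up
curl -s http://localhost:9100/metrics | grep node_systemd_unit_state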

Now, if it were that simple, the topic wouldn't be worthy of a blog post 😅 One interesting thing about systemd is that not only does it run a system manager as PID 1 - it also runs a user manager for each user's login session. The user manager can run its own services, timers, etc - the advantage of the user manager is that you don't need to be root to start or stop things with it (you just need to use systemctl --user), and the "lifetimes" of those services and timers are bound to your user session.
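
For example, inspecting your own timers and failed units is just a matter of adding --user:

# timers managed by your user manager (no root needed)
systemctl --user list-timers

# any user services that have failed
systemctl --user list-units --state=failed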

However, there's a weakness with node exporter's systemd collector: it only gathers information on the system manager, so when I initially set this up, I was blind to any failures for any services run by the user manager!

You can use systemd_exporter to monitor either the system manager or the user manager - and that would be my recommendation - but before I switched to systemd_exporter myself, I figured out a trick to get node exporter to monitor the user manager, so I thought I'd share my discoveries in the hopes that others might find them interesting.

The first part of the trick is that you can use the --collector.systemd.private option to tell the collector to talk to systemd via the /run/systemd/private socket (rather than D-Bus, I suppose). This option isn't actually documented, so it could vanish at any time 😬
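
If you're curious, both private sockets are easy to spot on a running system:

# the system manager's private socket
ls -l /run/systemd/private

# your user manager's private socket
ls -l "/run/user/$(id -u)/systemd/private"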

That socket still corresponds to the system manager, though, so the second part of the trick is convincing node exporter that /run/user/$UID/systemd/private is /run/systemd/private. We can do that by setting up a bind mount in the service's unit definition, via BindReadOnlyPaths=/run/user/%i/systemd/private:/run/systemd/private. Using a bind mount means we can't run this service under the user manager itself, so it has to be a system service; to still know which user to target, we make it a templated unit - user-manager-monitor@.service - instantiated with the UID, and BindReadOnlyPaths uses %i (the instance) rather than %U (which in a system unit would resolve to root's UID, not the target user's). Here's the unit file:

[Unit]
Description=Prometheus exporter for systemd user manager metrics
# wait until user manager instance is up before we come up
After=user@%i.service
# ...and shut us down if the user manager goes down
BindsTo=user@%i.service

[Service]
User=%i
Group=%i
Restart=on-failure
# allow configuration by specifying USER_MANAGER_MONITOR_ARGS in /etc/conf.d/user-manager-monitor
EnvironmentFile=-/etc/conf.d/user-manager-monitor
ExecStart=/usr/bin/prometheus-node-exporter --collector.disable-defaults --collector.systemd --collector.systemd.private $USER_MANAGER_MONITOR_ARGS
ExecReload=/bin/kill -HUP $MAINPID
NoNewPrivileges=true
ProtectHome=read-only
ProtectSystem=strict
BindReadOnlyPaths=/run/user/%i/systemd/private:/run/systemd/private
RestartSec=10

[Install]
# start us up when the user manager comes up
WantedBy=user@%i.service

...and I set it up via sudo systemctl enable user-manager-monitor@$(id -u).service. You can also see that I threw in --collector.disable-defaults to disable the other collectors, since my regular node exporter service will provide those.
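
One practical note: this second node exporter instance can't listen on the same port as my regular one, so the EnvironmentFile is a natural place to move it somewhere else - something along these lines (the port is just an example):

# /etc/conf.d/user-manager-monitor
USER_MANAGER_MONITOR_ARGS="--web.listen-address=:9101"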

So, this gets us the metrics we need to alert on - but shortly after I set this up, I ran into a problem: if the user manager restarts (e.g. if systemctl --user daemon-reexec gets run due to an update to the systemd package), it creates a new private socket, and the node exporter instance can no longer connect, because the bind mount we created still points at the old socket. The exporter keeps serving up metrics, just not any systemd ones - at least, not until I fix the bind mount by restarting the service! So I wrote an alert rule to notify me when this happens, and then I would restart the service by hand.
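
A rule along these lines does the trick - node exporter reports per-collector scrape success via node_scrape_collector_success, which drops to 0 when the systemd collector can't talk to its socket (the job label and the for duration here are just examples; match them to your own scrape config):

groups:
  - name: user-manager-monitor
    rules:
      - alert: UserManagerExporterLostState
        # 0 means the systemd collector failed its last scrape - most likely
        # because the bind-mounted private socket has gone stale
        expr: node_scrape_collector_success{collector="systemd", job="user-manager-monitor"} == 0
        for: 5m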

This didn't happen a ton, but I did tire of the alerts and responding to them - can't systemd do this for me?

Well, yes! I found and read this article about systemd watchdogs, set WatchdogSec=60 on the unit, and wrote this script to send heartbeats to systemd as long as my alert rule isn't firing:

#!/bin/bash

set -e
set -o pipefail

# take in the target UID as the first argument
uid=$1
shift

# run node exporter in the background as the target UID
sudo -u "#$uid" -g "#$uid" /usr/bin/prometheus-node-exporter --collector.disable-defaults --collector.systemd --collector.systemd.private "$@" &
NODE_EXPORTER_PID=$!

# let systemd know we're still OK as long as the alert rule isn't firing,
# and stop sending heartbeats if node exporter itself exits
while kill -0 "$NODE_EXPORTER_PID" 2>/dev/null ; do
    if curl --fail -s http://localhost:9090/api/v1/alerts | jq -e '.data.alerts | map(select(.labels.alertname == "UserManagerExporterLostState")) | length == 0' >/dev/null ; then
        systemd-notify WATCHDOG=1
    fi
    sleep 15
done

wait $NODE_EXPORTER_PID

I had to drop the User=%i and Group=%i parts of the systemd unit, since systemd-notify needs root permissions to work, and I of course needed to update ExecStart to call my wrapper script.
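
For reference, the relevant parts of the [Service] section end up looking roughly like this (the wrapper's install path is whatever you chose - /usr/local/bin/user-manager-monitor is just a placeholder - and depending on your setup you may also need NotifyAccess=all so that pings from systemd-notify, which runs as a separate short-lived process, get accepted):

[Service]
# User=/Group= dropped - the wrapper uses sudo to run node exporter as the target user
EnvironmentFile=-/etc/conf.d/user-manager-monitor
# placeholder path for the wrapper script above; %i passes the UID through
ExecStart=/usr/local/bin/user-manager-monitor %i $USER_MANAGER_MONITOR_ARGS
# restart the service if we go 60 seconds without a watchdog ping
WatchdogSec=60
# accept notifications from processes other than the main one
NotifyAccess=all
BindReadOnlyPaths=/run/user/%i/systemd/private:/run/systemd/private
Restart=on-failure
RestartSec=10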

Anyway, if you want to monitor the user manager, I would again suggest using systemd_exporter instead, but hopefully you found this pile of hacks I used to get node exporter to do the job interesting!

Published on 2024-04-07