Save time by focusing on the result - using Prometheus' textfile collector to avoid writing an exporter

My friend Frew, as well as others, have said that our job as developers is not to write code, but rather to solve problems. I had a win in this regard relatively recently where I saved myself a good chunk of time by leveraging a Prometheus component for a purpose it may not have been designed for, but is remarkably well suited to.

A few months ago, I had a problem with my server - dhcpcd would get into a weird state roughly every 2 weeks and lose its IPv4 address, which is not great for a server that is hosting your personal site! I turned on debug logging for dhcpcd (of which there is a lot - I ended up putting the logs in their own journald namespace to enforce a separate disk quota!), and I noticed that dhcpcd started spitting out some strange errors roughly 24 hours before the address got lost.

I have monitoring in place to detect when my server is down, but it's no fun being under the gun investigating what's going on with dhcpcd while your site is down. I also use Prometheus to monitor various things, so to give myself some breathing room and time to investigate, I figured I could write a Prometheus exporter (Prometheus' term for a program that collects metrics for a Prometheus instance) for journald to export metrics about entry counts for (namespace, unit, priority) tuples, and then preemptively alert when (dhcpcd, dhcpd@eth0, ERROR)'s rate goes bananas.

Now, writing a Prometheus exporter isn't super hard, but it is work - and while a journald exporter might not be a bad tool to have in the future, I wasn't 100% confident I would need something so comprehensive. Was there a way I could get these logging metrics into my Prometheus without having to do the work of writing an exporter? Yes - through a very useful feature in Prometheus' node exporter: the textfile collector!

Node exporter is an exporter you can run that reports various metrics about the system it's running on: CPU usage, memory usage, process count, disk usage, etc. It does this through a series of collectors - you tell which collectors you're interested in when you start up node exporter. It will invoke the ones you asked for when it's scraped and serve up their metrics.

One of these collectors, as I mentioned, is the textfile collector. All it does is look for *.prom files in a directory of your choosing and serves up the metrics it parses out of them. This is ostensibly to cover things like custom system metrics that aren't available in node-exporter's considerable cadre of collectors, or things that would be difficult to add.

Fortunately for us, it also happens to work very well for trying out new metrics very quickly! You could write a cronjob to dump metrics out to a file - the format is very simple - and have some metrics to play with to decide whether or not you should invest more time on them in the future.

Instead of spending my time writing a journald exporter, I spent 10 minutes crafting a Perl program to dump this data into a file for node exporter to pick up. And I haven't felt the need for a journald exporter since - this just goes to show that if you focus on solving the problem rather than trying to throw code at it, you can save yourself a lot of time!

Of course, as with all things in software, there are tradeoffs - if you know for sure you will need such an exporter in the future, obviously it's cheaper to just write the exporter. So use your best judgement!

Published on 2020-09-07