Filtering out nan values with Prometheus
When writing PromQL, I very rarely run into nan, but when I do, it can be kind of a pain to deal with, since it:
- ...is infectious (i.e. if you're doing an aggregation like sum() and it hits a nan, the value for the whole group in the aggregation becomes nan, which is often not what you want).
- ...doesn't equal itself, so doing things like
    sum(http_request_duration_seconds_sum / http_request_duration_seconds_count != nan) by(instance)
  doesn't work.
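Both properties come from ordinary floating-point behaviour rather than anything specific to Prometheus, so they can be reproduced outside PromQL. Here is a minimal Python sketch; the sample values are made up for illustration and stand in for the per-series results within one aggregation group:

```python
import math

# Hypothetical per-series values for one aggregation group; one is NaN.
values = [0.25, math.nan, 0.75]

# NaN is infectious: a single NaN turns the whole sum into NaN.
total = sum(values)
print(math.isnan(total))

# NaN doesn't equal itself, so "!= nan" is true for every value,
# including the NaN one. Used as a filter, it keeps everything.
kept = [v for v in values if v != math.nan]
print(len(kept) == len(values))
```

Both checks hold, which is why the != nan filter leaves the nan in place and the sum stays nan.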
One simple trick to filter out nan is to just compare to inf:
    sum(http_request_duration_seconds_sum / http_request_duration_seconds_count <= inf) by(instance)
Since every ordering comparison involving nan is false - even a comparison against infinity - the nan samples fail the filter and are dropped before the sum!
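To see why the comparison works as a filter, here is a rough Python sketch of the same idea. The dictionary of ratios and its series names are hypothetical. It only mimics how a PromQL comparison keeps the series for which the comparison is true, and is not how Prometheus evaluates the query:

```python
import math

# Hypothetical ratios for three series; "b" stands in for a 0 / 0 result.
ratios = {"a": 0.25, "b": math.nan, "c": 0.75}

# Mimic "<= inf": keep a series only when the comparison is true.
# nan <= inf is false, so "b" is dropped. Note that a value of +inf
# would still pass this filter.
filtered = {name: v for name, v in ratios.items() if v <= math.inf}

total = sum(filtered.values())
print(sorted(filtered), total)
```

With the nan series gone, the sum over the remaining values is an ordinary number again.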
However, you might be better off figuring out why you have nan values in the first place - one way this can happen is if you're dividing a *_sum from a summary or histogram by its corresponding *_count metric, and the count metric's value is zero. If that's the case, you can just filter the zero values out before you divide (just make sure you get the operator precedence right!):
    sum(http_request_duration_seconds_sum / (http_request_duration_seconds_count > 0)) by(instance)
In my case, though, I was using prometheus_rule_evaluation_duration_seconds, which is apparently initialized with nan values.