Helm stable/prometheus-operator: adding new scraping targets and troubleshooting
tags: k8s helm prometheus kubernetes-operator til
Today I Learned (TIL) that using Kubernetes named ports can be quite frustrating.
Recently I deployed a monitoring stack (Prometheus, Grafana) in Kubernetes using the Helm stable/prometheus-operator chart. Setting up monitoring for the Kubernetes cluster itself is covered by this wonderful guide. But to set up monitoring for other services, one needs to learn what a Kubernetes Operator is and create their own ServiceMonitor for the Prometheus Operator (see stable/prometheus-operator).
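For reference, installing the chart with Helm 2 looks roughly like this (the release name and namespace here are my choices):

$ helm install stable/prometheus-operator \
    --name prometheus-operator \
    --namespace monitoring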
ServiceMonitor is a custom resource. It tells Prometheus which k8s Service exposes metrics and where: the service's label selectors, its namespace, path, port, etc. Say we've got a web application Service in the default namespace with the label app: pili that exposes a /metrics endpoint on a named port called uwsgi. We are going to deploy a ServiceMonitor, in the monitoring namespace and with certain labels, so that Prometheus scrapes the metrics every 15 seconds. We apply the following manifest:
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: monitoring-pili
  namespace: monitoring
  labels:
    app: pili-service-monitor
spec:
  selector:
    matchLabels:
      # Target app service
      app: pili
  endpoints:
  - interval: 15s
    path: /metrics
    port: uwsgi
  namespaceSelector:
    matchNames:
    - default
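The manifest is applied as usual (the file name is just an example):

$ kubectl apply -f pili-servicemonitor.yaml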
Following these steps, I successfully deployed a few service monitors: for a PostgreSQL cluster, RabbitMQ, Elasticsearch, etc. All of them allowed Prometheus to scrape metrics just as expected. But my own application still showed 0 active targets in Prometheus. I could manually curl my app Service's /metrics endpoint and see that all the metrics were exposed correctly. Still, Prometheus was unable to scrape it.
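That manual check looked roughly like this (the local port is arbitrary):

$ kubectl port-forward --namespace default svc/pili 8080:8080
$ curl http://localhost:8080/metrics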
I double-checked the label selectors, as they often turn out to be the culprit (see this StackOverflow question). Everything was fine. I could see that my app's Service exposed Kubernetes endpoints correctly:
$ kubectl get endpoint --namespace default
...
pili    10.244.0.136:8080,10.244.0.144:8080,10.244.0.185:8080    6d16h
...
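Another useful troubleshooting step is to check Prometheus's own view of its targets. With the operator you can port-forward to the prometheus-operated Service it creates (the exact Service name can differ per setup) and open the /targets page:

$ kubectl port-forward --namespace monitoring svc/prometheus-operated 9090
# then open http://localhost:9090/targets in a browser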
Eventually, it turned out that the ServiceMonitor didn't see my app Service's named port!
My Service manifest looked like this:
apiVersion: v1
kind: Service
metadata:
  name: pili
  labels:
    app: pili
spec:
  type: ClusterIP
  ports:
  - protocol: TCP
    port: 8080
    targetPort: uwsgi
  selector:
    app: pili-web
    tier: backend
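The targetPort: uwsgi refers to a port name defined in the app's Deployment, along these lines (a sketch; the container name and image are assumed):

# excerpt from the Deployment's pod template
containers:
- name: pili-web
  image: pili-web:latest
  ports:
  - name: uwsgi
    containerPort: 8080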
So the Service targeted the Deployment's named port uwsgi, and the same port name was also used successfully in the Ingress:
...
- backend:
    serviceName: pili
    servicePort: uwsgi
...
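In full context, that Ingress looked roughly like this (the host is made up; the extensions/v1beta1 API matches the versions I was running):

apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: pili
spec:
  rules:
  - host: pili.example.com   # assumed host
    http:
      paths:
      - backend:
          serviceName: pili
          servicePort: uwsgi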
It wasn't until I explicitly named the port in the Service (with the same name) that the ServiceMonitor could discover my target. So I rewrote my Service manifest:
apiVersion: v1
kind: Service
metadata:
  name: pili
  labels:
    app: pili
spec:
  type: ClusterIP
  ports:
  - name: uwsgi
    protocol: TCP
    port: 8080
    targetPort: uwsgi
  selector:
    app: pili-web
    tier: backend
So the only change that really helped was this one:
  ports:
- - protocol: TCP
+ - name: uwsgi
+   protocol: TCP
    port: 8080
    targetPort: uwsgi
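After that change the port name also shows up in the Endpoints object, which is what the generated Prometheus endpoint discovery matches against:

$ kubectl get endpoints pili --namespace default -o yaml
...
  ports:
  - name: uwsgi
    port: 8080
    protocol: TCP
...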
A Service's port should be explicitly named even when its targetPort already refers to a named port with the same name. A ServiceMonitor only understands explicitly named Service ports, although other resources, e.g. an Ingress, work fine without explicit naming. Beware!
P.S. At the time of writing, I used kubectl v1.14.2 and helm v2.14.0.