Helm stable/prometheus-operator: adding new scraping targets and troubleshooting

Today I Learned (TIL) that using Kubernetes named ports can be quite frustrating.

Recently I deployed a monitoring stack (Prometheus, Grafana) in Kubernetes using the Helm stable/prometheus-operator chart. Setting up monitoring for the Kubernetes cluster itself is covered by this wonderful guide. But to set up monitoring for other services, one needs to learn what a Kubernetes Operator is and create their own ServiceMonitor for the Prometheus Operator (see stable/prometheus-operator).
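For reference, with Helm v2 the chart installation itself is a one-liner along these lines; the release name monitoring and the namespace are just my choice:

$ helm install stable/prometheus-operator --name monitoring --namespace monitoring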

A ServiceMonitor is a custom resource. It tells Prometheus which Kubernetes Service exposes metrics and where: the Service’s label selectors, its namespace, the path, the port, etc. Say we’ve got a web application Service in the default namespace, labeled app: pili, that exposes a /metrics endpoint on a named port called uwsgi. We are going to deploy a ServiceMonitor in the monitoring namespace, with certain labels, so that Prometheus scrapes those metrics every 15 seconds. This is the manifest we apply:

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: monitoring-pili
  namespace: monitoring
  labels:
    app: pili-service-monitor
spec:
  selector:
    matchLabels:
      # Target app service
      app: pili
  endpoints:
  - interval: 15s
    path: /metrics
    port: uwsgi
  namespaceSelector:
    matchNames:
    - default
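Saving the manifest to a file, say pili-servicemonitor.yaml (the file name is arbitrary), applying it and checking that the resource exists looks like this:

$ kubectl apply -f pili-servicemonitor.yaml
$ kubectl get servicemonitors --namespace monitoring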

Following these steps I successfully deployed a few service monitors: for a PostgreSQL cluster, RabbitMQ, Elasticsearch, etc. All of them allowed Prometheus to scrape metrics just as expected. But my own application still showed 0 active targets in Prometheus. I could manually curl my app Service’s /metrics endpoint and see that all the metrics were exposed correctly. Still, Prometheus was unable to scrape them.
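For the record, a check like that can be run from a throwaway pod inside the cluster; the Service DNS name and port below follow from the manifests in this post, while the pod name and image are arbitrary (any image with curl will do):

$ kubectl run curl-test --rm -it --restart=Never --image=curlimages/curl --command -- \
    curl -s http://pili.default.svc.cluster.local:8080/metrics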

I double-checked the label selectors, as they often turn out to be the culprit (see this StackOverflow question). Everything was fine; a quick way to compare the two sides is shown after the output below. I could also see that my app’s Service exposed Kubernetes Endpoints correctly:

$ kubectl get endpoint --namespace default
...
pili  10.244.0.136:8080,10.244.0.144:8080,10.244.0.185:8080  6d16h
...
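To compare the two sides of the selector match, the ServiceMonitor spec and the Service labels can be dumped next to each other (the resource names are the ones used in this post):

$ kubectl get servicemonitor monitoring-pili --namespace monitoring -o yaml
$ kubectl get service pili --namespace default --show-labels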

Eventually, it turned out that the ServiceMonitor couldn’t see my app Service’s named port!

My Service manifest looked like this:

apiVersion: v1
kind: Service
metadata:
  name: pili
  labels:
    app: pili
spec:
  type: ClusterIP
  ports:
  - protocol: TCP
    port: 8080
    targetPort: uwsgi
  selector:
    app: pili-web
    tier: backend

Here the Service targeted the named port uwsgi defined in the app’s Deployment. The same port name was also used successfully in an Ingress:

...
- backend:
    serviceName: pili
    servicePort: uwsgi
...
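For context, the name uwsgi itself is defined as a container port in the app’s Deployment, roughly like this fragment (the container name is illustrative; the rest of the pod spec is omitted):

...
      containers:
      - name: pili
        ports:
        - name: uwsgi          # the name referenced by targetPort: uwsgi
          containerPort: 8080  # matches the pod port in the endpoints output above
          protocol: TCP
...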

It wasn’t until I explicitly named the port (with the same name) in the Service that the ServiceMonitor could discover my target. So I rewrote my Service manifest:

apiVersion: v1
kind: Service
metadata:
  name: pili
  labels:
    app: pili
spec:
  type: ClusterIP
  ports:
  - name: uwsgi
    protocol: TCP
    port: 8080
    targetPort: uwsgi
  selector:
    app: pili-web
    tier: backend

So the only change that really helped was this one:

   ports:
-  - protocol: TCP
+  - name: uwsgi
+    protocol: TCP
     port: 8080
     targetPort: uwsgi

A Service port must be explicitly named, even if the name is the same as the target port’s name. A ServiceMonitor understands only explicitly named Service ports, although other resources, e.g. an Ingress, work fine without explicit naming. Beware!
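To confirm end to end that a target is scraped, I port-forward to Prometheus and look at its Targets page. The prometheus-operated Service below is the headless Service the operator creates for its Prometheus instances; the exact name may differ depending on your chart version and release name:

$ kubectl port-forward --namespace monitoring svc/prometheus-operated 9090
# then open http://localhost:9090/targets in a browser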

P.S. At the time of writing I used kubectl v1.14.2 and helm v2.14.0.