tags: k8s helm prometheus kubernetes-operator til
Today I Learned (TIL) that using Kubernetes named ports can be quite frustrating.
Recently I deployed a monitoring stack (Prometheus, Grafana) in Kubernetes using the Helm stable/prometheus-operator chart. Setting up monitoring for the Kubernetes cluster itself is covered by this wonderful guide. But to set up monitoring for other services, one needs to learn what a Kubernetes Operator is and create their own ServiceMonitor for the Prometheus Operator (see stable/prometheus-operator).
ServiceMonitor is a custom resource. It tells Prometheus which k8s Service exposes metrics and where: the service's label selectors, its namespace, path, port, etc. Say we've got a web application Service in the default namespace with the label app: pili that exposes a /metrics endpoint on a named port called uwsgi. We are going to deploy a ServiceMonitor in the monitoring namespace, with certain labels, so that Prometheus scrapes the metrics every 15 seconds. We apply the following manifest:
```yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: monitoring-pili
  namespace: monitoring
  labels:
    app: pili-service-monitor
spec:
  selector:
    matchLabels:
      # Target app service
      app: pili
  endpoints:
    - interval: 15s
      path: /metrics
      port: uwsgi
  namespaceSelector:
    matchNames:
      - default
```
Following these steps, I successfully deployed a few service monitors: for a PostgreSQL cluster, RabbitMQ, Elasticsearch, etc. All of them allowed Prometheus to scrape metrics just as expected. But my own application still showed 0 active targets in Prometheus. I could manually curl my app Service's /metrics endpoint and see that all the metrics were exposed correctly. Still, Prometheus was unable to scrape it.
I double-checked the label selectors, as they often happen to be the culprit (see this StackOverflow question). Everything was fine. I could see that my app's Service exposed Kubernetes endpoints correctly:
```
$ kubectl get endpoints --namespace default
...
pili   10.244.0.136:8080,10.244.0.144:8080,10.244.0.185:8080   6d16h
...
```
Eventually, it turned out that the ServiceMonitor didn't see my app Service's named port!
My Service manifest looked like this:
```yaml
apiVersion: v1
kind: Service
metadata:
  name: pili
  labels:
    app: pili
spec:
  type: ClusterIP
  ports:
    - protocol: TCP
      port: 8080
      targetPort: uwsgi
  selector:
    app: pili-web
    tier: backend
```
Here the Service targeted the named port uwsgi from the app's Deployment. The same port was also used successfully in the Ingress:
```yaml
...
- backend:
    serviceName: pili
    servicePort: uwsgi
...
```
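For context, the named port itself is defined on the app's Deployment, which isn't shown in this post. A hypothetical excerpt (image and other fields omitted) would look roughly like this:

```yaml
# Hypothetical excerpt from the app's Deployment:
# the container port is named "uwsgi", and the Service's
# targetPort refers to it by that name instead of a number.
spec:
  template:
    metadata:
      labels:
        app: pili-web
        tier: backend
    spec:
      containers:
        - name: pili-web
          ports:
            - name: uwsgi
              containerPort: 8080
              protocol: TCP
```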
It wasn't until I explicitly named the port (with the same name) in the Service itself that the ServiceMonitor could discover my target. So I rewrote my Service manifest:
```yaml
apiVersion: v1
kind: Service
metadata:
  name: pili
  labels:
    app: pili
spec:
  type: ClusterIP
  ports:
    - name: uwsgi
      protocol: TCP
      port: 8080
      targetPort: uwsgi
  selector:
    app: pili-web
    tier: backend
```
So the only change that really helped was this one:
```diff
   ports:
-    - protocol: TCP
+    - name: uwsgi
+      protocol: TCP
       port: 8080
       targetPort: uwsgi
```
A Service's port should be explicitly named even if it has the same name as the target port. ServiceMonitor only understands explicitly named Service ports, while other resources, e.g. an Ingress, work well without explicit naming. Beware!
P.S. At the time of writing I used kubectl v1.14.2 and helm v2.14.0.