Monitoring a Kubernetes Cluster with Prometheus

What is Prometheus?

Prometheus is an open-source system monitoring and alerting framework. It collects and stores monitoring metrics together with the timestamp at which each sample was recorded.

[Figure: Prometheus high-level architecture diagram]
Main Components

Prometheus Server – scrapes and stores time series data
Alertmanager – handles alerts sent by the Prometheus server
Exporters – special-purpose exporters for services such as MySQL, NGINX, etc.

Prerequisites:
  • A Kubernetes cluster (for information on how to deploy a GKE cluster, see this post)
  • The kubectl CLI, installed and configured to connect to the cluster
  • Admin privileges on the cluster
Connect to the GKE Cluster
Shell
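# Typical command to fetch credentials for a GKE cluster so kubectl can reach it;
# <cluster-name>, <zone>, and <project-id> are placeholders for your own values.
gcloud container clusters get-credentials <cluster-name> --zone <zone> --project <project-id>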
Create a Namespace for Prometheus Deployment

By issuing the following command, we will create a namespace called “monitoring” for the Prometheus deployment, services, and pods. If we don’t specify a namespace, resources will be created in the “default” namespace.

Shell
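kubectl create namespace monitoring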

Create a ClusterRole for Prometheus to Access Kubernetes API Groups

Create a file named clusterRole.yaml and copy the following YAML into it.

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: prometheus
rules:
- apiGroups: [""]
  resources:
  - nodes
  - nodes/proxy
  - services
  - endpoints
  - pods
  verbs: ["get", "list", "watch"]
- apiGroups:
  - extensions
  - networking.k8s.io   # ingresses moved from "extensions" to this group in newer Kubernetes versions
  resources:
  - ingresses
  verbs: ["get", "list", "watch"]
- nonResourceURLs: ["/metrics"]
  verbs: ["get"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: prometheus
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: prometheus
subjects:
- kind: ServiceAccount
  name: default
  namespace: monitoring

Create the Kubernetes RBAC resources (ClusterRole and ClusterRoleBinding) using the following command.

Shell
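kubectl create -f clusterRole.yaml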
Create a ConfigMap with the Prometheus Configuration

The Prometheus configuration file is “prometheus.yml” and the rules file is “prometheus.rules”. We will add these two files through this ConfigMap and mount them on the Prometheus server pods.

Copy the following code into a file called “prometheus-config.yaml”.

apiVersion: v1
kind: ConfigMap
metadata:
  name: prometheus-server-conf
  namespace: monitoring
  labels:
    name: prometheus-server-conf
data:
  prometheus.rules: |-
    groups:
    - name: Infrastructure alerts
      rules:
      - alert: HighPodMemory   # alert names must be valid metric names, so no spaces
        expr: sum(container_memory_usage_bytes) > 1   # deliberately low demo threshold so the alert fires quickly
        for: 1m
        labels:
          severity: page
        annotations:
          summary: High Memory Usage
  prometheus.yml: |-
    global:
      scrape_interval: 5s
      evaluation_interval: 5s
    rule_files:
      - /etc/prometheus/prometheus.rules
    alerting:
      alertmanagers:
      - scheme: http
        static_configs:
        - targets:
          - "alertmanager.monitoring.svc.cluster.local:9093"
    scrape_configs:
      - job_name: 'node-exporter'
        kubernetes_sd_configs:
          - role: endpoints
        relabel_configs:
        - source_labels: [__meta_kubernetes_endpoints_name]
          regex: 'node-exporter'
          action: keep
      - job_name: 'kubernetes-apiservers'
        kubernetes_sd_configs:
        - role: endpoints
        scheme: https
        tls_config:
          ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
        bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
        relabel_configs:
        - source_labels: [__meta_kubernetes_namespace, __meta_kubernetes_service_name, __meta_kubernetes_endpoint_port_name]
          action: keep
          regex: default;kubernetes;https
      - job_name: 'kubernetes-nodes'
        scheme: https
        tls_config:
          ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
        bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
        kubernetes_sd_configs:
        - role: node
        relabel_configs:
        - action: labelmap
          regex: __meta_kubernetes_node_label_(.+)
        - target_label: __address__
          replacement: kubernetes.default.svc:443
        - source_labels: [__meta_kubernetes_node_name]
          regex: (.+)
          target_label: __metrics_path__
          replacement: /api/v1/nodes/${1}/proxy/metrics     
      - job_name: 'kubernetes-pods'
        kubernetes_sd_configs:
        - role: pod
        relabel_configs:
        - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
          action: keep
          regex: true
        - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
          action: replace
          target_label: __metrics_path__
          regex: (.+)
        - source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
          action: replace
          regex: ([^:]+)(?::\d+)?;(\d+)
          replacement: $1:$2
          target_label: __address__
        - action: labelmap
          regex: __meta_kubernetes_pod_label_(.+)
        - source_labels: [__meta_kubernetes_namespace]
          action: replace
          target_label: kubernetes_namespace
        - source_labels: [__meta_kubernetes_pod_name]
          action: replace
          target_label: kubernetes_pod_name
      
      - job_name: 'kube-state-metrics'
        static_configs:
          - targets: ['kube-state-metrics.kube-system.svc.cluster.local:8080']
      # Note: this job and the 'kubernetes-nodes-cadvisor' job below scrape the
      # same cAdvisor endpoints; in practice you only need one of the two.
      - job_name: 'kubernetes-cadvisor'
        scheme: https
        tls_config:
          ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
        bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
        kubernetes_sd_configs:
        - role: node
        relabel_configs:
        - action: labelmap
          regex: __meta_kubernetes_node_label_(.+)
        - target_label: __address__
          replacement: kubernetes.default.svc:443
        - source_labels: [__meta_kubernetes_node_name]
          regex: (.+)
          target_label: __metrics_path__
          replacement: /api/v1/nodes/${1}/proxy/metrics/cadvisor
      
      - job_name: 'kubernetes-service-endpoints'
        kubernetes_sd_configs:
        - role: endpoints
        relabel_configs:
        - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scrape]
          action: keep
          regex: true
        - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scheme]
          action: replace
          target_label: __scheme__
          regex: (https?)
        - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_path]
          action: replace
          target_label: __metrics_path__
          regex: (.+)
        - source_labels: [__address__, __meta_kubernetes_service_annotation_prometheus_io_port]
          action: replace
          target_label: __address__
          regex: ([^:]+)(?::\d+)?;(\d+)
          replacement: $1:$2
        - action: labelmap
          regex: __meta_kubernetes_service_label_(.+)
        - source_labels: [__meta_kubernetes_namespace]
          action: replace
          target_label: kubernetes_namespace
        - source_labels: [__meta_kubernetes_service_name]
          action: replace
          target_label: kubernetes_name
      
      - job_name: kubernetes-nodes-cadvisor
        scrape_interval: 10s
        scrape_timeout: 10s
        scheme: https  # remove if you want to scrape metrics on insecure port
        tls_config:
          ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
        bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
        kubernetes_sd_configs:
          - role: node
        relabel_configs:
          - action: labelmap
            regex: __meta_kubernetes_node_label_(.+)
          - target_label: __address__
            replacement: kubernetes.default.svc:443
          - source_labels: [__meta_kubernetes_node_name]
            regex: (.+)
            target_label: __metrics_path__
            replacement: /api/v1/nodes/${1}/proxy/metrics/cadvisor
        metric_relabel_configs:
          - action: replace
            source_labels: [id]
            regex: '^/machine\.slice/machine-rkt\\x2d([^\\]+)\\.+/([^/]+)\.service$'
            target_label: rkt_container_name
            replacement: '${2}-${1}'
          - action: replace
            source_labels: [id]
            regex: '^/system\.slice/(.+)\.service$'
            target_label: systemd_service_name
            replacement: '${1}'

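Note that the “kubernetes-pods” and “kubernetes-service-endpoints” jobs above only scrape pods and services that opt in via prometheus.io/* annotations (matched by the __meta_kubernetes_*_annotation_prometheus_io_* relabel rules). As a minimal sketch, a pod exposing metrics on a hypothetical port 8080 would be annotated like this:

metadata:
  annotations:
    prometheus.io/scrape: "true"
    prometheus.io/path: "/metrics"   # optional; Prometheus defaults to /metrics
    prometheus.io/port: "8080"       # hypothetical port; use the port your container exposes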
Run the following command to create the Kubernetes ConfigMap.

Shell
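kubectl create -f prometheus-config.yaml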
Create a Prometheus Deployment

Add the following YAML to a file called “prometheus-deployment.yaml”.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: prometheus-deployment
  namespace: monitoring
  labels:
    app: prometheus-server
spec:
  replicas: 1
  selector:
    matchLabels:
      app: prometheus-server
  template:
    metadata:
      labels:
        app: prometheus-server
    spec:
      containers:
        - name: prometheus
          image: prom/prometheus   # untagged, so this pulls :latest; consider pinning a specific version
          args:
            - "--config.file=/etc/prometheus/prometheus.yml"
            - "--storage.tsdb.path=/prometheus/"
          ports:
            - containerPort: 9090
          volumeMounts:
            - name: prometheus-config-volume
              mountPath: /etc/prometheus/
            - name: prometheus-storage-volume
              mountPath: /prometheus/
      volumes:
        - name: prometheus-config-volume
          configMap:
            defaultMode: 420
            name: prometheus-server-conf
  
        - name: prometheus-storage-volume
          emptyDir: {}

Run the command below to create the Prometheus deployment.

Shell
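kubectl create -f prometheus-deployment.yaml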

By using the commands below, you can check the Prometheus deployment and its running pods.

Shell
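kubectl get deployments -n monitoring
kubectl get pods -n monitoring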
Connecting to the Prometheus Console

With the help of kubectl port forwarding, we can connect directly from a port on our workstation to a container port on a specific pod. Run the following commands to connect to a pod through the browser.

Shell
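# List the pods to find the Prometheus pod name, then forward local port 8080
# to container port 9090; <prometheus-pod-name> is a placeholder.
kubectl get pods -n monitoring
kubectl port-forward <prometheus-pod-name> 8080:9090 -n monitoring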

From the browser, you can now access the Prometheus console at http://localhost:8080.
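Once the console is up, a quick sanity check is to run the built-in up query, which returns one series per scrape target, with a value of 1 for targets that are healthy. For example, scoped to one of the jobs defined in the ConfigMap above:

up{job="kubernetes-nodes"}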

Creating a Prometheus Service to Connect to the Web Console Using a GCP External Load Balancer

Copy the following content into a file called prometheus-service.yaml.

apiVersion: v1
kind: Service
metadata:
  name: prometheus-service-external
  namespace: monitoring
spec:
  selector: 
    app: prometheus-server
  type: LoadBalancer  
  ports:
    - port: 8080
      targetPort: 9090 
      nodePort: 30000
---
apiVersion: v1
kind: Service
metadata:
  name: prometheus-service-internal
  namespace: monitoring  
spec:
  selector: 
    app: prometheus-server
  ports:
    - port: 8080
      targetPort: 9090 

When we apply the above YAML file using the command below, it will create two services: prometheus-service-internal, which allows other services on the same cluster (such as Grafana) to consume the time series data, and prometheus-service-external, which allows users to access the Prometheus web UI through a GCP external load balancer.

Shell
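kubectl apply -f prometheus-service.yaml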

After a minute or two, you can go to the GKE console -> ‘Services & Ingress’ to find the external load balancer IP for accessing the Prometheus console.

Clicking on the endpoint takes you to the Prometheus web console.

Note: prometheus-service-internal.monitoring.svc.cluster.local is the FQDN for internal service communication inside the cluster. For example, if you have Grafana hosted on the same cluster and need to configure its data source against the internal service, use http://prometheus-service-internal.monitoring.svc.cluster.local:8080 as the Prometheus endpoint URL.
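As a minimal sketch of that Grafana setup (assuming Grafana's standard data source provisioning format; the file name and its location under /etc/grafana/provisioning/datasources/ are conventions, not taken from this post):

apiVersion: 1
datasources:
  - name: Prometheus
    type: prometheus
    access: proxy   # Grafana's backend proxies the requests, so the cluster-internal URL works
    url: http://prometheus-service-internal.monitoring.svc.cluster.local:8080
    isDefault: true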

Conclusion

In this quick-start demo, we configured a Prometheus server on a GKE cluster and connected to it. You can find more information about Prometheus in the official documentation.

For more on Prometheus Monitoring:

Alertmanager Setup on Kubernetes for Prometheus Monitoring

Prometheus Node Exporter Setup on Kubernetes

Grafana Setup for Prometheus Server on Kubernetes
