This post is the next episode of previous post (http://dhanuka84.blogspot.com/2020/04/learning-kubernetes-by-example-kafka.html) .
In this blog post we are mainly focus on how to trigger alerts with Prometheus and how kafka-operator react based on those alerts. Finally how Cruise-Control rebalance the kafka cluster based on the same alert.
Please note that, since this post is some kind of simple version of two references mentioned in the bottom, I have hijacked the diagram from same article :).
Key Points:
- Once we provisioned manifests (kafkacluster-prometheus.yaml) for key Prometheus components, Prometheus operator will create kafka-alerts, kafka-prometheus and kafka-operator-alertmanager .
- When you provision kafka cluster (Envoykafkacluster.yaml), monitoringConfig will deploy Prometheus jmx java agents in brokers. So Prometheus can monitor kafka cluster.
- Based on the metrics and relevant rules that has been configured, Prometheus will generate alerts to Kafka-Operator. Then Kafka-Operator will create kafka-broker pods (according to below test case).
- Kafka-Operator will propagate same alert to Cruise-Control, then Cruise-Control will do cluster re-balancing on Kafka brokers.
Steps:
Follow the previous blogpost (http://dhanuka84.blogspot.com/2020/04/learning-kubernetes-by-example-kafka.html) and then follow additional steps mentioned below.
1. Deploy Prometheus operator
kubectl apply -n default -f config/samples/bundle.yaml
2. Deploy Kafka cluster related Prometheus workloads
kubectl create -n default -f config/samples/kafkacluster-prometheus.yaml
3. Port forward to both Cruise-Control WebUI and Prometheus WebUI.
kubectl port-forward -n kafka svc/kafka-cruisecontrol-svc 8090:8090 &
kubectl port-forward -n default pod/prometheus-kafka-prometheus-0 9090:9090
4. Then login to WebUIs
5. Create kafka-internal pod
kubectl create -n kafka -f - <<EOF
apiVersion: v1
kind: Pod
metadata:
name: kafka-internal
spec:
containers:
- name: kafka
image: wurstmeister/kafka:2.12-2.1.0
# Just spin & wait forever
command: [ "/bin/bash", "-c", "--" ]
args: [ "while true; do sleep 3000; done;" ]
EOF
6. Create kafka topic with 1001 partitions.
bash-4.4# /opt/kafka/bin/kafka-topics.sh --zookeeper example-zookeepercluster-client.zookeeper:2181 --create --topic alertblog --partitions 101 --replication-factor 3
Created topic "alertblog".
Observations & Explanations
In this kafka-prometheus manifest file, we have configured the rules and out of these we are testing PartitionCountHigh rule/alert.
You can see that above manifest used, one of the kafka broker metrics called, "kafka.server:type=ReplicaManager,name=PartitionCount"
Number of partitions on this broker. This should be mostly even across all brokers.
So if the broker partition count exceed 100, then a triggered will be fired and new broker will be created. Finally re-balanced will be executed.
1. Before triggering the alert
K8s dashboard
Cruise-Control
Prometheus Alerts
2. After alert is triggered.
Kafka-Operator Logs
Prometheus-kafka pod logs
Kafka-Cruise-Control pod logs
Prometheus alerts pending status
Prometheus alerts firing status
Cruise-Control after broker added
K8s dashboard after broker added
References:
https://banzaicloud.com/blog/kafka-alert/