4/12/2020

Learning Kubernetes by Example : Kafka Cluster Auto Scaling and Re-balancing





This post is the next episode of previous post (http://dhanuka84.blogspot.com/2020/04/learning-kubernetes-by-example-kafka.html) .
In this blog post we are mainly focus on how to trigger alerts with Prometheus and how kafka-operator react based on those alerts. Finally how Cruise-Control rebalance the kafka cluster based on the same alert.
Please note that, since this post is some kind of simple version of two references mentioned in the bottom, I have hijacked the diagram from same article :).


Key Points:


  1. Once we provisioned manifests (kafkacluster-prometheus.yaml) for key Prometheus components, Prometheus operator will create kafka-alerts, kafka-prometheus and kafka-operator-alertmanager .
  2. When you provision kafka cluster (Envoykafkacluster.yaml),  monitoringConfig will deploy Prometheus jmx java agents in brokers. So Prometheus can monitor kafka cluster.
  3. Based on the metrics and relevant rules that has been configured, Prometheus will generate alerts to Kafka-Operator. Then Kafka-Operator will create kafka-broker pods (according to below test case).
  4.   Kafka-Operator will propagate same alert to Cruise-Control, then Cruise-Control will do cluster re-balancing on Kafka brokers. 

Steps:


Follow the previous blogpost (http://dhanuka84.blogspot.com/2020/04/learning-kubernetes-by-example-kafka.html) and then follow additional steps mentioned below.


1. Deploy Prometheus operator

kubectl apply -n default -f config/samples/bundle.yaml

2. Deploy Kafka cluster related Prometheus workloads

kubectl create -n default -f config/samples/kafkacluster-prometheus.yaml

3. Port forward to both Cruise-Control WebUI and Prometheus WebUI.

kubectl port-forward -n kafka svc/kafka-cruisecontrol-svc 8090:8090 &
kubectl port-forward -n default pod/prometheus-kafka-prometheus-0 9090:9090

4. Then login to WebUIs



5. Create kafka-internal pod

kubectl create -n kafka -f - <<EOF
apiVersion: v1
kind: Pod
metadata:
  name: kafka-internal
spec:
  containers:
  - name: kafka
    image: wurstmeister/kafka:2.12-2.1.0
    # Just spin & wait forever
    command: [ "/bin/bash", "-c", "--" ]
    args: [ "while true; do sleep 3000; done;" ]
EOF


6. Create kafka topic with 1001 partitions.

bash-4.4# /opt/kafka/bin/kafka-topics.sh --zookeeper example-zookeepercluster-client.zookeeper:2181 --create --topic alertblog --partitions 101 --replication-factor 3

Created topic "alertblog".


Observations & Explanations


In this kafka-prometheus manifest file, we have configured the rules and out of these we are testing PartitionCountHigh rule/alert.





You can see that above manifest used, one of the kafka broker metrics called, "kafka.server:type=ReplicaManager,name=PartitionCount"
Number of partitions on this broker. This should be mostly even across all brokers.

So if the broker partition count exceed 100, then a triggered will be fired and new broker will be created. Finally re-balanced will be executed.


1. Before triggering the alert


K8s dashboard


 

Cruise-Control




Prometheus Alerts



2. After alert is triggered.


Kafka-Operator Logs




Prometheus-kafka pod logs




Kafka-Cruise-Control pod logs



Prometheus alerts pending status


Prometheus alerts firing status


Cruise-Control after broker added





K8s dashboard after broker added




References:


https://medium.com/@prune998/banzaicloud-kafka-operator-and-broker-autoscaling-1c7324260de1

https://banzaicloud.com/blog/kafka-alert/

No comments:

Post a Comment