In this blog post, we'll set up a Minikube cluster, deploy a Kafka cluster on it with Confluent for Kubernetes, and then deploy an Apache Airflow DAG pipeline, all on a local machine.
Prerequisites
Before we dive in, make sure you have the following installed:
- Minikube
- kubectl
- Helm
- Docker
Step 1: Set Up a Minikube Cluster
Spin up a Minikube cluster:
```
minikube start --memory=12000 --cpus=4
```
Verify that the cluster is up:
```
kubectl get nodes
```
Step 2: Install the Confluent for Kubernetes (CFK) Operator
Use this script to deploy the CFK operator and the Confluent Platform components:
https://github.com/dhanuka84/my-first-apache-airflow-setup/blob/main/cfk_kraft_quickstart.sh
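If you'd like to see roughly what the script does before running it, the core of a CFK KRaft quickstart is the operator install followed by applying the Confluent Platform custom resources. The sketch below is an assumption about the script's contents, not a copy of it; the manifest filename is a placeholder.
```
# Sketch only — the linked script is the source of truth.
helm repo add confluentinc https://packages.confluent.io/helm
helm repo update

# Install the Confluent for Kubernetes operator
helm upgrade --install confluent-operator confluentinc/confluent-for-kubernetes \
  --namespace confluent --create-namespace

# Apply the Kafka (KRaft mode), Control Center, etc. custom resources
# (replace with the manifest the script actually uses)
kubectl apply -f confluent-platform-kraft.yaml -n confluent
```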
Step 3: Access the Kafka Control Center
Forward Control Center:
```
kubectl port-forward controlcenter-0 9021:9021 -n confluent
```
Visit http://localhost:9021 and create a topic named:
```
my-csv-topic
```
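If you prefer to create the topic declaratively rather than through the UI, CFK also provides a KafkaTopic custom resource. A minimal sketch, assuming the quickstart defaults (cluster and topic in the confluent namespace, replication factor 1 for a single-broker local setup):
```
# Declarative alternative to creating the topic in Control Center (sketch)
kubectl apply -n confluent -f - <<EOF
apiVersion: platform.confluent.io/v1beta1
kind: KafkaTopic
metadata:
  name: my-csv-topic
spec:
  partitionCount: 1
  replicas: 1
EOF
```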
☁️ Step 4: Install Apache Airflow via Helm
Add the official Airflow Helm chart repository and install it:
```
helm repo add apache-airflow https://airflow.apache.org
helm repo update
export NAMESPACE=confluent
export RELEASE_NAME=example-release
helm install $RELEASE_NAME apache-airflow/airflow --namespace $NAMESPACE --create-namespace
```
Expose the Airflow web UI:
```
kubectl port-forward svc/example-release-api-server 9080:8080 -n confluent
```
Log in to Airflow using:
- Username: admin
- Password: admin
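These are the chart's built-in defaults, which are fine for a throwaway local cluster. If you want different credentials, recent versions of the official chart expose them as values (the password below is a placeholder; check your chart version's values.yaml for the exact keys):
```
# Override the default UI user at install/upgrade time
helm upgrade $RELEASE_NAME apache-airflow/airflow -n confluent \
  --set webserver.defaultUser.username=admin \
  --set webserver.defaultUser.password='<choose-a-password>'
```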
Step 5: Customize the Airflow Image
Build and deploy a custom Airflow image with this script:
https://github.com/dhanuka84/my-first-apache-airflow-setup/blob/main/deploy_airflow.sh
Run:
```
./deploy_airflow.sh 0.0.1
```
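The script takes the image tag (0.0.1 above) as an argument. Conceptually it boils down to building the image, making it available to the Minikube node, and pointing the Helm release at it. A sketch under those assumptions (the image name is a placeholder; the values are the official chart's images.airflow.* settings):
```
# Rough equivalent of deploy_airflow.sh (sketch; image name is a placeholder)
docker build -t my-custom-airflow:0.0.1 .

# Make the locally built image visible to the Minikube node
minikube image load my-custom-airflow:0.0.1

# Point the existing Airflow release at the custom image
helm upgrade $RELEASE_NAME apache-airflow/airflow -n confluent \
  --set images.airflow.repository=my-custom-airflow \
  --set images.airflow.tag=0.0.1 \
  --set images.airflow.pullPolicy=Never
```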
Wait for the pods to become ready:
```
kubectl get pods -n confluent
```
Step 6: Trigger the DAG
Use the Airflow UI to manually trigger your DAG. It sends 4 CSV rows as messages to the Kafka topic.
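If you'd rather trigger it from the command line, you can exec into the scheduler and use the Airflow CLI. The DAG id below (csv_to_kafka) is a placeholder; use the id defined in your DAG file:
```
# List the DAGs the scheduler knows about, then trigger one (DAG id is a placeholder)
kubectl exec -n confluent deploy/example-release-scheduler -- airflow dags list
kubectl exec -n confluent deploy/example-release-scheduler -- airflow dags trigger csv_to_kafka
```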
Step 7: Validate Kafka Messages
In Control Center UI, view `my-csv-topic` to verify the 4 records were published.
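You can also read the records back from the command line using the console consumer bundled in the broker image. This is a sketch: the pod name (kafka-0) and the internal listener port (9071) are CFK quickstart defaults, so adjust them if your setup differs:
```
# Consume the four records from the topic (pod name and port are assumptions)
kubectl exec -n confluent kafka-0 -- kafka-console-consumer \
  --bootstrap-server localhost:9071 \
  --topic my-csv-topic --from-beginning --max-messages 4
```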
✅ Conclusion
You've set up a complete local pipeline using Kafka, Airflow, and Minikube. This stack is well suited for prototyping, and since it is built on the same components you would run in production (Kubernetes, Confluent for Kubernetes, and the official Airflow chart), it is a realistic starting point for scaling up.