Terraform for GKE: Fundamentals and Advanced Project Structure
This blog post is based on a sample project, which you can find on GitHub.
This guide covers Terraform basics and best practices for both beginners and experienced cloud engineers, using a real-world GKE (Google Kubernetes Engine) cluster project as an example.
1. Introduction
Terraform is an open-source Infrastructure as Code (IaC) tool that lets you safely and predictably create, change, and improve infrastructure. This guide uses a real GCP/GKE project to demonstrate both fundamental and advanced Terraform patterns.
2. Terraform Fundamentals
2.1 Providers and Resources
provider "google" {
project = var.project_id
region = var.region
}
resource "google_container_cluster" "gke" {
name = var.cluster_name
location = var.region
# ... other configuration ...
}
A provider connects Terraform to a cloud platform such as GCP (or AWS, Azure, etc.).
A resource defines a specific piece of infrastructure (e.g., a GKE cluster).
2.2 Variables and Outputs
variable "project_id" {
description = "GCP Project ID"
type = string
}
output "cluster_endpoint" {
value = google_container_cluster.gke.endpoint
}
Variables make configs flexible and reusable.
Outputs expose resource info for use elsewhere or for easy reference.
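Variables can also carry defaults and validation rules, which catch bad inputs before any plan runs. A minimal sketch (the region constraint is purely illustrative, not part of the sample project):

```hcl
variable "region" {
  description = "GCP region"
  type        = string
  default     = "europe-north1" # illustrative default

  validation {
    # Reject regions outside Europe (example policy, not from the project)
    condition     = can(regex("^europe-", var.region))
    error_message = "This configuration only deploys to European regions."
  }
}
```

With a validation block, `terraform plan` fails fast with the error message instead of attempting to create resources in an unintended region.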
3. Advanced Project Structure: Modules and Environments
3.1 What Are Terraform Modules?
A Terraform module is a folder that organizes related resources for a specific purpose—such as networking, Kubernetes, or security. Modules make your code DRY, reusable, and easier to maintain. Each environment (like dev, prod) can call the same modules with different inputs.
3.2 Cloud Modules Used in This Project
Main cloud components/resources used in this project.
VPC (Virtual Private Cloud)
Contains all network resources.
Subnet
Where specific resources are deployed within a VPC.
Bastion Host
A virtual machine within the subnet used for administrative SSH access, typically via IAP (Identity-Aware Proxy).
Cloud NAT
Provides internet access for private resources within a subnet.
Firewall Rules
Control network access, such as allowing IAP for SSH connections.
GKE Private Cluster
Cluster nodes reside within a private subnet and do not have public IP addresses.
The master node is only accessible from the Bastion host.
1. Networking (VPC, Subnet, Firewall, NAT) Module
Purpose:
Creates your own virtual private cloud (VPC) network for all your resources.
Defines subnets for better network segmentation.
Adds firewall rules for secure, controlled access.
Sets up Cloud NAT for outbound internet access for private resources.
Why It Matters:
VPC is your isolated cloud network.
Subnets keep environments and workloads organized.
Firewall secures all communication.
NAT keeps your nodes private, but able to access the internet for updates.
Example Resource:
resource "google_compute_network" "vpc" {
name = var.network_name
auto_create_subnetworks = false
}
resource "google_compute_subnetwork" "subnet" {
name = var.subnet_name
region = var.region
network = google_compute_network.vpc.self_link
ip_cidr_range = var.subnet_cidr
}
resource "google_compute_firewall" "allow_iap_ssh" {
name = "${var.network_name}-allow-iap-ssh"
network = google_compute_network.vpc.self_link
allow {
protocol = "tcp"
ports = ["22"]
}
source_ranges = ["35.235.240.0/20"] # IAP
}
resource "google_compute_router_nat" "nat" {
name = "${var.network_name}-nat-gateway"
router = google_compute_router.router.name
region = var.region
source_subnetwork_ip_ranges_to_nat = "ALL_SUBNETWORKS_ALL_IP_RANGES"
}
2. GKE Cluster Module
Purpose:
Provisions a secure, managed Kubernetes (GKE) cluster attached to your VPC and subnet.
Sets up private nodes, authorized access, and node pools.
Why It Matters:
Automates complex GKE cluster creation.
Ensures your cluster is only reachable by secure routes (e.g., from bastion).
Example Resource:
resource "google_container_cluster" "primary" {
name = var.cluster_name
location = var.location
network = var.network_name
subnetwork = var.subnetwork_name
private_cluster_config {
enable_private_endpoint = true
enable_private_nodes = true
master_ipv4_cidr_block = var.master_ipv4_cidr_block
}
master_authorized_networks_config {
cidr_blocks {
cidr_block = "${var.bastion_host_ip}/32"
display_name = "bastion-host-access"
}
}
# ...other config...
}
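Node pools are typically managed as a separate resource so they can be resized or replaced without recreating the cluster. A minimal sketch (node count and machine type are illustrative, not values from the sample project):

```hcl
resource "google_container_node_pool" "primary_nodes" {
  name       = "${var.cluster_name}-node-pool"
  cluster    = google_container_cluster.primary.name
  location   = var.location
  node_count = 2 # illustrative size

  node_config {
    machine_type = "e2-medium" # illustrative machine type
    # Broad scope; in practice, prefer a dedicated service account
    oauth_scopes = ["https://www.googleapis.com/auth/cloud-platform"]
  }
}
```

Keeping the node pool out of the cluster resource (and setting remove_default_node_pool on the cluster) is a common pattern, since changes to node configuration then never force the cluster itself to be rebuilt.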
3. Bastion Host Module
Purpose:
Creates a jump (bastion) host in a private subnet.
Enables secure admin access via a proxy server (Tinyproxy).
You can follow this blog post to install Tinyproxy on the bastion VM.
Why It Matters:
Lets administrators SSH into the cluster network securely, without exposing public IPs.
A key requirement for private GKE clusters.
Example Resource:
resource "google_compute_instance" "bastion" {
name = var.instance_name
zone = var.zone
machine_type = "e2-medium"
network_interface {
network = var.network_name
subnetwork = var.subnetwork_name
network_ip = google_compute_address.bastion_internal_ip.address
}
}
resource "google_project_iam_member" "iap_accessor" {
project = var.project_id
role = "roles/iap.tunnelResourceAccessor"
member = var.iam_member
}
How to Access the GKE Cluster
1. Open an IAP tunnel to the jump host, forwarding the local proxy port:
gcloud compute ssh jump-host \
  --tunnel-through-iap \
  --project=trs-project-386114 \
  --zone=europe-north1-a \
  --ssh-flag="-4 -L8888:localhost:8888 -N -q -f"
2. Use the proxy to access the control plane of the GKE private cluster:
export HTTPS_PROXY=localhost:8888
kubectl get nodes
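Before kubectl can talk to the cluster at all, it needs credentials in your kubeconfig. Assuming the cluster name and zone used in this project, they can be fetched with gcloud (the --internal-ip flag writes the private endpoint into the kubeconfig):

```shell
gcloud container clusters get-credentials private-dev-cluster \
  --project=trs-project-386114 \
  --zone=europe-north1-a \
  --internal-ip
```

After this, the HTTPS_PROXY trick above routes kubectl's traffic through the bastion's proxy to the private control plane.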
3.3 Why Use Modules?
Encapsulate logic for re-use across environments (dev, prod, etc).
Organize code for easier collaboration and maintenance.
3.4 Example Directory Layout
terraform-project/
├── modules/
│ ├── networking/
│ ├── gke_cluster/
│ └── bastion_host/
├── environments/
│ └── dev/
│ ├── main.tf
│ ├── variables.tf
│ └── terraform.tfvars
├── versions.tf
├── .gitignore
└── README.md
modules/: Reusable logic for networking, GKE, bastion, etc.
environments/: Each environment (dev, prod, etc.) gets its own configs, calling the same modules with different inputs.
4. Example: Calling Modules in terraform-project/environments/dev/main.tf
module "networking" {
source = "../../modules/networking"
project_id = var.project_id
region = var.region
network_name = "gke-dev-vpc"
subnet_name = "gke-dev-subnet"
subnet_cidr = "10.10.0.0/20"
}
module "gke_cluster" {
source = "../../modules/gke_cluster"
project_id = var.project_id
location = var.zone
cluster_name = "private-dev-cluster"
network_name = module.networking.network_name
subnetwork_name = module.networking.subnet_name
}
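The bastion module can be wired in the same way, consuming the networking module's outputs. A sketch under the same naming pattern (the instance name and IAM member below are hypothetical placeholders):

```hcl
module "bastion_host" {
  source          = "../../modules/bastion_host"
  project_id      = var.project_id
  zone            = var.zone
  instance_name   = "dev-bastion"                 # hypothetical name
  network_name    = module.networking.network_name
  subnetwork_name = module.networking.subnet_name
  iam_member      = "user:admin@example.com"      # hypothetical IAM member
}
```

Because each module call only references the outputs of the modules it depends on, Terraform infers the creation order automatically: networking first, then the bastion and cluster.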
4.1 Typical Terraform Workflow
Here’s how you run your infrastructure with Terraform for each environment (e.g., dev, staging, prod):
Navigate to the desired environment directory:
cd environments/dev
Change to the folder containing your environment's root Terraform configs.
Initialize Terraform:
terraform init
Downloads required providers and sets up your local working directory for this project.
Validate the configuration:
terraform validate
Purpose:
Checks whether your Terraform configuration files are syntactically and structurally correct.
Key Features:
Does not access any remote state or cloud provider.
Checks for things like:
Missing variables
Incorrect resource blocks
Invalid syntax
Unsupported arguments
Use case:
Useful early in development, especially in CI/CD pipelines, to catch typos or malformed configs.
Review the plan:
terraform plan
Shows what Terraform will do before making any real changes—review carefully!
What does terraform plan do?
The terraform plan command performs a dry run to preview the changes Terraform would make to your infrastructure—without actually applying them.
It compares your code (what you want) with the current state (what exists) and shows:
- ✅ What will be created (+)
- 🔄 What will be changed (~)
- ❌ What will be destroyed (-)
This step is crucial for verifying that your changes are intentional and safe.
How to verify the plan output
- Look for + for new resources you expect (e.g., new cluster, VPC)
- Check ~ to confirm the exact changes you want
- Be cautious with - to avoid destroying live resources
- Make sure resource names, types, and configurations match expectations
Example output snippet:
Terraform will perform the following actions:

  # google_compute_network.vpc will be created
  + name                    = "gke-dev-vpc"
  + auto_create_subnetworks = false

  # google_container_cluster.primary will be created
  + name    = "private-dev-cluster"
  + network = "gke-dev-vpc"

Plan: 2 to add, 0 to change, 0 to destroy.
Apply the changes:
terraform apply
Creates, updates, or destroys infrastructure to match your configuration.
5. Special Files: terraform.tfvars and versions.tf
terraform.tfvars
Supplies actual values for your variables (e.g. project_id, region, zone).
Allows easy environment switching and secrets separation.
project_id = "mycom-project"
region = "europe-north1"
Best Practice:
Do NOT commit files containing real secrets; exclude them via .gitignore.
versions.tf
Pins Terraform and provider versions for consistent results across teams/CI/CD.
terraform {
  required_version = ">= 1.3"

  required_providers {
    google = {
      source  = "hashicorp/google"
      version = "~> 5.0"
    }
    kubernetes = {
      source  = "hashicorp/kubernetes"
      version = "~> 2.23"
    }
  }
}
Best Practice:
Pin major versions; update only after testing.
6. Advanced Concepts for Cloud Engineers
Remote state: Store .tfstate in GCS for team use.
Workspaces: Use for easy multi-environment setups.
State locking: A locking backend (the GCS backend locks natively) prevents race conditions in CI/CD.
Sensitive outputs: Mark outputs as sensitive for secrets.
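Two of these concepts fit in a few lines of HCL. A sketch of a GCS remote-state backend and a sensitive output (the bucket name is a hypothetical placeholder; the output assumes the cluster resource shown earlier):

```hcl
terraform {
  backend "gcs" {
    bucket = "my-terraform-state-bucket" # hypothetical bucket name
    prefix = "environments/dev"          # separate state per environment
  }
}

# Marked sensitive so Terraform redacts it in plan/apply output
output "cluster_ca_certificate" {
  value     = google_container_cluster.primary.master_auth[0].cluster_ca_certificate
  sensitive = true
}
```

Sensitive values are still stored in plain text in the state file, which is another reason to keep state in a private, access-controlled GCS bucket rather than in the repository.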
7. Tips & Gotchas
Never commit .tfstate or real secrets.
Test modules in isolation.
Use module outputs to chain resources.
Read Terraform docs for new features and best practices.
8. Conclusion
A modular, environment-driven project structure lets teams scale IaC from dev to prod with safety and speed. Start with the basics and grow your repo’s complexity as your needs evolve!