Answer Key: GKE Control Plane & Data Plane
Exercise 1: Design Node Pools
Question: Design node pools for a multi-tier application (web, app, database). What machine types? How many nodes?
Answer
Goal: Design node pools optimized for different workload types.
Node Pool Design
1. Web Tier Node Pool
Purpose: Host web servers (stateless, high traffic)
Machine Type: n1-standard-4 (4 vCPU, 15GB RAM)
- CPU: 4 vCPU for handling HTTP requests
- Memory: 15GB for web server processes
- Cost: Moderate cost, good performance
Node Count: 3-10 nodes (auto-scaling)
- Min: 3 nodes (high availability)
- Max: 10 nodes (handle peak traffic)
- Auto-scaling: cluster autoscaler adds nodes when pods are unschedulable and removes underutilized nodes
Configuration:
- OS: Container-Optimized OS
- Preemptible: No (need reliability)
- Labels: tier=web
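As a sketch, this pool could be created with gcloud as shown below; the cluster name, zone, and pool name are placeholders, and the app-tier pool follows the same pattern with its own machine type and node limits:

```bash
# Create the web-tier node pool with autoscaling (cluster/zone/pool names are placeholders)
gcloud container node-pools create web-pool \
  --cluster=my-cluster \
  --zone=us-central1-a \
  --machine-type=n1-standard-4 \
  --image-type=COS_CONTAINERD \
  --num-nodes=3 \
  --enable-autoscaling --min-nodes=3 --max-nodes=10 \
  --node-labels=tier=web
```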
2. App Tier Node Pool
Purpose: Host application servers (CPU-intensive, stateful)
Machine Type: n1-standard-8 (8 vCPU, 30GB RAM)
- CPU: 8 vCPU for application processing
- Memory: 30GB for application state
- Cost: Higher cost, better performance
Node Count: 5-20 nodes (auto-scaling)
- Min: 5 nodes (handle base load)
- Max: 20 nodes (handle peak load)
- Auto-scaling: cluster autoscaler adds nodes when pods are unschedulable and removes underutilized nodes
Configuration:
- OS: Container-Optimized OS
- Preemptible: No (need reliability)
- Labels: tier=app
3. Database Tier Node Pool
Purpose: Host database pods (memory-intensive, I/O-intensive)
Machine Type: n1-highmem-8 (8 vCPU, 52GB RAM)
- CPU: 8 vCPU for database operations
- Memory: 52GB for database cache
- Cost: Higher cost, optimized for memory
Node Count: 3-6 nodes (limited scaling)
- Min: 3 nodes (high availability)
- Max: 6 nodes (limited by database design)
- Auto-scaling: conservative (databases don't scale horizontally easily)
Configuration:
- OS: Container-Optimized OS
- Preemptible: No (critical tier)
- Labels: tier=database
- Taints: database=true:NoSchedule (prevent non-database pods)
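A sketch of creating this pool with the taint and label applied at creation time (cluster and zone are placeholders); setting them here keeps them across node upgrades and recreation:

```bash
# Create the database-tier pool; taints and labels set at pool creation persist
gcloud container node-pools create db-pool \
  --cluster=my-cluster \
  --zone=us-central1-a \
  --machine-type=n1-highmem-8 \
  --num-nodes=3 \
  --enable-autoscaling --min-nodes=3 --max-nodes=6 \
  --node-labels=tier=database \
  --node-taints=database=true:NoSchedule
```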
Node Pool Summary
| Tier | Machine Type | Nodes | Use Case |
|---|---|---|---|
| Web | n1-standard-4 | 3-10 | Stateless web servers |
| App | n1-standard-8 | 5-20 | Application servers |
| Database | n1-highmem-8 | 3-6 | Database pods |
Key Principles
- Right-size machines: Match machine type to workload
- Separate tiers: Different node pools for different workloads
- Auto-scaling: Enable auto-scaling for variable load
- High availability: Minimum 3 nodes per tier
- Cost optimization: Use appropriate machine types
Exercise 2: Pod Scheduling
Question: You have pods that need to run on specific nodes. How do you ensure they're scheduled correctly?
Answer
Goal: Ensure pods are scheduled on correct nodes using Kubernetes scheduling features.
Scheduling Strategies
1. Node Selectors
Use case: Simple node selection based on labels
Example:
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: database-pod
spec:
  nodeSelector:
    tier: database
  containers:
  - name: database
    image: postgres:13
```
How it works:
- Pod only schedules on nodes with tier=database label
- Simple and straightforward
- Hard requirement (pod won't schedule if no matching nodes)
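To confirm the selector can be satisfied, standard kubectl checks work; the label and pod name follow the example above:

```bash
# Nodes that carry the label the selector targets
kubectl get nodes -l tier=database
# Which node the pod actually landed on
kubectl get pod database-pod -o wide
```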
2. Node Affinity
Use case: More flexible node selection
Example:
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: app-pod
spec:
  affinity:
    nodeAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 100
        preference:
          matchExpressions:
          - key: tier
            operator: In
            values:
            - app
            - web
  containers:            # added: a Pod spec requires at least one container
  - name: app
    image: nginx:1.25    # placeholder image
```
How it works:
- Prefer nodes with tier=app or tier=web
- Soft requirement (pod can schedule elsewhere if needed)
- More flexible than node selectors
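For contrast, a minimal sketch of the hard-requirement form, which leaves the pod Pending if no node matches:

```yaml
# requiredDuringScheduling... is a hard constraint, unlike the preferred form above
affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
      - matchExpressions:
        - key: tier
          operator: In
          values:
          - app
```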
3. Taints and Tolerations
Use case: Prevent pods from scheduling on specific nodes
Example:
```bash
# Taint the node so only pods with a matching toleration can schedule there
kubectl taint nodes database-node-1 database=true:NoSchedule
```

```yaml
# Pod with a matching toleration
apiVersion: v1
kind: Pod
metadata:
  name: database-pod
spec:
  tolerations:
  - key: database
    operator: Equal
    value: "true"
    effect: NoSchedule
  containers:
  - name: database
    image: postgres:13
```
How it works:
- Taint prevents most pods from scheduling
- Toleration allows specific pods to schedule
- Useful for dedicated nodes (e.g., database nodes)
4. Pod Affinity/Anti-Affinity
Use case: Co-locate or separate pods
Example:
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: app-pod
spec:
  affinity:
    podAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchExpressions:
          - key: app
            operator: In
            values:
            - myapp
        topologyKey: kubernetes.io/hostname
  containers:            # added: a Pod spec requires at least one container
  - name: app
    image: nginx:1.25    # placeholder image
```
How it works:
- Co-locate pods with same app=myapp label on same node
- Useful for related pods that benefit from co-location
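The heading also covers anti-affinity, which does the opposite: spread pods apart. A minimal sketch that prefers to keep app=web replicas on different nodes (pod name and image are placeholders):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: web-pod
  labels:
    app: web
spec:
  affinity:
    podAntiAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 100
        podAffinityTerm:
          labelSelector:
            matchExpressions:
            - key: app
              operator: In
              values:
              - web
          topologyKey: kubernetes.io/hostname
  containers:
  - name: web
    image: nginx:1.25    # placeholder image
```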
Best Practices
1. Use Node Selectors for Simple Cases:
   - Simple label matching
   - Hard requirements
2. Use Node Affinity for Flexibility:
   - Soft requirements
   - Multiple preferences
3. Use Taints/Tolerations for Dedicated Nodes:
   - Database nodes
   - GPU nodes
   - Specialized hardware
4. Use Pod Affinity for Co-location:
   - Related pods
   - Performance optimization
5. Monitor Scheduling:
   - Check pod scheduling status
   - Monitor node utilization
   - Adjust as needed
Summary
Use Node Selectors for simple cases:
- Add labels to nodes: tier=web, tier=app, tier=database (see the command sketch below)
- Use nodeSelector in the pod spec to select nodes
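A sketch of labeling an existing node directly (node name is a placeholder); note that on GKE, labels added with kubectl are lost when the node is recreated, so --node-labels at pool creation is the durable option:

```bash
# Label an existing node so nodeSelector tier=web can match it
kubectl label nodes <node-name> tier=web
# Show the tier label across all nodes
kubectl get nodes -L tier
```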
Use Taints/Tolerations for dedicated nodes:
- Taint database nodes: database=true:NoSchedule
- Add toleration to database pods
- Prevents non-database pods from scheduling
Use Pod Affinity for co-location:
- Co-locate related pods on the same node
- Improve performance and reduce network latency
Key principles:
- Labels: Use labels to identify node types
- Selectors: Use node selectors for simple matching
- Taints: Use taints to reserve nodes for specific workloads
- Affinity: Use affinity for flexible scheduling
- Monitor: Monitor scheduling and adjust as needed
Exercise 3: Debug Pod Failure
Question: A pod is crashing. How do you debug this? What logs do you check?
Answer
Goal: Debug pod crashes systematically using Kubernetes debugging tools.
Debugging Steps
1. Check Pod Status
Command:
```bash
kubectl get pods
kubectl describe pod <pod-name>
```
What to look for:
- Status: CrashLoopBackOff, Error, Pending
- Events: Recent events showing errors
- Restarts: Number of restarts
- State: Current state and last state
Example output:
```text
Name:       app-pod
Status:     CrashLoopBackOff
Restarts:   5
Last State: Terminated
  Reason:    Error
  Exit Code: 1
```
2. Check Pod Logs
Command:
```bash
kubectl logs <pod-name>
kubectl logs <pod-name> --previous  # previous container instance
kubectl logs <pod-name> -f          # follow logs
```
What to look for:
- Error messages: Application errors
- Stack traces: Exception stack traces
- Startup errors: Configuration errors
- Runtime errors: Application runtime errors
3. Check Container Logs
Command:
```bash
kubectl logs <pod-name> -c <container-name>  # multi-container pods
```
What to look for:
- Sidecar logs: Sidecar container logs
- Init container logs: Init container logs
- Application logs: Main application logs
4. Check Events
Command:
```bash
kubectl get events --sort-by=.metadata.creationTimestamp
kubectl describe pod <pod-name> | grep Events -A 10
```
What to look for:
- Scheduling events: Pod scheduling issues
- Image pull events: Image pull failures
- Startup events: Container startup failures
- Health check events: Liveness/readiness probe failures
5. Check Resource Limits
Command:
```bash
kubectl describe pod <pod-name> | grep Limits -A 5
kubectl top pod <pod-name>
```
What to look for:
- Memory limits: OOM (Out of Memory) kills
- CPU limits: CPU throttling
- Resource usage: Current usage vs. limits
6. Check Configuration
Command:
```bash
kubectl get pod <pod-name> -o yaml
kubectl get configmap <configmap-name> -o yaml
kubectl get secret <secret-name> -o yaml
```
What to look for:
- Environment variables: Missing or incorrect values
- ConfigMaps: Configuration errors
- Secrets: Missing or incorrect secrets
- Volume mounts: Volume mount issues
7. Check Health Checks
Command:
```bash
kubectl describe pod <pod-name> | grep Liveness -A 5
kubectl describe pod <pod-name> | grep Readiness -A 5
```
What to look for:
- Liveness probe: Failing liveness probes
- Readiness probe: Failing readiness probes
- Probe configuration: Incorrect probe settings
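When logs and events are not enough, an interactive ephemeral debug container can help; a sketch assuming the cluster supports ephemeral containers (Kubernetes 1.23+), with the image and names as placeholders:

```bash
# Attach a debug container targeting the crashing container's process namespace
kubectl debug -it <pod-name> --image=busybox:1.36 --target=<container-name> -- sh
```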
Common Issues and Solutions
1. Image Pull Errors
Symptoms: ImagePullBackOff, ErrImagePull
Debug:
```bash
kubectl describe pod <pod-name> | grep Events
```
Solutions:
- Check image name and tag
- Check image pull secrets (see the sketch below)
- Verify image registry access
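If registry credentials are the problem, a sketch of creating and referencing a pull secret (all values are placeholders):

```bash
# Create a registry credential secret
kubectl create secret docker-registry regcred \
  --docker-server=<registry-host> \
  --docker-username=<username> \
  --docker-password=<password>
```

```yaml
# Reference it from the pod spec
spec:
  imagePullSecrets:
  - name: regcred
```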
2. OOM Kills
Symptoms: OOMKilled, frequent restarts
Debug:
```bash
kubectl describe pod <pod-name> | grep OOM
kubectl top pod <pod-name>
```
Solutions:
- Increase memory limits (see the spec fragment below)
- Optimize application memory usage
- Check for memory leaks
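A minimal sketch of the container-spec fragment for raising the limit; the values are placeholders to tune against observed usage:

```yaml
# Per-container resource settings; limits.memory is what an OOM kill enforces
resources:
  requests:
    memory: "512Mi"
    cpu: "250m"
  limits:
    memory: "1Gi"
```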
3. Configuration Errors
Symptoms: Application startup failures, missing config
Debug:
```bash
kubectl logs <pod-name>
kubectl get configmap <configmap-name>
```
Solutions:
- Check ConfigMap/Secret existence
- Verify environment variables
- Check configuration values
4. Health Check Failures
Symptoms: Unhealthy, Readiness probe failed
Debug:
```bash
kubectl describe pod <pod-name> | grep Probe
kubectl exec <pod-name> -- curl http://localhost:8080/health
```
Solutions:
- Fix the health check endpoint
- Adjust probe timeout/interval (see the probe fragment below)
- Check application health
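A sketch of a liveness probe matching the curl check above; the timing values are placeholders to adjust for slow-starting applications:

```yaml
# Probe the /health endpoint on port 8080, as in the exec check above
livenessProbe:
  httpGet:
    path: /health
    port: 8080
  initialDelaySeconds: 30
  periodSeconds: 10
  timeoutSeconds: 5
  failureThreshold: 3
```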
Summary
Debugging steps:
1. Check pod status: kubectl get pods and kubectl describe pod <pod-name>
   - Look for status, restarts, events
2. Check pod logs: kubectl logs <pod-name>
   - Look for error messages, stack traces
3. Check previous logs: kubectl logs <pod-name> --previous
   - Check logs from the previous container instance
4. Check events: kubectl get events
   - Look for scheduling, image pull, startup events
5. Check resources: kubectl top pod <pod-name>
   - Look for OOM kills, CPU throttling
6. Check configuration: kubectl get pod <pod-name> -o yaml
   - Look for ConfigMaps, Secrets, environment variables
7. Check health checks: kubectl describe pod <pod-name>
   - Look for liveness/readiness probe failures
Key logs to check:
- Application logs: Main application error logs
- Container logs: Sidecar and init container logs
- System logs: Kubernetes system logs
- Event logs: Pod events and scheduling events
Common issues:
- Image pull errors: Check image name, registry access
- OOM kills: Increase memory limits, optimize memory usage
- Configuration errors: Check ConfigMaps, Secrets, environment variables
- Health check failures: Fix health check endpoint, adjust probe settings