Answer Key: Testing for Failure
Exercise 1: Design Failure Tests
Question: Design failure tests for a distributed system. What failures do you test?
Answer
Failure Tests:
- Network failures: latency, packet loss, partitions
- Node failures: process kills, reboots, resource exhaustion
- Service failures: service down, slow responses, error responses
- Database failures: connection failures, query failures
Answer: Test network, node, service, and database failures systematically.
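Some of these failures can be simulated in application-level tests by wrapping calls to a dependency. The sketch below assumes a hypothetical `FaultInjector` that adds latency (network latency, slow responses) and raises errors with a configurable probability (packet loss, service down). It exercises a hypothetical client, `fetch_with_retry`, as the code under test. None of these names come from a real library. Partitions, process kills, reboots, and database outages are usually injected at the infrastructure level instead, which this sketch does not cover.

```python
import random
import time
from dataclasses import dataclass
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class InjectedFault(Exception):
    """Stands in for a real dependency failure (hypothetical)."""


@dataclass
class FaultInjector:
    latency_s: float = 0.0   # extra delay per call: simulates latency / slow responses
    error_rate: float = 0.0  # probability a call fails: simulates packet loss / service down
    seed: int = 0            # fixed seed so a failing run can be replayed exactly

    def __post_init__(self) -> None:
        self._rng = random.Random(self.seed)

    def call(self, fn: Callable[[], T]) -> T:
        if self.latency_s > 0:
            time.sleep(self.latency_s)
        if self._rng.random() < self.error_rate:
            raise InjectedFault("injected failure")
        return fn()


def fetch_with_retry(injector: FaultInjector, fn: Callable[[], T], attempts: int = 3) -> T:
    """Code under test: a hypothetical client that retries failed calls."""
    last_error: Optional[Exception] = None
    for _ in range(attempts):
        try:
            return injector.call(fn)
        except InjectedFault as exc:
            last_error = exc
    raise RuntimeError("dependency unavailable") from last_error


# Healthy dependency: the call succeeds.
print(fetch_with_retry(FaultInjector(), lambda: "ok"))  # ok

# Dependency always down: the client should fail with a clear error, not hang.
try:
    fetch_with_retry(FaultInjector(error_rate=1.0), lambda: "ok")
except RuntimeError as exc:
    print(exc)  # dependency unavailable
```

The fixed seed is a deliberate choice. When a randomized failure test breaks, you can rerun it with the same seed and get the same sequence of injected faults.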
Exercise 2: Run Chaos Experiment
Question: Run a chaos engineering experiment. What's the process?
Answer
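Chaos Process:
1. Hypothesis: Form a hypothesis about how the system should behave when the fault occurs.
2. Experiment: Inject the fault, in staging first and then in production.
3. Observe: Monitor system behavior during the experiment.
4. Learn: Compare the results against the hypothesis and improve the system.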
Answer: Hypothesis → Experiment → Observe → Learn; start in staging before moving to production.
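The four steps can be sketched as a small harness. This is a minimal illustration under stated assumptions. `ChaosExperiment`, `run_experiment`, and the in-memory replica model are hypothetical, and the hypothesis is expressed as a "steady state" check (for example, error rate below 1%). The harness checks that check before injecting the fault and again while the fault is active. Rollback runs in a `finally` block, so the fault is removed even if observation fails.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class ChaosExperiment:
    name: str
    hypothesis: Callable[[], bool]  # steady-state check, e.g. "error rate < 1%"
    inject: Callable[[], None]      # start the fault
    rollback: Callable[[], None]    # remove the fault
    environment: str = "staging"    # run in staging before production


def run_experiment(exp: ChaosExperiment) -> dict:
    # 1. Hypothesis: confirm the steady state holds before injecting anything.
    if not exp.hypothesis():
        return {"name": exp.name, "status": "aborted: no steady state before injection"}
    # 2. Experiment: inject the fault, always rolling it back afterwards.
    exp.inject()
    try:
        # 3. Observe: re-check the same hypothesis while the fault is active.
        held = exp.hypothesis()
    finally:
        exp.rollback()
    # 4. Learn: a violated hypothesis is a finding to fix.
    return {
        "name": exp.name,
        "environment": exp.environment,
        "status": "hypothesis held" if held else "hypothesis violated",
    }


# Hypothetical system model: the service fails only when no replica is up.
state = {"replicas_up": 3}


def error_rate() -> float:
    return 0.0 if state["replicas_up"] > 0 else 1.0


exp = ChaosExperiment(
    name="kill one replica",
    hypothesis=lambda: error_rate() < 0.01,
    inject=lambda: state.update(replicas_up=state["replicas_up"] - 1),
    rollback=lambda: state.update(replicas_up=3),
)
result = run_experiment(exp)
print(result["status"])  # hypothesis held
```

If the system is not in its steady state before injection, the harness aborts instead of injecting. This keeps the experiment from blaming the fault for a problem that already existed.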
Exercise 3: Handle Test Failure
Question: Your failure test causes a system failure. How do you respond?
Answer
Response:
1. Stop the test: Halt failure injection immediately.
2. Restore: Return the system to its normal state.
3. Investigate: Determine why the system failed.
4. Fix: Fix the issues and improve resilience.
5. Retry: Rerun the test after the fixes are in place.
Answer: Stop the test, restore the system, investigate, fix the issues to improve resilience, then retry.
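The first two steps can be automated so they do not depend on someone noticing the problem. The sketch below assumes a hypothetical helper, `run_with_kill_switch`, which applies a fault for a bounded duration and polls a health check. It stops on the first failed check, and it always restores the system in a `finally` block. The returned log is a starting point for the investigate, fix, and retry steps. The function name, parameters, and the in-memory `system` model are illustrative and do not come from any specific tool.

```python
import time
from typing import Callable, List


def run_with_kill_switch(
    inject: Callable[[], None],
    restore: Callable[[], None],
    is_healthy: Callable[[], bool],
    duration_s: float,
    check_interval_s: float = 0.01,
) -> List[str]:
    """Apply a fault for up to duration_s, aborting on the first failed health check."""
    log: List[str] = []
    try:
        inject()
        log.append("injected")
        deadline = time.monotonic() + duration_s
        while time.monotonic() < deadline:
            if not is_healthy():
                # Step 1, stop the test: abort injection immediately.
                log.append("health check failed: aborting")
                break
            time.sleep(check_interval_s)
        else:
            log.append("completed without breaching health")
    finally:
        # Step 2, restore: always undo the fault, even on abort or exception.
        restore()
        log.append("restored")
    # Steps 3-5 (investigate, fix, retry) start from this log.
    return log


# Hypothetical system whose health check fails as soon as the fault is active.
system = {"fault": False}
log = run_with_kill_switch(
    inject=lambda: system.update(fault=True),
    restore=lambda: system.update(fault=False),
    is_healthy=lambda: not system["fault"],
    duration_s=5.0,
)
print(log)  # ['injected', 'health check failed: aborting', 'restored']
```

The `while ... else` branch runs only when the loop reaches its deadline without a `break`. The log therefore distinguishes an aborted run from one that completed within the health limits.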