Answer Key: Data Pipeline

Exercise 1: Design Improvements

Question: How would you improve this design, and what are the tradeoffs?

Potential improvements:
- Add a validation layer
- Implement a schema registry
- Enhance monitoring
- Optimize costs

Tradeoffs:
- More complexity vs better reliability
- Higher cost vs better performance
- More components vs a simpler architecture

Answer: Add a validation layer, implement a schema registry, enhance monitoring, and optimize costs, balancing the added complexity against the gain in reliability.
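A validation layer can be as simple as checking each record against its required fields and routing failures to a dead-letter output instead of failing the whole pipeline. The sketch below is plain Python with a hypothetical record shape (`event_id`, `user_id`, `amount`) and hypothetical helper names; in a real pipeline the same check would run inside a transform step, and the dead-letter list would be a separate sink.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional, Tuple

# Hypothetical required fields and their accepted types for an incoming event.
REQUIRED_FIELDS: Dict[str, Any] = {"event_id": str, "user_id": str, "amount": (int, float)}


@dataclass
class ValidationResult:
    valid: List[Dict[str, Any]] = field(default_factory=list)
    # Each dead-letter entry keeps the original record plus the reason it failed.
    dead_letter: List[Tuple[Dict[str, Any], str]] = field(default_factory=list)


def validate_record(record: Dict[str, Any]) -> Optional[str]:
    """Return an error message, or None if the record passes every check."""
    for name, expected in REQUIRED_FIELDS.items():
        if name not in record:
            return f"missing field: {name}"
        if not isinstance(record[name], expected):
            return f"bad type for {name}: {type(record[name]).__name__}"
    if record["amount"] < 0:  # example business rule (hypothetical)
        return "amount must be non-negative"
    return None


def validate_batch(records: List[Dict[str, Any]]) -> ValidationResult:
    """Split records into valid ones and dead-letter entries without raising."""
    result = ValidationResult()
    for record in records:
        error = validate_record(record)
        if error is None:
            result.valid.append(record)
        else:
            result.dead_letter.append((record, error))
    return result
```

Keeping the failure reason next to the record makes the dead-letter output useful for monitoring and later replay.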

Question: How do you handle schema changes without breaking the pipeline?

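Schema evolution strategies:
- Use a schema registry
- Maintain backward compatibility
- Migrate gradually
- Handle multiple schema versions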

Answer: Use a schema registry, maintain backward compatibility, migrate gradually, and handle multiple schema versions side by side.
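The registry and backward-compatibility ideas can be sketched with a toy in-memory registry. This assumes an Avro-style backward-compatibility rule (a new version may drop fields, and may add a field only if it has a default), which is one common policy rather than the only one. `SchemaRegistry`, `REQUIRED`, and `read_with` are hypothetical names for illustration, not a real registry client API.

```python
from typing import Any, Dict, List

REQUIRED = object()  # sentinel: the field has no default value


def is_backward_compatible(old: Dict[str, Any], new: Dict[str, Any]) -> bool:
    # A reader using `new` can decode data written with `old` only if every
    # field that `new` adds has a default to fill in.
    return all(default is not REQUIRED for name, default in new.items() if name not in old)


class SchemaRegistry:
    """Toy in-memory stand-in for a real schema registry service (hypothetical)."""

    def __init__(self) -> None:
        self._versions: Dict[str, List[Dict[str, Any]]] = {}

    def register(self, subject: str, schema: Dict[str, Any]) -> int:
        versions = self._versions.setdefault(subject, [])
        if versions and not is_backward_compatible(versions[-1], schema):
            raise ValueError(f"schema for {subject!r} is not backward compatible")
        versions.append(schema)
        return len(versions)  # 1-based version number

    def get(self, subject: str, version: int) -> Dict[str, Any]:
        return self._versions[subject][version - 1]


def read_with(reader_schema: Dict[str, Any], record: Dict[str, Any]) -> Dict[str, Any]:
    """Project a record written with an older schema onto the reader's schema."""
    out: Dict[str, Any] = {}
    for name, default in reader_schema.items():
        if name in record:
            out[name] = record[name]
        elif default is REQUIRED:
            raise KeyError(f"record lacks required field {name!r}")
        else:
            out[name] = default  # fill fields added in newer versions
    return out
```

Because old records are upgraded on read, producers and consumers can move to the new version at different times, which is what makes a gradual migration possible.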

Question: How would you reduce costs by 30%, and what are the tradeoffs?

Cost optimization strategies:
- Right-size workers
- Optimize BigQuery usage
- Use preemptible workers
- Optimize Pub/Sub usage

Tradeoffs:
- Lower cost vs higher latency
- Less redundancy vs cost savings
- More optimization effort vs cost reduction
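The lower-cost vs higher-latency tradeoff shows up directly in batching: grouping messages into fewer downstream requests reduces per-request overhead, but a message may wait until its batch fills or a timeout expires. A minimal sketch, assuming each request carries some fixed overhead; `MicroBatcher` is a hypothetical class, and timestamps are passed in explicitly so its behavior is deterministic.

```python
from typing import Any, List, Optional


class MicroBatcher:
    """Buffers messages and flushes when the batch reaches max_size or the
    oldest buffered message has waited at least max_wait_s seconds."""

    def __init__(self, max_size: int, max_wait_s: float) -> None:
        self.max_size = max_size
        self.max_wait_s = max_wait_s
        self._buffer: List[Any] = []
        self._first_ts: Optional[float] = None
        self.flushed: List[List[Any]] = []  # each entry stands for one downstream request

    def add(self, message: Any, now: float) -> None:
        if not self._buffer:
            self._first_ts = now  # start the wait clock with the first message
        self._buffer.append(message)
        if len(self._buffer) >= self.max_size:
            self.flush()

    def tick(self, now: float) -> None:
        # Called periodically; flushes a partial batch that has waited too long.
        if self._buffer and self._first_ts is not None and now - self._first_ts >= self.max_wait_s:
            self.flush()

    def flush(self) -> None:
        if self._buffer:
            self.flushed.append(self._buffer)
            self._buffer = []
            self._first_ts = None
```

Raising `max_size` cuts the number of requests, while `max_wait_s` caps the extra latency that batching can add to any single message.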

Answer: Right-size workers, optimize BigQuery usage, use preemptible workers, and optimize Pub/Sub usage, balancing cost against performance.
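Whether these strategies reach a 30% target depends on how spend is split across components: overall savings are the cost-weighted sum of each component's reduction divided by total cost. The sketch below shows that arithmetic; the component names, cost breakdown, and per-component reductions are purely hypothetical placeholders, not measurements.

```python
from typing import Dict


def blended_savings(costs: Dict[str, float], reductions: Dict[str, float]) -> float:
    """Overall fractional savings, given per-component costs and the fractional
    reduction expected for each component (missing components save nothing)."""
    total = sum(costs.values())
    if total <= 0:
        raise ValueError("total cost must be positive")
    saved = sum(cost * reductions.get(name, 0.0) for name, cost in costs.items())
    return saved / total


# Hypothetical monthly spend (arbitrary units) and assumed reductions.
costs = {"workers": 5000.0, "bigquery": 3000.0, "pubsub": 1000.0, "other": 1000.0}
reductions = {"workers": 0.40, "bigquery": 0.30, "pubsub": 0.10}
savings = blended_savings(costs, reductions)
print(f"estimated savings: {savings:.0%}")
```

The weighting explains why the largest cost centers, here the workers, usually have to carry most of the reduction.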