We’ve started running game day exercises at Stripe. During a recent game day, we tested failing over a Redis cluster by running kill -9 on its primary node [0], and ended up losing all data in the cluster. We were very surprised by this, but grateful to have found the problem in testing. This result and others from this exercise convinced us that game days like these are quite valuable, and we would highly recommend them for others.
If you’re not familiar with game days, the best introductory article is this one from John Allspaw [1]. Below, we’ll lay out a playbook for how to run a game day, and describe the results from our latest exercise to show why we believe they are valuable.
Scoop.it!
No comment yet.
Sign up to comment