Blueprints for High Availability: Designing Resilient Distributed Systems by Evan Marcus and Hal Stern
Reliability is not a quality that can simply be purchased; it must be engineered into a system or product. This text offers a guide to assessing, designing, implementing, and testing a system with 100% reliability as the goal. The authors provide a series of practical blueprints, disciplines, and processes for assessing risks to a distributed system, assigning costs, selecting appropriate reliability levels, and designing and testing solutions without excessive downtime.