I picked up *Real-World SRE: The Survival Guide for Responding to a System Outage and Maximizing Uptime* as a crash course in the field (and also because I know the author from college). I assumed it would be about expensive technology, dire consequences, and the drive for 99.9999% uptime.
Turns out, reliability in the real world is:
- Identifying problems
- Learning to fix them
- Growing from the process
Any developer can benefit from this book. The technical sections are paired with practical advice about how to manage time, energy, and expectations. Because of this, I found it surprisingly relevant to my work as a frontend developer.
The topics I enjoyed the most were:
- A deep dive into the inner workings of HTTP
- UX design for admin dashboards
- How to share knowledge between departments
This book convinced me that everyone (even the frontend team) should know where the backups are located. Who would’ve guessed that communication is the trick to reliability?