I liked David’s point about downtime: customers don’t care whether an outage was “scheduled” or “unscheduled”, only that the service is down. Cloud software providers should therefore be measured by total downtime, not by a metric engineered to get them off the hook for SLA compliance.
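To make the difference between the two metrics concrete, here is a minimal sketch in Python. The figures are invented for illustration (a 30-day month with 3 hours of scheduled maintenance and 1 hour of unscheduled outage) and are not our real numbers; the function name `uptime_percent` is likewise just something I made up for this example.

```python
def uptime_percent(period_hours, downtime_hours):
    """Return the percentage of the period during which the service was up."""
    if period_hours <= 0:
        raise ValueError("period_hours must be positive")
    return 100.0 * (period_hours - downtime_hours) / period_hours


# Hypothetical figures, chosen only to illustrate the point.
period = 30 * 24        # a 30-day month: 720 hours
scheduled = 3.0         # hours of planned maintenance (assumed)
unscheduled = 1.0       # hours of unplanned outage (assumed)

# What customers actually experience: every hour the service was down.
total_based = uptime_percent(period, scheduled + unscheduled)

# An SLA-style figure that quietly excludes scheduled maintenance.
sla_style = uptime_percent(period, unscheduled)

print(f"Total-downtime uptime: {total_based:.2f}%")  # 99.44%
print(f"SLA-style uptime:      {sla_style:.2f}%")    # 99.86%
```

Same month, same outages, but the second number looks a lot better simply because three of the four hours were declared in advance.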
I’ve always believed in transparency about uptime. Recently we experienced some significant downtime, which resulted in this post by our CTO about his desire for a chaos monkey.
Unfortunately, I’m disappointed in our results. Our API was only up 99.84% of the time (about 6 hours of downtime) since August 16th, 2011. During that time, we also deployed over 400 automated builds to production (or about 4 per day). We will strive to do better in 2012.