We’ve been talking a lot about SLAs (Service Level Agreements) in the hosting industry at work lately. As a hosting company, like any other company, you’re dependent on many things:
- Power and Cooling: Our data centre provides and lives up to their 100% SLA with a dedicated substation and dual diesel generators. No outages since 2001 is a nice way to live up to a promise!
- Internet connectivity: Again, we’re looking at a 100% SLA from our suppliers. That’s an enviable position; something you don’t get with a DSL or ordinary leased line. We monitor that closely and there have been no outages yet.
- Server Uptime: This one completely depends on manufacturer reliability, clustering, etc.
That last one is a tough nut to crack. There are so many variables to cover there, but C Infinity takes no shortcuts. We take the concepts of our data centre and not only extend them into our racks but wrap that service up with our partnership approach and with our expertise in server management:
- All servers are purchased directly from an official Hewlett Packard reseller. In fact, we only work with the highest accredited partner in Ireland.
- All of our network equipment is purchased from an official Cisco reseller. A CCIE designs and manages our infrastructure.
- All hardware gets a 4-hour response time support contract. That’s been tested: fixes have been done on site within the promised 4 hours, with planned, communicated and minimal downtime.
- All equipment has A+B power. That means redundant paths to the Internet from the server via paired devices and 2 power supplies fed from different circuits.
- Where SAN storage is used, there are paired and independent paths between the servers and the disk.
- We design our infrastructure according to the concepts of Dynamic Infrastructure and manage it according to Optimised Infrastructure.
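To see why the paired paths and A+B power in the list above matter, here is a minimal sketch of the standard availability arithmetic for redundant components. It assumes the two paths fail independently of each other, which real circuits only approximate, and the 99.9% figure for a single path is a hypothetical number chosen for illustration, not a measurement from our racks.

```python
def combined_availability(single_path_availability, redundant_paths):
    """Availability of a service that stays up while at least one path is up.

    Assumes the paths fail independently of each other.
    """
    chance_all_paths_fail = (1 - single_path_availability) ** redundant_paths
    return 1 - chance_all_paths_fail


# Hypothetical figure: each power feed on its own is available 99.9% of the time.
single_feed = 0.999
paired_feeds = combined_availability(single_feed, 2)

print(f"One feed:  {single_feed * 100:.4f}%")
print(f"A+B feeds: {paired_feeds * 100:.4f}%")
```

The same reasoning applies to paired network devices and paired SAN paths: under the independence assumption, the service is only lost when both halves of a pair fail at the same time.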
Conversation today turned to “what are the expected comparative uptimes for a standalone server and a clustered service?”. I was curious. What about for a standalone physical server? The best one to test would be the one with the longest running time: our very first managed server hosting customer. I fired up an OpsMgr (System Center Operations Manager) report on their server and there was the availability summary: 100%. One hundred percent uptime; that’s a staggering number for a physical server!
OK, let’s face it. Over a fair sample of tens of thousands of servers, a standalone server is probably going to have an uptime of around 98% or 99%. But that’s where alternative designs come in. Sometimes it means building a cluster. A physical cluster for a single service can generally give 99.99% to 99.999% uptime over an extended period. Clusters aren’t cheap to build, so an alternative is to build a virtualisation farm where every host is clustered. That means the virtual machines running on the farm are independent of the hardware and can move from one host to another, proactively and/or reactively, manually and/or automatically.
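Those percentages are easier to compare once they are turned into time. The sketch below is plain arithmetic rather than anything taken from our monitoring: it converts each uptime figure mentioned above into the downtime it allows over a 365-day year. The function name is just an illustrative choice.

```python
HOURS_PER_YEAR = 365 * 24  # 8760 hours, ignoring leap years


def downtime_hours_per_year(uptime_percent):
    """Hours of downtime per year implied by an uptime percentage."""
    return HOURS_PER_YEAR * (1 - uptime_percent / 100)


for uptime in (98, 99, 99.99, 99.999):
    hours = downtime_hours_per_year(uptime)
    print(f"{uptime}% uptime allows about {hours:.2f} hours "
          f"({hours * 60:.1f} minutes) of downtime per year")
```

Roughly speaking, 98% allows a little over a week of downtime per year, 99% allows between three and four days, 99.99% allows under an hour, and 99.999% allows only a few minutes.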
If I ran an availability report on our oldest physical server then I should run the same report on our oldest virtual machine. What did it report? 100%. One hundred percent, again! I have to admit I’m a little proud of that.
We’re running Hyper-V for our virtualisation platform and manage it using System Center Virtual Machine Manager and System Center Operations Manager. One of the criticisms levelled at Hyper-V was that Quick Migration of a virtual machine from one host to another took a few seconds. Sure, it does. And we have moved virtual machines around to perform maintenance on hosts, including security updates, service packs and even a memory board replacement. But the time taken was so small that it didn’t reduce the server uptime from 100%. By the way, Hyper-V in Windows Server 2008 R2 includes zero downtime Live Migration and we’ll be deploying that once Virtual Machine Manager 2008 R2 is released.
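A rough calculation shows why a few seconds here and there barely register. The numbers below are hypothetical (ten migrations of five seconds each over a year), and the rounding to two decimal places is an assumption for illustration, not a statement about how OpsMgr formats its availability reports.

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60


def availability_percent(downtime_seconds, period_seconds=SECONDS_PER_YEAR):
    """Availability over a period, given the total downtime in seconds."""
    return 100 * (1 - downtime_seconds / period_seconds)


# Hypothetical workload: 10 Quick Migrations at 5 seconds each in one year.
total_downtime = 10 * 5
pct = availability_percent(total_downtime)

print(f"Unrounded: {pct:.6f}%")
print(f"Rounded to two decimal places: {pct:.2f}%")
```

Under those assumptions the unrounded figure is still above 99.999%, so it displays as 100.00% once rounded to two decimal places.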
So our oldest hosted physical server has 100% uptime and our oldest hosted virtual machine has 100% uptime. I guess that means we’re doing our job right.
Please contact us if you’re interested in learning how you can get these sorts of uptime results in managed server hosting.