So yesterday my Windows Server VMs running in Windows Azure (VM Role) were automatically shut down and then later restarted. I assume this occurred due to an update to the host server and/or environment. I have my servers deployed in pairs, where each pair is in the same availability set. The idea here is that only one VM per availability set will be taken offline at any one time. As servers are added into an availability set, they are placed into a different rack/fault domain from the other members. In theory this should push your SLA from 99.9% to 99.95% (I assume the last .05% is to account for certificate expiration).
When determining how many machines to add into an availability set, you need to ensure the load handled by the machines can be satisfied with x-1 machines, where x represents the number of machines in the availability set. So in my case, for this pair, my x was 2, with the idea that a single server could handle the load. Of course you would likely want to configure more to ensure there is no single point of failure. It is fairly trivial to add 4, 5, or even more servers into an availability set using PowerShell.
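As a rough sketch of the PowerShell approach mentioned above, using the classic (Service Management) Azure PowerShell cmdlets — the service name, VM names, and availability set name here are placeholders, not the ones from my deployment:

```powershell
# Assumes the classic Azure Service Management PowerShell module,
# an existing cloud service, and VMs that are already deployed.
# Putting each VM in the same availability set spreads them across
# fault domains, so only one should be taken down during host updates.
"SRV-01", "SRV-02", "SRV-03", "SRV-04" | ForEach-Object {
    Get-AzureVM -ServiceName "MyCloudService" -Name $_ |
        Set-AzureAvailabilitySet -AvailabilitySetName "WebAvSet" |
        Update-AzureVM
}
```

Each VM is fetched, tagged with the availability set, and then updated in place; Azure handles the fault-domain placement for you.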
With the promise from Azure that only one server will ever be down at any one time, the next question you may have is: so how did my Azure-invoked outage yesterday fare?
| Server | Offline Date/Time | Online Date/Time | Time offline |
|--------|-------------------|------------------|--------------|
| SRV-01 | 11:04:28am CST    | 11:22:50am CST   | ~18 minutes  |
| SRV-02 | 12:02:38pm CST    | 12:22:12pm CST   | ~20 minutes  |
So as you can see, Windows Azure’s Availability Sets worked as advertised!