A customer recently asked me about power outages, and what she could do to prepare her virtual infrastructure for fluctuations in the power grid. It is an interesting topic, as not everyone can afford generator-backed power for the datacenter. Most (hopefully all) SMB IT shops have a UPS to condition power and keep systems running through an outage. However, once the external power is cut, you have only a limited window in which to shut down your systems gracefully before the batteries die.
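To make that time window concrete, here is a minimal sketch of the arithmetic. The stage names, the durations, and the 25% safety margin are all assumptions for illustration; substitute the runtime your UPS reports under load and the shutdown times you have measured yourself.

```python
def fits_in_runtime(battery_minutes, stage_minutes, safety_margin=0.25):
    """Return True if every shutdown stage fits inside the usable battery time.

    safety_margin is an assumed reserve (25% here) held back because
    real battery runtime shrinks with age and load.
    """
    usable_minutes = battery_minutes * (1 - safety_margin)
    return sum(stage_minutes.values()) <= usable_minutes


# Hypothetical durations, in minutes, for each stage of the shutdown.
stages = {"virtual machines": 5, "ESX hosts": 3, "storage": 2}

print(fits_in_runtime(20, stages))  # 10 minutes of work against 15 usable
print(fits_in_runtime(12, stages))  # 10 minutes of work against 9 usable
```

If the second case describes your environment, you either need more battery or a faster shutdown sequence.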
The big question is: how do we shut down our infrastructure gracefully? Several UPS vendors provide management utilities. For example, the APC Infrastruxure software allows you to monitor the UPS and, upon an outage, trigger scriptable events. Your servers, physical or virtual, can be shut down via such scripts. These can be as simple as the Windows “shutdown /s /m \\servername” command, or a command issued against vCenter through the remote CLI to shut down a VM.
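As a rough sketch of what such a script might look like, the Python below builds the Windows shutdown command for a list of servers. The server names are hypothetical, and the script defaults to a dry run that only prints the commands. It assumes it runs on a Windows machine whose account has rights to shut down the remote servers.

```python
import subprocess


def build_shutdown_command(server, delay_seconds=0):
    """Build the Windows command that shuts down a remote server.

    /s = shut down, /f = force running applications to close,
    /m = target a remote computer, /t = delay in seconds.
    """
    return ["shutdown", "/s", "/f", "/m", "\\\\" + server, "/t", str(delay_seconds)]


def shut_down_servers(servers, dry_run=True):
    """Issue (or, in a dry run, just print) one shutdown command per server."""
    commands = [build_shutdown_command(server) for server in servers]
    for command in commands:
        if dry_run:
            print("would run:", " ".join(command))
        else:
            # check=False: one unreachable server should not stop the rest.
            subprocess.run(command, check=False)
    return commands


# Hypothetical server names; replace with your own.
shut_down_servers(["fileserver01", "sqlserver01"])
```

Keeping the dry run as the default means the script can be tested from the UPS software without actually taking anything down.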
Now that your servers are shut down gracefully, what do you do with your ESX hosts and storage? In this customer’s case, she was using EqualLogic storage and Dell servers. The ESX hosts can be shut down from vCenter’s CLI. The EqualLogic storage can be shut down from the command line as well, either over a serial connection or remotely via SSH. At this point, only a single server should still be running: the one hosting the APC software.
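The order matters here: virtual machines first, then the ESX hosts, then the storage they depend on. A minimal sketch of that sequencing follows. The action functions are placeholders of my own; in practice each one would wrap whatever vendor CLI or SSH command you use for that stage.

```python
# Stages in dependency order: nothing is shut down before the things that rely on it.
SHUTDOWN_ORDER = ["virtual machines", "ESX hosts", "EqualLogic storage"]


def run_shutdown(actions, order=SHUTDOWN_ORDER):
    """Run one action per stage, in order, and record whether each succeeded."""
    results = []
    for stage in order:
        try:
            actions[stage]()
            results.append((stage, True))
        except Exception as error:
            # Keep going: the batteries are draining whether or not a stage failed.
            print(f"{stage} failed: {error}")
            results.append((stage, False))
    return results


# Placeholder actions that only print; swap in real commands for each stage.
demo_actions = {
    stage: (lambda name=stage: print("would shut down", name))
    for stage in SHUTDOWN_ORDER
}
print(run_shutdown(demo_actions))
```

Continuing after a failed stage is a judgment call on my part. With a finite battery, a partially clean shutdown seemed preferable to stopping halfway, but you may decide otherwise for your environment.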
This is all well and good, but what do you do when the lights come back on? If the power goes out completely, you can rely on the EqualLogic storage to boot and come online automatically once power returns. Switches follow suit, booting automatically when powered on. Most servers have a BIOS setting that makes them boot automatically when power is restored, and virtual machines can likewise be set to start automatically when their host boots.
The only eventuality this doesn’t cover is when the power never actually goes out at the equipment: utility power returns before the UPS batteries are exhausted, so the hardware you shut down never loses power and never gets the power-on event that triggers an automatic boot. In that case, you will want to look at managed PDUs for your server racks. These allow you to remotely turn each outlet on the strip on and off. With this you can cycle the power to your storage and physical servers, causing them to boot automatically. Once your hardware is back online, your virtual infrastructure will restart and you will be back in business.
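The power cycling could be scripted along these lines. This is only a sketch: the ManagedPdu interface and its set_outlet method are hypothetical stand-ins, since each PDU vendor exposes its own control mechanism, and the outlet numbers and delays are made up. Staggering the power-on so the storage comes up before the servers is my own assumption, not a requirement of any particular PDU.

```python
import time
from typing import Protocol


class ManagedPdu(Protocol):
    """Hypothetical interface; wrap your PDU's real control mechanism behind it."""

    def set_outlet(self, outlet: int, on: bool) -> None: ...


class RecordingPdu:
    """Stand-in PDU that records calls instead of switching real outlets."""

    def __init__(self):
        self.calls = []

    def set_outlet(self, outlet, on):
        self.calls.append((outlet, on))


def power_cycle(pdu, outlet_groups, off_seconds=10.0, stagger_seconds=60.0,
                sleep=time.sleep):
    """Turn every outlet off, wait, then power the groups back on in order."""
    for group in outlet_groups:
        for outlet in group:
            pdu.set_outlet(outlet, False)
    sleep(off_seconds)
    for index, group in enumerate(outlet_groups):
        if index > 0:
            sleep(stagger_seconds)  # give the previous group time to boot
        for outlet in group:
            pdu.set_outlet(outlet, True)


# Dry run: storage on outlets 1-2, servers on outlets 3-4 (made-up numbers).
pdu = RecordingPdu()
power_cycle(pdu, [[1, 2], [3, 4]], sleep=lambda seconds: None)
print(pdu.calls)
```

Passing the sleep function in as a parameter keeps the dry run instant; a real run would leave the default in place.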