Disaster Recovery Versus Business Continuance
One of the biggest nightmares for any IT executive is hearing these words: “The systems are down!” No one wants to be in this situation, since IT infrastructure is designed to provide access to a company’s data and applications so that users can carry out the purposes of the business. An outage impacts all aspects of a business, including facilitating sales, providing data to accounting to pay bills and invoices, providing server space for marketing collateral, and providing pertinent data needed for manufacturing. These examples and many more are daily functions needed to enable and maintain business operations.
When IT systems are offline, even for just a few minutes, the impact on a company can be huge. An outage can cost the business a significant amount of money and, in some settings, can even cost people their lives. Business systems uptime is crucial to the survival of every business.
Data is the lifeblood of IT systems. The applications that utilize and manipulate data are the heart of every organization. Protecting this data should be the number one priority of an IT department. Ensuring that reliable backups are completed daily is the most important way to protect the lifeblood of an organization. But what happens when an unexpected disaster occurs?
In IT, a disaster can be defined as anything that removes access to the data and applications that run a business. Earthquakes, floods, fires, and lightning are examples of natural disasters that can have a negative impact on the accessibility of company data. Human actions can have just as much of an impact. Whether accidental or intentional, people can do serious damage to the operations of an organization.
Hardware failures, software failures, and system update bugs can all contribute to a bad day in IT. There is also the ever-present external threat of malware, ransomware, and viruses.
These are all examples of unexpected threats and disasters you need to be prepared to either defend against or recover from.
The two main ways to prepare for these types of challenges are to configure the systems for “Business Continuance” (high availability), to have a Disaster Recovery Plan (DR Plan) in place, or both.
Let’s first take a look at the anatomy of a DR Plan. Simply stated, a DR Plan should be a playbook on the steps needed to bring your business systems back into production status in the event of an outage or disaster.
Too many people have the misconception that all they need to do is back up their data, and that’s it. But without a plan that lays out the proper steps to recover, things may not go as well as you would like when it comes time to do a recovery.
One suggestion when starting on the development of a DR Plan is to have an external third party perform a Business Impact Analysis (BIA). The purpose of this is to have someone from outside of your organization work with your team to identify the importance and prioritization of applications that need to be recovered first.
During this process, you’ll need to determine the maximum amount of data that you can afford to lose in the event of an outage. The two main objectives that we look at in a DR Plan are Recovery Point Objective (RPO) and Recovery Time Objective (RTO). There are, of course, several other objectives that can be considered, but let’s start with these two for now.
The RPO is the point in time to which you HAVE TO or WANT TO recover in the event of an outage. For example, if you have a small non-critical application that changes only once or twice a week, an RPO of 24 or 48 hours may be acceptable. This can be achieved by recovering from your nightly backup. But if you have a mission-critical application whose downtime costs your company $1 million or more per hour, then I would expect your RPO for that application to be less than 1 hour, and most likely as close to 0 minutes as possible.
RTO is the amount of time you can allow for recovering to the desired RPO. As in the example above, for that non-critical application with a 48-hour RPO, it may be fine to take another 24 to 48 hours to recover. However, the mission-critical application is still costing money for every minute it is down, so its RTO needs to be as short as possible.
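To make the two objectives concrete, here is a minimal Python sketch that compares what happened in an outage against the RPO and RTO targets. The names (Objectives, assess_outage) and the timestamps are hypothetical and chosen only for illustration; they are not taken from any particular product.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Objectives:
    """Hypothetical per-application recovery targets."""
    rpo: timedelta  # maximum tolerable data-loss window
    rto: timedelta  # maximum tolerable time to restore service


def assess_outage(objectives, last_recovery_point, outage_start, service_restored):
    """Compare an outage's actual data loss and downtime against the targets."""
    # Data written after the last recovery point is lost in the recovery.
    data_loss_window = outage_start - last_recovery_point
    # Downtime runs from the start of the outage until service is back.
    downtime = service_restored - outage_start
    return {
        "data_loss_window": data_loss_window,
        "downtime": downtime,
        "rpo_met": data_loss_window <= objectives.rpo,
        "rto_met": downtime <= objectives.rto,
    }


# Illustrative numbers only: a nightly backup at 01:00, an outage at 15:00,
# and service restored at 09:00 the next day.
result = assess_outage(
    Objectives(rpo=timedelta(hours=48), rto=timedelta(hours=48)),
    last_recovery_point=datetime(2024, 1, 10, 1, 0),
    outage_start=datetime(2024, 1, 10, 15, 0),
    service_restored=datetime(2024, 1, 11, 9, 0),
)
print(result)
```

With these illustrative timestamps the data-loss window is 14 hours and the downtime is 18 hours, so a 48-hour RPO/RTO is met, while a 1-hour RPO/RTO for a mission-critical application would not be.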
Once the desired RPO and RTO have been determined for each of the business apps, you can look at approaches to achieve these objectives. Obviously, performing a daily backup to tape for all applications will not suffice for any application with an RPO/RTO of less than an hour. What do you do then?
Leveraging hardware snapshot technologies can help reduce the RPO and, in many cases, minimize the RTO challenge as well. That is, of course, only if the event that took the data offline in the first place did not destroy the local data center or any of the hardware in it.
This is where you’ll need to start looking at what can be done to keep a single-site hardware failure from taking applications down. Business Continuance is the ability of applications to remain online, or to come back online in a very short timeframe, after any disruption of local resources.
Building high availability clusters is one approach. Coupled with storage mirroring and stretch clustering, it can create active data centers with mirrored resources that can operate independently in the event of a failure at one site.
Depending on the RPO/RTO, this remote storage mirroring option can be relaxed for applications that can withstand longer RPO/RTO times. Your business objectives can be weighed against the cost of the options to protect the data and adjusted to fit your needs.
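The weighing of objectives against cost can be sketched as a simple selection: pick the cheapest option whose achievable RPO and RTO satisfy what the application requires. Everything in this Python sketch is an assumption made for illustration, including the ProtectionOption type, the cheapest_option function, and especially the RPO, RTO, and cost figures, which are placeholders rather than measurements of any real technology.

```python
from dataclasses import dataclass
from datetime import timedelta
from typing import Optional, Sequence


@dataclass(frozen=True)
class ProtectionOption:
    """Hypothetical description of one way to protect an application."""
    name: str
    achievable_rpo: timedelta
    achievable_rto: timedelta
    relative_cost: int  # unitless rank, 1 = cheapest (placeholder values)


# Placeholder figures for illustration only; real values depend on the
# environment and must come from your own testing.
OPTIONS = [
    ProtectionOption("nightly backup", timedelta(hours=24), timedelta(hours=24), 1),
    ProtectionOption("hardware snapshots", timedelta(hours=1), timedelta(hours=1), 2),
    ProtectionOption("remote mirroring with stretch cluster",
                     timedelta(0), timedelta(minutes=5), 3),
]


def cheapest_option(
    required_rpo: timedelta,
    required_rto: timedelta,
    options: Sequence[ProtectionOption] = OPTIONS,
) -> Optional[ProtectionOption]:
    """Return the lowest-cost option meeting both targets, or None if none does."""
    candidates = [
        o for o in options
        if o.achievable_rpo <= required_rpo and o.achievable_rto <= required_rto
    ]
    return min(candidates, key=lambda o: o.relative_cost, default=None)


# A non-critical app with a 48-hour RPO/RTO versus a mission-critical app
# that needs to be back within 30 minutes.
print(cheapest_option(timedelta(hours=48), timedelta(hours=48)))
print(cheapest_option(timedelta(minutes=30), timedelta(minutes=30)))
```

Returning None when nothing qualifies is a deliberate choice in this sketch: it signals that either the objectives or the set of available options needs to be revisited.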
Another challenge with storage replication (or mirroring), however, is that if data becomes corrupted at the primary site, that same corruption will be replicated to the target site. That is where Continuous Data Protection or Point-In-Time copies of data would need to be introduced into the IT environment to help meet the RPO/RTO requirement.
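The value of point-in-time copies in a corruption scenario can be illustrated with a small Python sketch: given a set of recovery points and the time the corruption began, recover from the most recent point taken before that time. The function name latest_clean_point and the timestamps are hypothetical, and the sketch assumes you know (or can estimate) when the corruption started.

```python
from datetime import datetime
from typing import Iterable, Optional


def latest_clean_point(
    recovery_points: Iterable[datetime],
    corruption_started_at: datetime,
) -> Optional[datetime]:
    """Return the newest recovery point taken before the corruption began."""
    # Any copy taken at or after the corruption time may contain bad data.
    clean = [p for p in recovery_points if p < corruption_started_at]
    return max(clean, default=None)


# Illustrative hourly point-in-time copies and a corruption that began at 10:20.
points = [
    datetime(2024, 1, 10, 8, 0),
    datetime(2024, 1, 10, 9, 0),
    datetime(2024, 1, 10, 10, 0),
    datetime(2024, 1, 10, 11, 0),
]
print(latest_clean_point(points, datetime(2024, 1, 10, 10, 20)))
```

The more frequent the point-in-time copies, the closer the chosen recovery point sits to the moment of corruption, which is what shrinks the effective RPO. A mirror alone offers no such earlier point to fall back to.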
One attractive option today is to use Disaster Recovery as a Service (DRaaS) providers to replicate data sets into the cloud and use cloud resources as landing zones when they are actually needed. The cost of DR infrastructure is minimized when all you’re doing is replicating to the cloud. In the event of an emergency, you fail over to the cloud and pay for the resources when you need them.
Keeping that lifeblood pumping is what your applications are designed to do. When that process gets disrupted is not the time to start planning. NOW is the time to start building your plan, if you have not done so already. Nth Generation’s engineers have been helping customers prepare for and recover from disasters for years. If you would like to discuss how to be better prepared with a Disaster Recovery or Business Continuance Plan, please reach out to us. We’re here to help!
Jim Russ, Vice President, Enterprise Technology