
Disaster Recovery – Don't wait until it's too late!

October 28th marked the one year anniversary of Hurricane Sandy, the epic storm that ravaged the Northeastern United States. Living in NJ, where the hurricane made landfall, and having family across much of the state, we lived through the hurricane and its aftermath firsthand. It’s hard to believe that it’s been a year already. It’s an experience we’ll never forget, and we have made plans to ensure that we’re prepared in case anything like that happens again. Business mirrors life in many cases, and when I speak with customers across the country the topic of disaster recovery comes up often. The conversations typically follow predictable patterns:

  • I’ve just inherited the technology and systems of company X (we’ll call it company X to protect the innocent), and we have absolutely no backup or disaster recovery strategy at all. Can you help me?
  • We had a disaster recovery strategy, but we haven’t really looked at it in a very long time. I’ve heard Cloud Computing can help me. Is that true?
  • We have a disaster recovery approach we’re thinking about. Can you review it and validate that we’re leveraging best practices?
  • I’m spending a fortune on disaster recovery gear that just sits idle 95% of the time. There has to be a better way.

The list of permutations is endless, and yes, there is a better way. Disaster recovery is a very common workload for a Cloud Computing solution, and there are a number of ways you can approach it. As with anything, there are tradeoffs between cost and functionality, and the right balance typically depends on the business requirements. For example, a full active/active environment with complete redundancy and sub-second failover can be costly, but it may be necessary for your business. In the Financial Services industry, for example, having revenue-generating systems down for even a few seconds can cost a company millions of dollars.

We have helped companies of all sizes think about, design, and implement disaster recovery strategies – from Pilot Lights, where there’s just the glimmer of an environment, to warm standbys to fully redundant systems. The first step is to plan for the future and not wait until it’s too late.

-Mike Triolo, General Manager East


AWS DR in Reverse!

Amazon and 2nd Watch have published numerous white papers and blog articles on various ways to use Amazon Web Services™ (AWS) for a disaster recovery strategy.  And there is no doubt at all that AWS is an excellent place to run a disaster recovery environment for on premise data centers and save companies enormous amounts of capital while preserving their business with the security of a full DR plan.  For more on this, I’ll refer you to our DR is Dead article as an example of how this works.

What happens, though, when you truly cannot have any downtime for your systems or a subset of your systems?  Considering recent events like Hurricanes Sandy and Katrina, how do critical systems use AWS for DR when Internet connectivity cannot be guaranteed?  How can cities prone to earthquakes justify putting DR systems in the Cloud when the true disasters they have to plan for involve large craters and severed fiber cables?  Suddenly having multiple Internet providers doesn’t seem like quite enough redundancy when all systems are Cloud based.  Now to be fair, in such major catastrophes most users have greater concerns than ‘can I get my email?’ or ‘where’s that TPS report?’  But what if your systems are supporting first responders?  Then DR takes on an even greater level of importance.

Typically, this situation is what keeps systems that support first responders, medical providers, and government from adopting a Cloud strategy or a Cloud DR strategy.  This is where a Reverse DR strategy has merit: moving production systems into AWS but keeping a pilot light environment on premise.  I won’t reiterate the benefits of moving to AWS (there are more articles on this than I can possibly reference, but please contact the 2nd Watch sales team and they’ll be happy to expound upon the benefits!) or rehash Ryan’s DR is Dead article.  What I will say is this: if you can move to AWS without risking those critical disaster response systems, why aren’t you?

By following the pilot light model in reverse, customers can leave enough on premise to keep the lights on in the event of a disaster.  With regularly scheduled tests to make sure those on premise systems are running and sufficient for emergencies, customers can take advantage of the Cloud for a significant portion of their environments.  In my experience, once an assessment is completed to validate which systems are required on premise to support enough staff in the event of a disaster, most customers find they can move 90%+ of their environment to the Cloud, save a considerable amount of money, and suffer no loss of functionality.

So put everything you’ve been told about DR in the Cloud in reverse: move your production environments to AWS, leave just enough on premise to handle those pesky hurricanes, and you’ve got yourself a reverse DR strategy using AWS.

-Keith Homewood, Cloud Architect


DR Your DB

Databases tend to host the most critical data your business has – orders, customers, products, and even employee information. It’s everything that your business depends on. How much of that can you afford to lose?

With AWS you have options for database recovery depending on your budget and Recovery Time Objective (RTO).

Low budget/Long RTO:

  • Whether you are in the cloud or on premise, using the AWS Command Line Interface (CLI) tools you can script uploads of your database backups directly to S3. This can be added as a step to an existing backup job or an entirely new job.
  • Another option would be to use a third party tool to mount an S3 bucket as a drive. It’s possible to back up directly to the S3 bucket, but if you run into write issues you may need to write the backup locally and then move it to the mounted drive.

These methods have a longer RTO, as they require you to stand up a new DB server and then restore the backups, but they are a low-cost way to ensure you can recover your business. The catch is that you can only restore to the last backup you have taken and copied to S3, so you may want to review your backup plans to ensure you are comfortable with how much you could lose. Just make sure you use the native S3 lifecycle policies to purge old backups, otherwise your storage bill will slowly get out of hand.
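As a rough sketch (the prefix and retention period here are hypothetical, and the exact JSON shape depends on the tool you use to apply it), a lifecycle rule that purges backups after 30 days looks something like this:

  {
    "Rules" : [
      {
        "ID" : "PurgeOldDatabaseBackups",
        "Prefix" : "database-backups/",
        "Status" : "Enabled",
        "Expiration" : { "Days" : 30 }
      }
    ]
  }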

High budget/short RTO:

  • Almost all mainstream Relational Database Management Systems (RDBMS) have a native method of replication. You can set up an EC2 instance database server to replicate your database to. This can be done in real time, so you can be confident that you will not lose a single transaction.
  • What about RDS? While you cannot use native RDBMS replication, there are third party replication tools that will do Change Data Capture (CDC) replication directly to RDS. These can be easier to set up than the native replication methods, but you will want to monitor them to ensure you do not end up in a situation where you could lose transactional data.

Since this is DR, you can lower the cost of these solutions by downsizing the RDS or EC2 instance. This will increase the RTO, as you will need to manually resize the instances in the event of a failure, but it can be a significant cost saver. Both of these solutions require connectivity to the instance over VPN or Direct Connect.

Another benefit of this solution is that it can easily be used for QA, testing, and development needs. You can easily snapshot the RDS or EC2 instance and stand up a new one to work against. When you are done, just terminate it.

With all database DR solutions, make sure you script out the permissions and server configurations. These either need to be saved off with the backups or applied to the RDS/EC2 instances. Permissions and configurations change constantly and can create recovery issues if you do not account for them.

With an AWS database recovery plan you can avoid losing critical business data.

-Mike Izumi, Cloud Architect


Storage Gateway with Amazon Web Services

Backup and disaster recovery often require solutions that add complexity and additional cost to properly synchronize your data and systems.  Amazon Web Services™ (AWS) helps drive down this cost and complexity with a number of services.  Amazon S3 provides a highly durable (99.999999999%) storage platform for your backups.  This service stores your data across multiple Availability Zones (AZs) to provide you the ultimate peace of mind for your data.  AWS also provides an ultra-low-cost service for long-term cold storage that is aptly named Glacier.  At $0.01 per GB / month, this service will force you to ask, “Why am I not using AWS today?”
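For example, an S3 lifecycle rule along these lines (the prefix and timing below are hypothetical) can automatically move aging backups from S3 into Glacier:

  {
    "Rules" : [
      {
        "ID" : "ArchiveBackupsToGlacier",
        "Prefix" : "backups/",
        "Status" : "Enabled",
        "Transition" : { "Days" : 30, "StorageClass" : "GLACIER" }
      }
    ]
  }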

AWS has developed the AWS Storage Gateway to make your backups secure and efficient.  For only $125 per backup location per month, you will have a robust solution that provides the following features:

  • Secure transfers of all data to AWS S3 storage
  • Compatible with your current architecture – there is no need to call up your local storage vendor for a special adapter or firmware version to use Storage Gateway
  • Designed for AWS – this provides a seamless integration of your current environment to AWS services

AWS Storage Gateway and Amazon EC2 (snapshots of machine images) together provide a simple cloud-hosted DR solution.   Amazon EC2 allows you to quickly launch images of your production environment in AWS when you need them, and the AWS Storage Gateway integrates seamlessly with S3 to provide a robust backup and disaster recovery solution that meets anyone’s budget.

-Matt Whitney, Sales Executive


DR is Dead

Having been in the IT industry since the 90s, I’ve seen many iterations on Disaster Recovery principles and methodologies.  The concept of DR, of course, far predates my tenure in the field; the idea started taking shape in the 1970s as businesses began to realize their dependence on information systems and the criticality of those services.

Over the past decade or so we’ve seen the concept of running a DR site at a colo facility (either leased or owned) become a popular way for organizations to have a rapidly available disaster recovery option.  The problem with a colo facility is that it is EXPENSIVE!  In addition to potentially huge CapEx (if you are buying your own infrastructure), you have the facility and infrastructure OpEx and all the overhead expense of managing those systems and everything that comes along with that.  In steps the cloud… AWS and the other players in the public cloud arena give you the ability to run a DR site without really any CapEx.  Now you are paying only for the virtual infrastructure that you are actually using, as an operational cost.

An intelligently designed DR solution could leverage something like Amazon’s Pilot Light approach to keep your costs down by running the absolute minimal core infrastructure needed to keep the DR site ready to scale up to production.  That is a big improvement over purchasing millions of dollars of hardware and carrying thousands and thousands of dollars in OpEx and overhead costs every month.  Even still… there is a better way.

If you architect your infrastructure and applications following AWS best practices, then in a perfect world there is really no reason to have DR at all.  By architecting your systems to balance across multiple AWS regions and Availability Zones, correctly designing your architecture and applications to handle unpredictable and cascading failures, and scaling automatically and elastically to meet increases and decreases in demand, you can effectively eliminate the need for DR.  Your data and infrastructure are distributed in a way that is highly available and impervious to failure or spikes/drops in demand.  So in addition to inherent DR, you are getting HA and true capacity-on-demand.  The whole concept of a disaster taking down a data center, and the subsequent effects on your systems, applications, and users, becomes irrelevant.  It may take a bit of work to design (or redesign) an application for this cloud geo-distributed model, but I assure you that in terms of business continuity, reduced TCO, scalability, and uptime it will pay off in spades.

That ought to put the proverbial nail in the coffin. RIP.

-Ryan Kennedy, Senior Cloud Engineer


An Introduction to CloudFormation

One of the most powerful services in the AWS collection is CloudFormation. It provides the ability to programmatically construct and manage a grouping of AWS resources in a predictable way.  With CloudFormation, provisioning of an AWS environment does not have to be done through single CLI commands or by clicking through the console; it can be completed through a JSON (JavaScript Object Notation) formatted text file, or CloudFormation template.  With CloudFormation, you can build a few or many AWS resources into an environment automatically.  CloudFormation works with many AWS resource types, from network infrastructure (VPCs, Subnets, Routing Tables, Gateways, and Network ACLs) to compute (EC2 and Auto Scaling), database (RDS and ElastiCache), and storage (S3) components.  You can see the full list here.

The general JSON structure looks like the following:

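A bare-bones skeleton along these lines (the section bodies are left empty here as placeholders) shows the layout:

  {
    "AWSTemplateFormatVersion" : "2010-09-09",
    "Description" : "A short description of what the template does",
    "Parameters" : { },
    "Mappings" : { },
    "Resources" : { },
    "Outputs" : { }
  }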

A template has a total of six main sections: AWSTemplateFormatVersion, Description, Parameters, Mappings, Resources, and Outputs.   Of these six, only “Resources” is required; however, it is always a good idea to include other sections like Description or Parameters. Each AWS resource type also has numerous properties that are used to extend the functionality of the particular resource.

Breaking Down a CloudFormation Template

Here is a simple CloudFormation template provided by AWS.  It creates a single EC2 instance:

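In simplified form, it looks like this (only the sections and values called out in the walkthrough below are shown):

  {
    "Description" : "Create an EC2 instance running the Amazon Linux 32 bit AMI.",
    "Parameters" : {
      "KeyPair" : {
        "Description" : "The EC2 Key Pair to allow SSH access to the instance",
        "Type" : "String"
      }
    },
    "Resources" : {
      "Ec2Instance" : {
        "Type" : "AWS::EC2::Instance",
        "Properties" : {
          "KeyName" : { "Ref" : "KeyPair" },
          "ImageId" : "ami-3b355a52"
        }
      }
    },
    "Outputs" : {
      "InstanceID" : {
        "Description" : "The InstanceId of the newly created EC2 instance",
        "Value" : { "Ref" : "Ec2Instance" }
      }
    }
  }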

This template uses the Description, Parameters, Resources, and Outputs template sections.  The Description section is just a short description of what the template does; in this case it says the template will “Create an EC2 instance running the Amazon Linux 32 bit AMI.”  The next section, Parameters, allows a string value called KeyPair to be passed to the stack at launch time.  During a stack launch from the console you would see the following dialogue box, where you specify all of the editable parameters for that specific launch of the template; in this case there is only one parameter, named KeyPair:

[Screenshot: the stack launch dialogue box, showing the KeyPair parameter field]

Notice how the KeyPair parameter is available for you to enter a string, along with the description of what you should type in the box: “The EC2 Key Pair to allow SSH access to the instance”.  This would be an existing key pair in the us-east-1 region that you would use to access the instance once it’s launched.

Next, in the Resources section, a resource named “Ec2Instance” is defined and given the AWS resource type “AWS::EC2::Instance”.  The resource type defines the kind of AWS resource the template will deploy at launch and allows you to configure properties for that particular resource.  In this example only KeyName and ImageId are used, but the “AWS::EC2::Instance” type supports several additional properties in CloudFormation; you can see the full list here.  Digging deeper, we see the KeyName value is a reference to the KeyPair parameter defined in the Parameters section of the template, which allows the instance the template creates to use the key pair we supplied at launch.  Finally, the ImageId is ami-3b355a52, which is an Amazon Linux 32 bit AMI in the us-east-1 region, and is why we have to specify a key pair that exists in that region.

Finally, there is an Outputs template section which allows you to return values to the console describing the specific resources that were created. In this example the only output defined is “InstanceID”, which is given both a description, “The InstanceId of the newly created EC2 instance”, and a value, { “Ref” : “Ec2Instance” }, which is a reference to the resource that was created.  As you can see in the picture below, the stack launched successfully and the instance id i-5362512b was created.

[Screenshot: the stack’s Outputs, showing InstanceID i-5362512b]

The Outputs section is especially useful for complex templates because it allows you to summarize in one location all of the pertinent information for your deployed stack.  For example, if you deployed dozens of machines in a complex SharePoint farm, you could use the Outputs section of the template to show just the public-facing endpoint, helping you quickly identify the relevant information to get into the environment.
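For instance, a template that fronts the farm with a load balancer (the “WebLoadBalancer” logical name here is hypothetical) could expose only its DNS name:

  "Outputs" : {
    "SiteEndpoint" : {
      "Description" : "Public facing endpoint for the SharePoint farm",
      "Value" : { "Fn::GetAtt" : [ "WebLoadBalancer", "DNSName" ] }
    }
  }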

CloudFormation for Disaster Recovery

The fact that CloudFormation templates construct an AWS environment in a consistent and repeatable fashion makes them the perfect tool for Disaster Recovery (DR).  By configuring a CloudFormation template to contain all of your production resources, you can deploy the same set of resources in another AWS Availability Zone or another region entirely.  Thus, if one set of resources becomes unavailable in a disaster scenario, a quick launch of a CloudFormation template will initialize a whole new stack of production-ready components.  Built an environment manually through the console and still want to take advantage of CloudFormation for DR? You can use the CloudFormer tool.  CloudFormer helps you construct a CloudFormation template from existing AWS resources.  You can find more information here.  No matter how you construct your CloudFormation template, the final result is the same: a complete copy of your AWS environment in the form of a JSON-formatted document that can be deployed over and over.
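One common pattern for cross-region deployments is a Mappings section that resolves the correct AMI for whichever region the stack is launched in; a sketch (the us-west-2 AMI ID below is just a placeholder) looks like this:

  "Mappings" : {
    "RegionMap" : {
      "us-east-1" : { "AMI" : "ami-3b355a52" },
      "us-west-2" : { "AMI" : "ami-00000000" }
    }
  },
  "Resources" : {
    "Ec2Instance" : {
      "Type" : "AWS::EC2::Instance",
      "Properties" : {
        "ImageId" : { "Fn::FindInMap" : [ "RegionMap", { "Ref" : "AWS::Region" }, "AMI" ] }
      }
    }
  }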

Benefits of CloudFormation

The previous example is a very simple illustration of a CloudFormation template on AWS.

Here are some highlights:

  1. With a CloudFormation template you can create identical copies of your resources repeatedly, eliminating complex deployment tasks that can otherwise take several hundred clicks in the console.
  2. All CloudFormation templates are simple JSON-structured files, which allows you to easily share them and work with them using your current source control processes and favorite editing tools.
  3. CloudFormation templates can start simple and grow over time, allowing the most complex environments to be repeatedly deployed. This makes them a great tool for DR.
  4. CloudFormation allows you to customize the AWS resources it deploys through Parameters that are editable when the template is launched. For example, if you are deploying an Auto Scaling group of EC2 instances within a VPC, you can add a Parameter that lets the creator select which instance size will be used for the stack (see the sketch after this list).
  5. It can be argued, but the best part about CloudFormation is that it’s free!
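As a quick sketch of that fourth point (the instance sizes listed are just examples), such a Parameter might look like this:

  "Parameters" : {
    "InstanceType" : {
      "Description" : "EC2 instance size for the Auto Scaling group",
      "Type" : "String",
      "Default" : "m1.small",
      "AllowedValues" : [ "m1.small", "m1.medium", "m1.large" ]
    }
  }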

-Derek Baltazar, Senior Cloud Engineer
