
DR is Dead

Having been in the IT industry since the '90s, I've seen many iterations of Disaster Recovery principles and methodologies.  The concept of DR far predates my tenure in the field: the idea began taking shape in the 1970s, as businesses started to realize their dependence on information systems and the criticality of those services.

Over the past decade or so, running a DR site at a colo facility (either leased or owned) has become a popular way for organizations to have a rapidly available disaster recovery option.  The problem with a colo facility is that it is EXPENSIVE!  In addition to potentially huge CapEx (if you are buying your own infrastructure), you have the facility and infrastructure OpEx and all the overhead of managing those systems.  In steps the cloud… AWS and the other players in the public cloud arena give you the ability to run a DR site with essentially no CapEx.  You pay only for the virtual infrastructure you are actually using, as an operational cost.

An intelligently designed DR solution can leverage something like Amazon's Pilot Light approach to keep costs down by running only the minimal core infrastructure needed to keep the DR site ready to scale up to production.  That is a big improvement over purchasing millions of dollars of hardware and carrying thousands upon thousands of dollars in OpEx and overhead costs every month.  Even still, there is a better way.  If you architect your infrastructure and applications following AWS best practices, then in a perfect world there is really no reason to have DR at all.  By balancing your systems across multiple AWS regions and Availability Zones, designing your architecture and applications to handle unpredictable and cascading failure, and scaling automatically and elastically to meet increases and decreases in demand, you can effectively eliminate the need for DR.  Your data and infrastructure are distributed in a way that is highly available and resilient to failure or to spikes and drops in demand.  So in addition to inherent DR, you are getting HA and true capacity on demand.  The whole concept of a disaster taking down a data center, and the subsequent effects on your systems, applications, and users, becomes irrelevant.  It may take a bit of work to design (or redesign) an application for this cloud geo-distributed model, but I assure you that from a business continuity, TCO, scalability, and uptime perspective it will pay off in spades.

That ought to put the proverbial nail in the coffin. RIP.

-Ryan Kennedy, Senior Cloud Engineer


An Introduction to CloudFormation

One of the most powerful services in the AWS collection is CloudFormation. It provides the ability to programmatically construct and manage a grouping of AWS resources in a predictable way.  With CloudFormation, provisioning an AWS environment does not have to be done through single CLI commands or by clicking through the console; it can be driven entirely by a JSON (JavaScript Object Notation) formatted text file, or CloudFormation template.  With CloudFormation you can automatically build anything from a handful of AWS resources to an entire environment.  CloudFormation works with many AWS resource types, from network infrastructure (VPCs, Subnets, Routing Tables, Gateways, and Network ACLs), to compute (EC2 and Auto Scaling), to database (RDS and ElastiCache), to storage (S3) components.  You can see the full list here.

The general JSON structure looks like the following:

[Image: general structure of a CloudFormation template]
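Since the image is not reproduced here, a minimal sketch of that structure, with all six top-level sections and placeholder content, looks like this ("2010-09-09" is the only valid format version):

{
  "AWSTemplateFormatVersion" : "2010-09-09",
  "Description" : "A short description of what the template does",
  "Parameters" : { },
  "Mappings" : { },
  "Resources" : { },
  "Outputs" : { }
}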

A template has six main sections: AWSTemplateFormatVersion, Description, Parameters, Mappings, Resources, and Outputs.   Of these six template sections only "Resources" is required; however, it is always a good idea to include other sections like Description or Parameters. Within the Resources section, each AWS resource is declared with a resource type identifier and a set of properties that configure that particular resource.

Breaking Down a CloudFormation Template

Here is a simple CloudFormation template provided by AWS.  It creates a single EC2 instance:

[Image: AWS sample template that launches a single EC2 instance]
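The original post shows the template as an image; reconstructed from the walkthrough that follows (the AMI ID, parameter name, and description strings are all quoted from the text below), it looks roughly like this:

{
  "Description" : "Create an EC2 instance running the Amazon Linux 32 bit AMI.",
  "Parameters" : {
    "KeyPair" : {
      "Description" : "The EC2 Key Pair to allow SSH access to the instance",
      "Type" : "String"
    }
  },
  "Resources" : {
    "Ec2Instance" : {
      "Type" : "AWS::EC2::Instance",
      "Properties" : {
        "KeyName" : { "Ref" : "KeyPair" },
        "ImageId" : "ami-3b355a52"
      }
    }
  },
  "Outputs" : {
    "InstanceId" : {
      "Description" : "The InstanceId of the newly created EC2 instance",
      "Value" : { "Ref" : "Ec2Instance" }
    }
  }
}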

This template uses the Description, Parameters, Resources, and Outputs template sections.  The Description section is just a short description of what the template does; in this case it says the template will "Create an EC2 instance running the Amazon Linux 32 bit AMI."  The next section, Parameters, defines a string value called KeyPair that can be passed to the stack at launch time.  When launching the stack from the console you would see the following dialog box, where you specify all of the editable parameters for that particular launch of the template. In this case there is only one parameter, named KeyPair:

[Screenshot: the Specify Parameters dialog showing the KeyPair field]

Notice how the KeyPair parameter is available for you to enter a string, along with the description of what you should type in the box: "The EC2 Key Pair to allow SSH access to the instance".  This would be an existing key pair in the us-east-1 region that you would use to access the instance once it's launched.

Next, in the Resources section, "Ec2Instance" is defined as the name of the resource and given the AWS resource type "AWS::EC2::Instance".  The resource type defines the kind of AWS resource the template will deploy at launch and determines which properties you can configure for that particular resource.  In this example only KeyName and ImageId are used for this resource.  For the resource type "AWS::EC2::Instance" there are several additional properties you can use in CloudFormation, and you can see the full list here.  Digging deeper, we see the KeyName value is a reference to the KeyPair parameter that we defined in the Parameters section of the template, allowing the instance that the template creates to use the key pair we supplied at launch.  Finally, the ImageId is ami-3b355a52, which is an Amazon Linux 32 bit AMI in the us-east-1 region, and the reason we have to specify a key pair that exists in that region.

Finally, there is an Outputs template section, which allows you to return values to the console describing the specific resources that were created. In this example the only output defined is "InstanceId", which is given both a description, "The InstanceId of the newly created EC2 instance", and a value, { "Ref" : "Ec2Instance" }, which is a reference to the resource that was created.  As you can see in the picture below, the stack launched successfully and the instance ID i-5362512b was created.

[Screenshot: the stack's Outputs tab showing the InstanceId i-5362512b]

The Outputs section is especially useful for complex templates because it allows you to summarize in one location all of the pertinent information for your deployed stack.  For example, if you deployed dozens of machines in a complex SharePoint farm, you could use the Outputs section of the template to show just the public-facing endpoint, helping you quickly identify the relevant information needed to get into the environment.
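As a purely hypothetical sketch, an Outputs section for such a farm might expose only the load balancer's public address (the SharePointELB resource name is made up for illustration):

"Outputs" : {
  "PublicEndpoint" : {
    "Description" : "Public URL for the SharePoint farm",
    "Value" : { "Fn::Join" : [ "", [ "http://", { "Fn::GetAtt" : [ "SharePointELB", "DNSName" ] } ] ] }
  }
}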

CloudFormation for Disaster Recovery

The fact that CloudFormation templates construct an AWS environment in a consistent and repeatable fashion makes them the perfect tool for Disaster Recovery (DR).  By configuring a CloudFormation template to contain all of your production resources, you can deploy the same set of resources in another AWS Availability Zone or in another Region entirely.  Thus, if one set of resources becomes unavailable in a disaster scenario, a quick launch of the CloudFormation template will initialize a whole new stack of production-ready components.  Built an environment manually through the console and still want to take advantage of CloudFormation for DR? You can use the CloudFormer tool, which helps you construct a CloudFormation template from existing AWS resources; you can find more information here.  No matter how you construct your CloudFormation template, the final result is the same: a complete copy of your AWS environment in the form of a JSON formatted document that can be deployed over and over.
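As a hedged sketch of what a DR launch might look like with the AWS CLI (the stack name, template file, and key pair value are placeholders), the same template can simply be pointed at a different region at launch time:

aws cloudformation create-stack \
  --stack-name prod-dr \
  --region us-west-2 \
  --template-body file://production-stack.template \
  --parameters ParameterKey=KeyPair,ParameterValue=my-uswest2-keypair

Keep in mind that region-specific values such as AMI IDs must come from Parameters or Mappings for a single template to work in more than one region.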

Benefits of CloudFormation

The previous example is a very simple illustration of a CloudFormation template on AWS.

Here are some highlights:

  1. With a CloudFormation template you can create identical copies of your resources repeatedly, eliminating complex deployment tasks that might otherwise take several hundred clicks in the console.
  2. All CloudFormation templates are simple JSON structured files that allow you to easily share them and work with them using your current source control processes and favorite editing tools.
  3. CloudFormation templates can start simple and build over time to allow the most complex environments to be repeatedly deployed.  Thus, making them a great tool for DR.
  4. CloudFormation allows you to customize the AWS resources it deploys through Parameters that are editable at launch time. For example, if you are deploying an Auto Scaling group of EC2 instances within a VPC, you can provide a Parameter that lets the creator select which instance size will be used for the stack (see the snippet after this list).
  5. It can be argued, but the best part about CloudFormation is that it's free!
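Here is a hedged sketch of such a parameter (the allowed instance sizes are chosen purely for illustration); the launch configuration resource would then reference it with { "Ref" : "InstanceType" }:

"Parameters" : {
  "InstanceType" : {
    "Description" : "EC2 instance type for the Auto Scaling group",
    "Type" : "String",
    "Default" : "m1.small",
    "AllowedValues" : [ "m1.small", "m1.medium", "m1.large" ]
  }
}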

We are here to help our customers, so if you need help developing a cloud-first strategy, contact us here.

-Derek Baltazar, Senior Cloud Engineer


AWS Sticky Sessions

Amazon Web Services best practices tell us to build stateless systems; in a perfect world, any server can serve any function with absolutely no impact to customers.  Sounds great, but unfortunately reality intrudes on our perfect world, and we find many websites and applications are not so perfectly stateless.  So how can we make use of AWS strengths like elasticity and auto scaling without completely rewriting applications to conform?  After all, one of the key benefits of moving to the cloud is cost savings, which get eaten away by spending development resources rewriting code.

The solution is thankfully built in to Amazon's Elastic Load Balancer (ELB): those that require sessions to remain open for a customer can enable the "sticky" option.  This keeps transactions processing and real-time communication alive, and it saves businesses from having to redesign their code or give up auto scaling.  So how does it work?

The first option is to create duration-based session stickiness.  This is enabled at the ELB under port configuration.  From there, the "stickiness" option can be enabled, and the ELB will generate a session cookie with a limited duration (default is 60 seconds).  So long as the client checks in with the ELB before the cookie expires, the session is held on that instance and that instance will not be terminated by auto scaling.  The second option is to enable application-controlled stickiness.  This requires more development effort unless the existing platform already makes use of custom cookies; however, it gives far more control to application developers than a basic number of seconds before timeout.  By using application control, a web developer can keep a client connection directed to a specific instance through the ELB without fear that a required instance will be terminated prematurely.
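Both options can also be enabled from the command line. Here is a hedged sketch using the AWS CLI against a hypothetical Classic Load Balancer named my-elb (the policy names and cookie name are placeholders):

# Duration-based stickiness: the ELB generates the cookie and holds the session for 60 seconds
aws elb create-lb-cookie-stickiness-policy \
  --load-balancer-name my-elb \
  --policy-name my-duration-policy \
  --cookie-expiration-period 60

# Application-controlled stickiness: follow the lifetime of an application cookie named JSESSIONID
aws elb create-app-cookie-stickiness-policy \
  --load-balancer-name my-elb \
  --policy-name my-app-policy \
  --cookie-name JSESSIONID

# Attach one of the policies to the listener on port 80
aws elb set-load-balancer-policies-of-listener \
  --load-balancer-name my-elb \
  --load-balancer-port 80 \
  --policy-names my-duration-policy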

-Keith Homewood, Cloud Architect


Disaster Recovery on Amazon Web Services

After a seven-year career at Cisco, I am thrilled to be working with 2nd Watch to help make cloud work for our customers.  Disaster recovery is a great use case for companies looking for a quick cloud win.  Traditional on-premises backup and disaster recovery technologies are complex and can require major capital investments.  I have seen many emails from IT management to senior management explaining the risk to their business if they do not spend money on backup/DR.  The costs associated with on-premises DR solutions are often pushed to next year's budget and always seem to get cut midway through the year.  When the business does invest in backup/DR, countless corners are cut in order to squeeze the most reliability and performance out of a limited budget.

The public cloud is a great resource for addressing the requirements of backup and disaster recovery.  Organizations can avoid the sunk costs of data center infrastructure (power, cooling, flooring, etc.) while having all of the performance and resources available for their growing needs.  Atlanta-based What's Up Interactive saved over $1 million with its AWS DR solution (case study here).

I will highlight a few of the top benefits any company can expect when leveraging AWS for their disaster recovery project.

1. Eliminate costly tape backups that require transporting, storing, and retrieving the tape media.  This is replaced by fast disk-based storage that provides the performance needed to run your mission-critical applications.

2.  AWS provides an elastic solution that can scale to the growth of data or application requirements without the costly capital expenses of traditional technology vendors.   Companies can also expire and delete archived data according to organizational policy.  This allows companies to pay for only what they use.

3.  AWS provides a secure, durable platform that helps companies meet compliance requirements by keeping data protected while making it easy to access when deadlines demand it.

Every day we help new customers leverage the cloud to support their business.  Our team of Solutions Architects and Cloud Engineers can assist you in creating a plan to reduce risk in your current backup/DR solution.  Let us know how we can help you get started in your journey to the cloud.

-Matt Whitney, Cloud Sales Executive


AWS Auto-Scaling

Auto-Scaling gives you the ability to scale your EC2 instances up or down according to demand to handle the load on the service.  With Auto-Scaling you don't have to worry about whether the number of instances you're running can handle a demand spike, or whether you're overspending during a slower period. Auto-Scaling adjusts for you automatically for seamless performance.

For instance, if there are currently three m1.xlarge instances handling the service, and they spend a large portion of their time only 20% loaded with a smaller portion of their time heavily loaded, they can be vertically scaled down to smaller instance sizes and horizontally scaled out/in to more or fewer instances to automatically accommodate whatever load they have at the time.  This can also save money, since you only pay for the smaller instance size.  More savings can be attained by using reserved instance billing for the minimum number of instances defined in the Auto-Scaling configuration and letting the scaled-out instances run at the on-demand rate.  This is a little tricky, though, because an instance's billing cannot be changed while the instance is running.  When scaling in, make sure to terminate the newest instances, since they are the ones running at the on-demand rate.

Vertical Scaling is typically referred to as scale-up or scale-down by changing the instance size, while Horizontal Scaling is typically referred to as scale-out or scale-in by changing the number of instances.

Whether traffic to an AWS service rises and falls predictably or unpredictably, Auto-Scaling keeps customers happy with the service because response times stay more consistent and high availability is more reliable.

Auto-Scaling to Improve HA

If there is only one server instance, Auto-Scaling can be used to put a new server in place within a few minutes when the running one fails.  Just set both the Min and Max number of instances to 1.
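A hedged sketch using the same Auto Scaling command line tools shown later in this post (the group name and region are placeholders):

# as-update-auto-scaling-group <auto_scaling_group_name> --region <region_name> --min-size 1 --max-size 1

With Min and Max both set to 1, a failed health check causes the group to terminate the bad instance and launch a replacement automatically.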

Auto-Scaling to Improve Response Time Consistency

If there are multiple servers and the load on them becomes so heavy that the response time slows, expand horizontally only for the time necessary to cover the extra load, and keep the response time low.

AWS Auto-Scaling Options to Set

When Auto-Scaling up or down, there are a lot of things to think about:

  • Evaluation Period is the time, in seconds, between checks of the load on the Scaling Group.
  • Cool Down is the time, in seconds, after a scaling operation before a new scaling operation can be performed.  When scaling out, this time should be fairly short in case the load is too heavy for one scale-out operation to cover. When scaling in, this time should be at least twice that of the scale-out cooldown.
  • With Scale-Out, make sure it scales fast enough to quickly handle a load heavier than one expansion. 300 seconds is a good starting point.
  • With Scale-In, make sure it scales slowly enough that the group doesn't keep going out and in.  We call this "flapping"; some call it "thrashing".
  • When the Auto-Scaling group spans multiple AZs, scaling out and in should be done in increments of the number of AZs involved. If only one AZ is scaled up and something happens to that AZ, the impact of the failure becomes much more noticeable.
  • Scale-In can be accomplished by different rules (see the example command after this list):
  1. Terminate Oldest Instance
  2. Terminate Newest Instance
  3. Terminate Instance Closest to the next Instance Hour (Best Cost Savings)
  4. Terminate Oldest Launch Configuration (default)
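These rules map to the group's termination policy. A hedged example of setting the cost-saving rule on an existing group (all values are placeholders, and the --termination-policies flag is assumed to be available in your version of the Auto Scaling CLI tools):

# as-update-auto-scaling-group <auto_scaling_group_name> --region <region_name> --termination-policies "ClosestToNextInstanceHour"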

Auto-Scaling Examples

Auto-Scaling is a two-stage process, and here is the rub: the AWS Management Console does not handle Auto-Scaling, so it has to be done through the AWS APIs and command line tools.

  1. Set up the Launch Configuration and assign it to the group of instances you want to control.  If there is no user_data file, that argument can be left out.  The values for the block-device-mapping argument can be found in the details for the AMI ID.
    • # as-create-launch-config <auto_scaling_launch_config_name> --region <region_name> --image-id <AMI_ID> --instance-type <type> --key <SSH_key_pair_name> --group <VPC_security_group_ID> --monitoring-enabled --user-data-file=<path_and_name_for_user_data_file> --block-device-mapping "<device_name>=<snap_id>:100:true:standard"
    • # as-create-auto-scaling-group <auto_scaling_group_name> --region <region_name> --launch-configuration <auto_scaling_launch_config_name> --vpc-zone-identifier <VPC_Subnet_ID>,<VPC_Subnet_ID> --availability-zones <Availability_Zone>,<Availability_Zone> --load-balancers <load_balancer_name> --min-size <min_number_of_instances_that_must_be_running> --max-size <max_number_of_instances_that_can_be_running> --health-check-type ELB --grace-period <time_seconds_before_first_check> --tag "k=Name, v=<friendly_name>, p=true"
  2. Have CloudWatch initiate Scaling Activities.  One CloudWatch Alert for Scaling Out and one for Scaling In.  Also send notifications when scaling.
  • Scale Out (the policy ARN output by the first command is used as the --alarm-actions argument of the second command)
  • # as-put-scaling-policy --name <auto_scaling_policy_name_for_high_CPU> --region <region_name> --auto-scaling-group <auto_scaling_group_name> --adjustment <Number_of_instances_to_change_by> --type ChangeInCapacity --cooldown <time_in_seconds_to_wait_to_check_after_adding_instances>
  • # mon-put-metric-alarm --alarm-name <alarm_name_for_high_CPU> --region <region_name> --metric-name CPUUtilization --namespace AWS/EC2 --statistic Average --period <number_of_seconds_to_check_each_time_period> --evaluation-periods <number_of_periods_between_checks> --threshold <percent_number> --unit Percent --comparison-operator GreaterThanThreshold --alarm-description <description_use_alarm_name> --dimensions "AutoScalingGroupName=<auto_scaling_group_name>" --alarm-actions <arn_string_from_last_command>
  • Scale In (the policy ARN output by the first command is used as the --alarm-actions argument of the second command)
  • # as-put-scaling-policy --name <auto_scaling_policy_name_for_low_CPU> --region <region_name> --auto-scaling-group <auto_scaling_group_name> "--adjustment=-<Number_of_instances_to_change_by>" --type ChangeInCapacity --cooldown <time_in_seconds_to_wait_to_check_after_removing_instances>
  • # mon-put-metric-alarm --alarm-name <alarm_name_for_low_CPU> --region <region_name> --metric-name CPUUtilization --namespace AWS/EC2 --statistic Average --period <number_of_seconds_to_check_each_time_period> --evaluation-periods <number_of_periods_between_checks> --threshold <percent_number> --unit Percent --comparison-operator LessThanThreshold --alarm-description <description_use_alarm_name> --dimensions "AutoScalingGroupName=<auto_scaling_group_name>" --alarm-actions <arn_string_from_last_command>

AMI Changes Require Auto-Scaling Updates

The instance configuration could change for any number of reasons:

  • Security Patches
  • New Features added
  • Removal of unused old features

Whenever the AMI specified in the Auto-Scaling definition is changed, the Auto-Scaling group needs to be updated.  The update requires creating a new launch configuration with the new AMI ID, updating the Auto-Scaling group to use it, and then deleting the old launch configuration.  Without this update, the scale-out operation will keep using the old AMI.

1. Create new Launch Config:

# as-create-launch-config <new_auto_scaling_launch_config_name> --region <region_name> --image-id <AMI_ID> --instance-type <type> --key <SSH_key_pair_name> --group <VPC_security_group_ID> --monitoring-enabled --user-data-file=<path_and_name_for_user_data_file> --block-device-mapping "<device_name>=<snap_id>:100:true:standard"

2. Update Auto Scaling Group:

# as-update-auto-scaling-group <auto_scaling_group_name> --region <region_name> --launch-configuration <new_auto_scaling_launch_config_name> --vpc-zone-identifier <VPC_Subnet_ID>,<VPC_Subnet_ID> --availability-zones <Availability_Zone>,<Availability_Zone> --min-size <min_number_of_instances_that_must_be_running> --max-size <max_number_of_instances_that_can_be_running> --health-check-type ELB --grace-period <time_seconds_before_first_check>

3. Delete the Old Launch Config:

# as-delete-launch-config <old_auto_scaling_launch_config_name> --region <region_name> --force

Now all Scale Outs should use the updated AMI.

-Charles Keagle, Senior Cloud Engineer


Distributed Functional Testing on AWS

To leverage the full benefits of Amazon Web Services (AWS), such as instant elasticity and scalability, every AWS architect eventually considers Elastic Load Balancing and Auto Scaling.   These features make it possible to scale an environment in or out almost instantly based on the flow of internet traffic.

Once implemented, how do you test the configuration and application to make sure they scale with the parameters you've set?  You could always trust the design and logic, then wait for the environment to scale naturally with organic traffic.  However, in most production environments this is not an option; you want to make sure the environment operates adequately under load.  One cool way to do this is by generating a distributed traffic load with a program called Bees with Machine Guns.

The author describes Bees with Machine Guns as "A utility for arming (creating) many bees (micro EC2 instances) to attack (load test) targets (web applications)."  This is a perfect solution for testing the performance and functionality of an AWS environment because it allows you to use one master controller to call many bees for a distributed attack on an application.  A distributed attack from several bees gives a more realistic traffic profile than you can get from a single node, and Bees with Machine Guns lets you mount an attack with one bee or with several using the same amount of effort.

Bees with Machine Guns isn’t just a randomly found open source tool. AWS endorses the project in several places on their website.  AWS recommends Bees with Machine Guns for distributed ing in their article “Best Practices in Evaluating Elastic Load Balancing”.  The author says “…you could consider tools that help you distribute s, such as the open source Fabric framework combined with an interesting approach called Bees with Machine Guns, which uses the Amazon EC2 environment for launching clients that execute s and report the results back to a controller.”  AWS also provides a CloudFormation template for deploying Bees with Machine Guns on their AWS CloudFormation Sample Templates page.

To install Bees with Machine Guns you can either use the template provided on the AWS CloudFormation Sample Templates page called bees-with-machineguns.template or follow the install instructions from the GitHub project page. (Please be aware the template also deploys a scalable spot instance auto scale group behind an elastic load balancer, all of which you are responsible for paying for.)

Once the Bees with Machine Guns source is installed, you can run the following commands:

[Screenshot: Bees with Machine Guns commands]

The first command we run will start up five bees that we will have control over for testing.  The -s option specifies the number of bees we want to spin up.  The -k option is the name of the SSH key pair used to connect to the new servers.  The -i option is the AMI used for each bee.  The -g option is the security group in which the bees will be launched.  If the key pair, security group, and AMI already exist in the region where you're launching the bees, there is less chance you will see errors when running the command.
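For example, a hedged version of that command with placeholder values (the key pair, AMI ID, and security group are made up) might look like:

bees up -s 5 -k my-keypair -i ami-xxxxxxxx -g default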

[Screenshot: output of the bees up command]

Once launched, you can see the bees that were instantiated and under control of the Bees with Machine Guns controller with the command:
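In current releases of the tool this is the report subcommand (an assumption based on the project's documentation, since the original screenshot is not reproduced here):

bees report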

[Screenshot: listing of the active bees]

To make our bees attack, we use the command "bees attack".  The -u option is the URL of the target to attack; make sure to include the trailing slash in your URL or the command will error out.   The -n option is the total number of connections to make to the target.  The -c option is the number of concurrent connections made to the target.  Here is an example run of an attack:

[Screenshot: output of a bees attack run]

Notice that the attack was distributed among the bees in the following manner: "Each of 5 bees will fire 20 rounds, 2 at a time." Since we had our total number of connections set to 100, each bee received an equal share of the requests.  Depending on your choices for the -n and -c options you can configure a different type of attack profile.  For example, if you wanted to increase the duration of an attack you would increase the total number of connections, and the bees would take longer to complete the attack.  This comes in useful when testing an Auto Scaling group in AWS because you can configure an attack that will trigger one of your CloudWatch alarms, which will in turn activate a scaling action. Another trick is to use the Linux "time" command before your "bees attack" command; once the attack completes you can see the total duration of the attack.
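Putting those pieces together, a hedged sketch of an attack similar to the one above, wrapped in the Linux time command so the total duration prints when it finishes (the URL is a placeholder; note the required trailing slash):

time bees attack -n 100 -c 10 -u http://my-test-elb-1234567890.us-east-1.elb.amazonaws.com/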

Once the command completes you get output for the number of requests that actually completed, the requests that were made per second, the time per request, and a “Mission Assessment,” in this case the “Target crushed bee offensive”.

To spin down your fleet of bees you run the command:

[Screenshot: spinning down the bees]
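In current releases of the tool this is the down subcommand (again an assumption based on the project's documentation):

bees down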

This is a quick intro on how to use Bees with Machine Guns for distributed testing within AWS. The one big caution in using Bees with Machine Guns, as explained by the author, is that "they are, more-or-less, a distributed denial-of-service attack in a fancy package," which means you should only use it against resources that you own, and you will be liable for any unauthorized use.

As you can see, Bees with Machine Guns can be a powerful tool for distributed load testing.  It's extremely easy to set up and tremendously easy to use, and it is a great way to artificially create a production-like load to test the elasticity and scalability of your AWS environment.

-Derek Baltazar, Senior Cloud Engineer
