
Backups – Don’t Do It

In migrating customers to AWS, one of the questions we are consistently asked is, “How do we extend our backup services into the Cloud?”  My answer?  You don’t.  This is often met with incredulous stares while the customer wonders if I’m joking, crazy, or simply don’t understand IT.  After all, backups are fundamental to data centers and to IT systems in general, so why on Earth would I tell someone not to back up their systems?

The short answer, honestly, is just not to do it.  The more in-depth answer is, of course, more complicated than that.  To be clear, I am talking about system backups – those backups typically used for bare metal restores.  Backups of databases and of file services we’ll tackle separately.  For the bulk of systems, however, we’ll leave backups as a relic of on-premises data centers.

How?  Why?  Consider a typical three-tiered architecture: web servers, application servers, and database servers.  In AWS, ideally your application and web servers are stateless, auto-scaled systems.  With that in mind, why would you ever want to spend time, money, or resources on backing up and restoring one of these systems?  The design should be such that if and when a system fails, the health check/monitoring automatically terminates the instance, which in turn triggers an auto-scaling event to launch a new instance in its place.  No painfully long hours working through a restore process.
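As a sketch of the idea (all names below are placeholders), an Auto-Scaling group tied to the load balancer’s health check replaces a failed web or application server on its own – no restore process required:

# as-create-auto-scaling-group web-asg --region us-east-1 --launch-configuration web-lc --availability-zones us-east-1a,us-east-1b --load-balancers web-elb --min-size 2 --max-size 8 --health-check-type ELB --grace-period 300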

Similarly, your database systems can work without large-scale backup systems.  Yes, by all means run database backups!  Database backups are not for server instance failures but for database application corruption or update/upgrade rollbacks.  Unfortunately, the Cloud doesn’t magically make your databases any more immune to human error.  For the database servers (assuming non-RDS), however, maintaining a snapshot of the server instance is likely good enough for backups.  If and when the database server fails, the instance can be terminated and the standby system becomes the live system to maintain integrity.  Launch a new database server from the snapshot, restore the database and/or configure replication from the live system, depending on your database technology, and you’re live.
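Keeping that snapshot current is a one-liner with the EC2 API tools – a minimal sketch, with the volume ID as a placeholder:

# ec2-create-snapshot vol-12345678 -d "database server volume snapshot"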

So yes, in a properly configured AWS environment, the backup and restore you love to loathe from your on-premises environment is a thing of the past.

-Keith Homewood, Cloud Architect


Managing Your Amazon Cloud Deployment to Save Money

Wired Innovation Insights published a blog article written by our own Chris Nolan yesterday. Chris discusses ways you can save money on your AWS cloud deployment in “How to Manage Your Amazon Cloud Deployment to Save Money.” Chris’ top tips include:

  1. Use CloudFormation or another configuration and orchestration tool.
  2. Watch out for cloud sprawl.
  3. Use AWS auto scaling.
  4. Turn the lights off when you leave the room (see the sketch after this list).
  5. Use tools to monitor spend.
  6. Build in redundancy.
  7. Planning saves money.
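Tip 4 is often the quickest win: non-production instances rarely need to run nights and weekends.  A minimal sketch with the EC2 API tools (the instance ID is a placeholder):

# ec2-stop-instances i-12345678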

Read the Full Article


Enabling Growth and Watching the Price

One of the main differentiators between traditional on-premises data centers and Cloud Computing through AWS is the speed at which businesses can scale their environment.  So often in enterprise environments, IT and the business struggle to have adequate capacity when they need it.  Facilities run out of power and cooling, vendors cannot provide systems fast enough or the same type of system is not available, and business needs sometimes come without warning.  AWS scales out to meet these demands in every area.

Compute capacity expands, often automatically, with auto-scaling groups that add server instances as demand dictates.  Even without auto-scaling, systems can be cloned with Amazon Machine Images (AMIs) and started to meet capacity, expand to a new region/geography, or even be shared with a business partner to move collaboration forward.
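Cloning a running system is a single call with the EC2 API tools – a sketch, with the instance ID and names as placeholders:

# ec2-create-image i-12345678 -n "web-server-clone" -d "AMI for scaling out or sharing"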

Beyond compute capacity, storage capacity is a few mouse clicks (or less) away from business needs as well.  With Amazon S3, storage capacity is allocated dynamically as it is used; customers need do nothing more than add content, which is far easier than adding disk arrays!  Elastic Block Store (EBS) volumes are added as quickly as compute instances are.  Storage can be created and attached to live instances, or replicated across an environment, as capacity is demanded.
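As a sketch with the EC2 API tools (size, zone, and IDs are placeholders), adding a volume to a live instance takes two commands:

# ec2-create-volume -s 100 -z us-east-1a
# ec2-attach-volume vol-12345678 -i i-12345678 -d /dev/sdf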

Growth is great, and we’ve written a great deal about how to take advantage of the elastic nature of AWS before, but what about the second part of the title?  Price!  It’s no secret that as customers use more AWS resources, the price increases.  The more you use, the more you pay; simple.  The differentiator is that same elastic nature: when demand drops, resources can be released and costs saved.  Auto-scaling can retire instances as easily as it adds them, storage can be removed when no longer needed, and as you become more proficient in AWS, bills can actually shrink.  (Of course, 2ndWatch Managed Services can also help with that proficiency!)  With traditional data centers, once resources are purchased, you pay the price (often a large one).  With the Cloud, resources can be purchased as needed, at just a fraction of the price.

IT wins and business wins – enterprise level computing at its best!

-Keith Homewood, Cloud Architect


AWS Auto-Scaling

Auto-Scaling gives you the ability to scale your EC2 instances up or down according to demand to handle the load on the service.  With auto-scaling you don’t have to worry about whether the number of instances you’re using can handle a demand spike or whether you’re overspending during a slower period.  Auto-scaling scales for you automatically, for seamless performance.

For instance, if there are currently 3 m1.xlarge instances handling the service, and they spend a large portion of their time only 20% loaded with a smaller portion of their time heavily loaded, they can be vertically scaled down to smaller instance sizes and horizontally scaled out/in to more or fewer instances to automatically accommodate whatever load they have at the time.  This can also save many dollars, because you pay only for the smaller instance size.  More savings can be attained by using reserved instance billing for the minimum number of instances defined by the Auto-Scaling configuration and letting the scaled-out instances pay the on-demand rate while running.  This is a little tricky, though, because an instance’s billing model cannot be changed while the instance is running.  When scaling in, make sure to terminate the newest instances, since they are running at the on-demand billing rate.

Vertical Scaling – changing the instance size – is typically referred to as scale-up or scale-down, while Horizontal Scaling – changing the number of instances – is typically referred to as scale-out or scale-in.

When traffic to a service on AWS increases or decreases, whether predictably or unpredictably, Auto-Scaling can keep customers happy with the service because response times stay more consistent and High Availability is more reliable.

Auto-Scaling to Improve HA

If there is only one server instance, Auto-Scaling can be used to put a new server in place, in a few minutes, when the running one fails.  Just set both the Min and Max number of instances to 1.
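A sketch of that setting with the same as-* CLI used later in this post (the group name is a placeholder):

# as-update-auto-scaling-group my-asg-group --region us-east-1 --min-size 1 --max-size 1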

Auto-Scaling to Improve Response Time Consistency

If there are multiple servers and the load on them becomes so heavy that the response time slows, expand horizontally only for the time necessary to cover the extra load, and keep the response time low.

AWS Auto-Scaling Options to Set

When Auto-Scaling up or down, there are a lot of things to think about:

  • Evaluation Period is the time, in seconds, between checks of the load on the Scaling Group.
  • Cool Down is the time, in seconds, that must pass after a scaling operation before a new scaling operation can be performed.  When scaling out, this time should be fairly short, in case the load is too heavy for one Scale-Out operation to handle.  When scaling in, this time should be at least twice that of the Scale-Out operation.
  • With Scale-Out, make sure it scales fast enough to quickly handle a load heavier than one expansion can cover.  300 seconds is a good starting point.
  • With Scale-In, make sure it scales slowly enough that the group does not keep going out and in.  We call this “Flapping”.  Some call it “Thrashing”.
  • When the Auto-Scale Group includes multiple AZs, scaling out and in should be incremented by the number of AZs involved.  If only one AZ is scaled up and something happens to that AZ, the failure becomes very noticeable, in a bad way.
  • Scale-In can be accomplished by different rules (a command sketch follows this list):
  1. Terminate Oldest Instance
  2. Terminate Newest Instance
  3. Terminate Instance Closest to the next Instance Hour (Best Cost Savings)
  4. Terminate Oldest Launch Configuration (default)
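The chosen rule is applied as a termination policy on the group – a minimal sketch, assuming the legacy as-* CLI’s --termination-policies option (the group name is a placeholder):

# as-update-auto-scaling-group my-asg-group --region us-east-1 --termination-policies "ClosestToNextInstanceHour"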

Auto-Scaling Examples

Auto-Scaling is a two-stage process, and here is the rub: the AWS Management Console does not do Auto-Scaling, so it has to be done through the AWS APIs and command line tools.

  1. Set up the Launch Configuration and assign it to the group of instances you want to control.  If there is no user_data file, that argument can be left out.  The block-device-mapping values can be found in the details for the AMI ID.
    • # as-create-launch-config <auto_scaling_launch_config_name> --region <region_name> --image-id <AMI_ID> --instance-type <type> --key <SSH_key_pair_name> --group <VPC_security_group_ID> --monitoring-enabled --user-data-file=<path_and_name_for_user_data_file> --block-device-mapping “<device_name>=<snap_id>:100:true:standard”
    • # as-create-auto-scaling-group <auto_scaling_group_name> --region <region_name> --launch-configuration <auto_scaling_launch_config_name> --vpc-zone-identifier <VPC_Subnet_ID>,<VPC_Subnet_ID> --availability-zones <Availability_Zone>,<Availability_Zone> --load-balancers <load_balancer_name> --min-size <min_number_of_instances_that_must_be_running> --max-size <max_number_of_instances_that_can_be_running> --health-check-type ELB --grace-period <time_seconds_before_first_check> --tag “k=Name, v=<friendly_name>, p=true”
  2. Have CloudWatch initiate Scaling Activities: one CloudWatch alarm for Scaling Out and one for Scaling In.  Also send notifications when scaling.
  • Scale Out (the policy ARN output by the first command is used as the second command’s --alarm-actions argument)
  • # as-put-scaling-policy --name <auto_scaling_policy_name_for_high_CPU> --region <region_name> --auto-scaling-group <auto_scaling_group_name> --adjustment <Number_of_instances_to_change_by> --type ChangeInCapacity --cooldown <time_in_seconds_to_wait_to_check_after_adding_instances>
  • # mon-put-metric-alarm --alarm-name <alarm_name_for_high_CPU> --region <region_name> --metric-name CPUUtilization --namespace AWS/EC2 --statistic Average --period <number_of_seconds_to_check_each_time_period> --evaluation-periods <number_of_periods_between_checks> --threshold <percent_number> --unit Percent --comparison-operator GreaterThanThreshold --alarm-description <description_use_alarm_name> --dimensions “AutoScalingGroupName=<auto_scaling_group_name>” --alarm-actions <arn_string_from_last_command>
  • Scale In (the policy ARN output by the first command is used as the second command’s --alarm-actions argument)
  • # as-put-scaling-policy --name <auto_scaling_policy_name_for_low_CPU> --region <region_name> --auto-scaling-group <auto_scaling_group_name> “--adjustment=-<Number_of_instances_to_change_by>” --type ChangeInCapacity --cooldown <time_in_seconds_to_wait_to_check_after_removing_instances>
  • # mon-put-metric-alarm --alarm-name <alarm_name_for_low_CPU> --region <region_name> --metric-name CPUUtilization --namespace AWS/EC2 --statistic Average --period <number_of_seconds_to_check_each_time_period> --evaluation-periods <number_of_periods_between_checks> --threshold <percent_number> --unit Percent --comparison-operator LessThanThreshold --alarm-description <description_use_alarm_name> --dimensions “AutoScalingGroupName=<auto_scaling_group_name>” --alarm-actions <arn_string_from_last_command>
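A filled-in sketch of the Scale-Out pair may make the flow clearer; every name and threshold below is illustrative, and the ARN comes from the output of the policy command:

# as-put-scaling-policy --name scale-out-high-cpu --region us-east-1 --auto-scaling-group my-asg-group --adjustment 2 --type ChangeInCapacity --cooldown 300
# mon-put-metric-alarm --alarm-name my-asg-high-cpu --region us-east-1 --metric-name CPUUtilization --namespace AWS/EC2 --statistic Average --period 60 --evaluation-periods 5 --threshold 80 --unit Percent --comparison-operator GreaterThanThreshold --alarm-description my-asg-high-cpu --dimensions “AutoScalingGroupName=my-asg-group” --alarm-actions <policy_ARN_from_previous_command>

This pair scales the group out by two instances whenever average CPU across the group stays above 80% for five consecutive 60-second periods.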

AMI Changes Require Auto-Scaling Updates

The instance configuration could change for any number of reasons:

  • Security patches
  • New features added
  • Removal of unused old features

Whenever the AMI specified in the Auto-Scaling definition is changed, the Auto-Scaling Group needs to be updated.  The update requires creating a new Launch Config with the new AMI ID, updating the Auto-Scaling Group, and then deleting the old Launch Config.  Without this update, the Scale-Out operation will continue to use the old AMI.

1. Create new Launch Config:

# as-create-launch-config <new_auto_scaling_launch_config_name> --region <region_name> --image-id <AMI_ID> --instance-type <type> --key <SSH_key_pair_name> --group <VPC_security_group_ID> --monitoring-enabled --user-data-file=<path_and_name_for_user_data_file> --block-device-mapping “<device_name>=<snap_id>:100:true:standard”

2. Update Auto Scaling Group:

# as-update-auto-scaling-group <auto_scaling_group_name> --region <region_name> --launch-configuration <new_auto_scaling_launch_config_name> --vpc-zone-identifier <VPC_Subnet_ID>,<VPC_Subnet_ID> --availability-zones <Availability_Zone>,<Availability_Zone> --min-size <min_number_of_instances_that_must_be_running> --max-size <max_number_of_instances_that_can_be_running> --health-check-type ELB --grace-period <time_seconds_before_first_check>

3. Delete the Old Launch Config:

# as-delete-launch-config <old_auto_scaling_launch_config_name> --region <region_name> --force

Now all Scale Outs should use the updated AMI.

-Charles Keagle, Senior Cloud Engineer


Distributed Functional Testing on AWS

To leverage the full benefits of Amazon Web Services (AWS) and features such as instant elasticity and scalability, every AWS architect eventually considers Elastic Load Balancing and Auto Scaling.  These features enable an environment to instantly scale in or out based on the flow of internet traffic.

Once implemented, how do you test the configuration and application to make sure they’re scaling with the parameters you’ve set?  You could always trust the design and logic, then wait for the environment to scale naturally with organic traffic.  However, in most production environments this is not an option; you want to make sure the environment operates adequately under load.  One cool way to do this is by generating a distributed traffic load through a program called Bees with Machine Guns.

The author describes Bees with Machine Guns as “A utility for arming (creating) many bees (micro EC2 instances) to attack (load test) targets (web applications).”  This is a perfect solution for testing the performance and functionality of an AWS environment because it allows you to use one master controller to call many bees for a distributed attack on an application.  Using a distributed attack from several bees gives a more realistic attack profile that you can’t get from a single node.  Bees with Machine Guns enables you to mount an attack with one or several bees with the same amount of effort.

Bees with Machine Guns isn’t just a randomly found open source tool. AWS endorses the project in several places on their website.  AWS recommends Bees with Machine Guns for distributed testing in their article “Best Practices in Evaluating Elastic Load Balancing”.  The author says “…you could consider tools that help you distribute tests, such as the open source Fabric framework combined with an interesting approach called Bees with Machine Guns, which uses the Amazon EC2 environment for launching clients that execute tests and report the results back to a controller.”  AWS also provides a CloudFormation template for deploying Bees with Machine Guns on their AWS CloudFormation Sample Templates page.

To install Bees with Machine Guns you can either use the template provided on the AWS CloudFormation Sample Templates page called bees-with-machineguns.template or follow the install instructions from the GitHub project page.  (Please be aware the template also deploys a scalable spot instance auto scale group behind an elastic load balancer, all of which you are responsible for paying.)

Once the Bees with Machine Guns source is installed, you have the ability to run the following commands:

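Per the project’s documentation, the controller exposes four sub-commands: bees up (launch the bees), bees report (list the bees under your control), bees attack (run a load test), and bees down (terminate the bees).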
The first command we run will start up five bees that we will have control over for testing.  We can use the -s option to specify the number of bees we want to spin up.  The -k option is the SSH key pair name used to connect to the new servers.  The -i option is the AMI used for each bee.  The -g option is the security group in which the bees will be launched.  If the key pair, security group, and AMI already exist in the region where you’re launching the bees, there is less chance you will see errors when running the command.

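A sketch of that launch command, with the key pair, AMI, and security group as placeholders:

# bees up -s 5 -k <SSH_key_pair_name> -i <AMI_ID> -g <security_group_name>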

Once launched, you can see the bees that were instantiated and are under the control of the Bees with Machine Guns controller with the command:

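# bees report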

To make our bees attack, we use the command “bees attack”.  The options used are -u, which is the URL of the target to attack.  Make sure to use the trailing slash in your URL or the command will error out.  The -n option is the total number of connections to make to the target.  The -c option is used for the number of concurrent connections made to the target.  Here is an example run of an attack:

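# bees attack -u http://<target_url>/ -n 100 -c 2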

Notice that the attack was distributed among the bees in the following manner: “Each of 5 bees will fire 20 rounds, 2 at a time.”  Since we had our total number of connections set to 100, each bee received an equal share of the requests.  Depending on your choices for the -n and -c options, you can configure a different type of attack profile.  For example, if you wanted to increase the duration of an attack, you would increase the total number of connections, and the bees would take longer to complete the attack.  This comes in useful when testing an auto scale group in AWS because you can configure an attack that will trigger one of your CloudWatch alarms, which will in turn activate a scaling action.  Another trick is to use the Linux “time” command before your “bees attack” command; once the attack completes, you can see the total duration of the attack.

Once the command completes, you get output for the number of requests that actually completed, the requests made per second, the time per request, and a “Mission Assessment” – in this case, “Target crushed bee offensive”.

To spin down your fleet of bees you run the command:

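# bees down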

This is a quick intro on how to use Bees with Machine Guns for distributed testing within AWS.  The one big caution in using Bees with Machine Guns, as explained by the author, is that “they are, more-or-less a distributed denial-of-service attack in a fancy package,” which means you should only use them against resources that you own, and you will be liable for any unauthorized use.

As you can see, Bees with Machine Guns can be a powerful tool for distributed load tests.  It’s extremely easy to set up and tremendously easy to use.  It is a great way to artificially create a production load to test the elasticity and scalability of your AWS environment.

-Derek Baltazar, Senior Cloud Engineer
