IT infrastructure is the hardware, network, services and software required for enterprise IT. It is the foundation that enables organizations to deliver IT services to their users. Disaster recovery (DR) is preparing for and recovering from natural and people-related disasters that impact IT infrastructure for critical business functions. Natural disasters include earthquakes, fires, etc. People-related disasters include human error, terrorism, etc. Business continuity differs from DR as it involves keeping all aspects of the organization functioning, not just IT infrastructure.
When planning for DR, companies must establish a recovery time objective (RTO) and recovery point objective (RPO) for each critical IT service. RTO is the acceptable amount of time in which an IT service must be restored. RPO is the acceptable amount of data loss measured in time. Companies establish both RTOs and RPOs to mitigate financial and other types of loss to the business. Companies then design and implement DR plans to effectively and efficiently recover the IT infrastructure necessary to run critical business functions.
For companies with corporate datacenters, the traditional approach to DR involves duplicating IT infrastructure at a secondary location to ensure available capacity in a disaster. The key downside is IT infrastructure must be bought, installed and maintained in advance to address anticipated capacity requirements. This often causes IT infrastructure in the secondary location to be over-procured and under-utilized. In contrast, Amazon Web Services (AWS) provides companies with access to enterprise-grade IT infrastructure that can be scaled up or down for DR as needed.
The four most common DR architectures on AWS are:
- Backup and Restore ($) – Companies can use their current backup software to replicate data into AWS. Companies use Amazon S3 for short-term archiving and Amazon Glacier for long-term archiving. In the event of a disaster, data can be made available on AWS infrastructure or restored from the cloud back onto an on-premise server.
- Pilot Light ($$) – While backup and restore are focused on data, pilot light includes applications. Companies only provision core infrastructure needed for critical applications. When disaster strikes, Amazon Machine Images (AMIs) and other automation services are used to quickly provision the remaining environment for production.
- Warm Standby ($$$) – Taking the Pilot Light model one step further, warm standby creates an active/passive cluster. The minimum amount of capacity is provisioned in AWS. When needed, the environment rapidly scales up to meet full production demands. Companies receive (near) 100% uptime and (near) no downtime.
- Hot Standby ($$$$) – Hot standby is an active/active cluster with both cloud and on-premise components to it. Using weighted DNS load-balancing, IT determines how much application traffic to process in-house and on AWS. If a disaster or spike in load occurs, more or all of it can be routed to AWS with auto-scaling.
In a non-disaster environment, warm standby DR is not scaled for full production, but is fully functional. To help adsorb/justify cost, companies can use the DR site for non-production work, such as quality assurance, ing, etc. For hot standby DR, cost is determined by how much production traffic is handled by AWS in normal operation. In the recovery phase, companies only pay for what they use in addition and for the duration the DR site is at full scale. In hot standby, companies can further reduce the costs of their “always on” AWS servers with Reserved Instances (RIs).
Smart companies know disaster is not a matter of if, but when. According to a study done by the University of Oregon, every dollar spent on hazard mitigation, including DR, saves companies four dollars in recovery and response costs. In addition to cost savings, smart companies also view DR as critical to their survival. For example, 51% of companies that experienced a major data loss closed within two years (Source: Gartner), and 44% of companies that experienced a major fire never re-opened (Source: EBM). Again, disaster is not a ready of if, but when. Be ready.
-Josh Lowry, General Manager – West
By now, most people know the benefits of the cloud – elasticity, flexibility, scalability, on-demand resources. But how does moving from traditional on-premise IT infrastructure to the cloud affect your business financials? For a good overview, check out “The Financial Impact of Cloud.”
In migrating customers to AWS one of the consistent questions we are asked is, “How do we extend our backup services into the Cloud?” My answer? You don’t. This is often met with incredulous stares where the customer is wondering if I’m joking, crazy, or I just don’t understand IT. After all, backups are fundamental to data centers as well as IT systems in general, so why on Earth would I tell someone not to backup their systems?
The short answer to backups is just not to do it, honestly. The more in depth answer is, of course, more complicated than that. To be clear, I am talking about system backups; those backups typically used for bare metal restores. Backups of databases, of file services – these we’ll tackle separately. For the bulk of systems, however, we’ll leave backups as a relic of on premise data centers.
How? Why? Consider a typical three tiered architecture: web servers, application servers, and database servers. In AWS, ideally your application and web servers are stateless, auto scaled systems. With that in mind, why would you ever want to spend time, money, or resources on backing up and restoring one of these systems? The design should be set so if and when a system fails, the health check/monitoring automatically terminates the instance, which in turn automatically creates an auto scale event to launch a new instance in its place. No painfully long hours working through a restore process.
Similarly, your database systems can work without large scale backup systems. Yes, by all means run database backups! Database backups are not for server instance failures but for database application corruption or updates/upgrade rollbacks. Unfortunately, the Cloud doesn’t magically make your databases any more immune to human error. For the database servers (assuming non-RDS), however, maintaining a snapshot of the server instance is likely good enough for backups. If and when the database server fails, the instance can be terminated and the standby system can become the live system to maintain system integrity. Launch a new database server based on the snapshot, restore the database and/or configure replication from the live system, depending on database technology, and you’re live.
So yes, in a properly configured AWS environment, the backup and restore you love to loathe from your on premise environment is a thing of the past.
-Keith Homewood, Cloud Architect
Wired Innovation Insights published a blog article written by our own Chris Nolan yesterday. Chris discusses ways you can save money on your AWS cloud deployment in “How to Manage Your Amazon Cloud Deployment to Save Money.” Chris’ top tips incude:
- Use CloudFormation or other configuration and orchestration tool.
- Watch out for cloud sprawl.
- Use AWS auto scaling.
- Turn the lights off when you leave the room.
- Use tools to monitor spend.
- Build in redundancy.
- Planning saves money.
Read the Full Article
Moving to the cloud is one of the best decisions you can make for your business. The low startup costs, instant elasticity, and near endless scalability have lured many organizations from traditional datacenters to the cloud. Although cloud startup costs are extremely low, over time the burgeoning use of resources within an AWS account can slowly increase the cost of operating in the cloud.
One service AWS provides to help with watching the costs of an AWS environment is AWS Trusted Advisor. AWS touts the service as “your customized cloud expert” that helps you monitor resources according to best practices. Trusted Advisor is a service that runs in the background of your AWS account and gathers information regarding cost optimization, security, fault tolerance, and performance. Trusted Advisor can be accessed proactively through the support console or can be setup to notify you via weekly email.
The types of Trusted Advisor notifications available for Cost Optimization are Amazon EC2 Reserved Instance Optimization, Low Utilization Amazon EC2 Instances, Idle Load Balancers, Underutilized Amazon EBS Volumes, Unassociated Elastic IP Addresses, and RDS Idle DB Instances. Within these service types, Trusted Advisor gives you four types of possible statuses; “No problems detected,” “Investigation recommended,” “Action recommended,” and “Not available.” Each one of these status types give insight to how effectively you are running your account based on the best-practice algorithm the service uses. In the below example, AWS Trusted Advisor points out $1,892 of potential savings for this account.
Each one of these notifications adds up to the total potential monthly savings. Here is one “Investigation recommended” notification from the same account. It says “3 of 4 DB instances appear to be Idle. Monthly savings of up to $101 are available by minimizing idle DB Instances.”
Clicking the drop down button reveals more:
The full display tells you exactly what resource in your account is causing the alert and even gives you the estimated monthly savings if you were to make changes to the resource. In this case the three RDS instances are running in Oregon and Ireland. This particular service is basing the alert on the “Days since last connection,” which is extremely helpful because if there have been no connections to the database in 14+ days, there’s a good chance it’s not even being used. One of the best things about Trusted Advisor is that it gives the overview broken down by service type and gives just enough information to be simple and useful. We didn’t have to login to RDS and navigate to the Oregon region or the Ireland region to find this information. It was all gathered by Trusteed Advisor and presented in an easy to read format. Remember, not all of the information provided may need immediate attention, but it’s nice to have it readily available. Another great feature is each notification can be downloaded as a Microsoft Excel spreadsheet that allows you to have even more control over the data the service provides.
Armed with the Trusted Advisor tool you can keep a closer eye on your AWS resources and gain insight to optimizing costs on a regular basis. The Trusted Advisor covers the major AWS services but is only available to accounts with Business or Enterprise-level support. Overall, it’s a very useful service for watching account costs and keeping an eye on possible red flags on an account. It definitely doesn’t take the place of diligent implementation and monitoring of resources by a cloud engineer but can help with the process.
– Derek Baltazar, Senior Cloud Engineer
Cloud has been positioned as the panacea for all of the ills in the world. The hype often leaves experienced IT professionals skeptical of the benefits of leveraging public clouds. The most common objection to public cloud I hear is, “Wait until you get your first bill from (public cloud).” I can understand the fear and consternation of suddenly having a huge monthly bill that no one budgeted. However, there are great 3rd party tools available that allow you to forecast and track your budget and usage, like 2W Insight, which make that uncomfortable situation avoidable.
With the recent price cuts by Amazon Web Services, Microsoft and Google, the TCO argument from the traditional hardware manufacturers is losing steam. When is the last time any major hardware vendor cut its prices to its customers by 40-60% overnight? There is no better time to begin evaluating and leveraging the cloud, and a great place to start is with a TCO analysis.
2nd Watch has a services practice that can assist your organization in building an accurate monthly forecast of your estimated cloud costs while also providing a full TCO analysis.
-Matt Whitney, Cloud Sales Executive