The jump to the cloud can be a scary proposition. For an enterprise with systems deeply embedded in traditional infrastructure like back office computer rooms and datacenters the move to the cloud can be daunting. The thought of having all of your data in someone else’s hands can make some IT admins cringe. However, once you start looking into cloud technologies you start seeing some of the great benefits, especially with providers like Amazon Web Services (AWS). The cloud can be cost-effective, elastic and scalable, flexible, and secure. That same IT admin cringing at the thought of their data in someone else’s hands may finally realize that AWS is a bit more secure than a computer rack sitting under an employee’s desk in a remote office. Once the decision is finally made to “try out” the cloud, the planning phase can begin.
Most of the time the biggest question is, “How do we start with the cloud?” The answer is to use a phased approach. By picking applications and workloads that are less mission critical, you can try the newest cloud technologies with less risk. When deciding which workloads to move, you should ask yourself the following questions; Is there a business need for moving this workload to the cloud? Is the technology a natural fit for the cloud? What impact will this have on the business? If all those questions are suitably answered, your workloads will be successful in the cloud.
One great place to start is with archiving and backups. These types of workloads are important, but the data you’re dealing with is likely just a copy of data you already have, so it is considerably less risky. The easiest way to start with archives and backups is to try out S3 and Glacier. Many of today’s backup utilities you may already be using, like Symantec Netbackup and Veeam Backup & Replication, have cloud versions that can directly backup to AWS. This allows you to use start using the cloud without changing much of your embedded backup processes. By moving less critical workloads you are taking the first steps in increasing your cloud footprint.
Now that you have moved your backups to AWS using S3 and Glacier, what’s next? The next logical step would be to try some of the other services AWS offers. Another workload that can often be moved to the cloud is Disaster Recovery. DR is an area that will allow you to more AWS services like VPC, EC2, EBS, RDS, Route53 and ELBs. DR is a perfect way to increase your cloud footprint because it will allow you to construct your current environment, which you should already be very familiar with, in the cloud. A Pilot Light DR solution is one type of DR solution commonly seen in AWS. In the Pilot Light scenario the DR site has minimal systems and resources with the core elements already configured to enable rapid recovery once a disaster happens. To build a Pilot Light DR solution you would create the AWS network infrastructure (VPC), deploy the core AWS building blocks needed for the minimal Pilot Light configuration (EC2, EBS, RDS, and ELBs), and determine the process for recovery (Route53). When it is time for recovery all the other components can be quickly provisioned to give you a fully working environment. By moving DR to the cloud you’ve increased your cloud footprint even more and are on your way to cloud domination!
The next logical step is to move Test and Dev environments into the cloud. Here you can get creative with the way you use the AWS technologies. When building systems on AWS make sure to follow the Architecting Best Practices: Designing for failure means nothing will fail, decouple your components, take advantage of elasticity, build security into every layer, think parallel, and don’t fear constraints! Start with proof-of-concept (POC) to the development environment, and use AWS reference architecture to aid in the learning and planning process. Next your legacy application in the new environment and migrate data. The POC is not complete until you validate that it works and performance is to your expectations. Once you get to this point, you can reevaluate the build and optimize it to exact specifications needed. Finally, you’re one step closer to deploying actual production workloads to the cloud!
Production workloads are obviously the most important, but with the phased approach you’ve taken to increase your cloud footprint, it’s not that far of a jump from the other workloads you now have running in AWS. Some of the important things to remember to be successful with AWS include being aware of the rapid pace of the technology (this includes improved services and price drops), that security is your responsibility as well as Amazon’s, and that there isn’t a one-size-fits-all solution. Lastly, all workloads you implement in the cloud should still have stringent security and comprehensive monitoring as you would on any of your on-premises systems.
Overall, a phased approach is a great way to start using AWS. Start with simple services and traditional workloads that have a natural fit for AWS (e.g. backups and archiving). Next, start to explore other AWS services by building out environments that are familiar to you (e.g. DR). Finally, experiment with POCs and the entire gambit of AWS to benefit for more efficient production operations. Like many new technologies it takes time for adoption. By increasing your cloud footprint over time you can set expectations for cloud technologies in your enterprise and make it a more comfortable proposition for all.
-Derek Baltazar, Senior Cloud Engineer
If you haven’t heard of Amazon Glacier, you need to check it out. As its name implies, you can think of Glacier as “frozen” storage. When considering the speed of EBS and S3, Glacier by comparison moves glacially slow. Consider Glacier as essentially a cloud-based archival solution that works similarly to old-style tape backup. In the past, backups first ran to tape, then were stored locally in case of immediate access requirements, and were then taken off-site once a certain date requirement was met (once a week, once a month, etc.). Glacier essentially works as the last stage of that process.
When a snapshot in S3, for instance, gets to be a month old, you can instruct AWS to automatically move that object to Glacier. Writing it to Glacier happens pretty much immediately, though being able to see that object on your Glacier management console can take between 3-5 hours. If you need it back, you’ll issue a request, but that can take up to 24 hours to be resolved. Amazon hasn’t released the exact mechanics of how they’re storing the data on their end, but large tape libraries are a good bet since they jive with one of Glacier’s best features: its price. That’s only $0.01 per gigabyte. Its second best feature is 11 nines worth of “durability” (which refers to data loss) and 4 nines worth of “reliability” (which refers to data availability). That’s 99.999999999% for those who like the visual.
Configuring Glacier, while a straightforward process, will require some technical savvy on your part. Amazon has done a nice job of representing how Glacier works in an illustration:
As you can see, the first step is to download the Glacier software development kit (SDK), which is available for Java or .NET. Once you’ve got that, you’ll need to create your vault. This is an easy step that starts with accessing your Glacier management console, selecting your service region (Glacier is automatically redundant across availability zones in your region, which is part of the reason for its high durability rating), naming your vault, and hitting the create button. I’m using the sandbox environment that comes with your AWS account to take these screen shots, so the region is pre-selected. In a live environment, this would be a drop-down menu providing you with region options.
The vault is where you’ll store your objects, which equate to a single file, like a document or a photo. But instead of proceeding directly to vault creation from the screen above, be sure and set up your vault’s Amazon Simple Notification Service (SNS) parameters.
Notifications can be created for a variety of operations and delivered to systems managers or applications using whatever protocol you need (HTML for a homegrown web control or email for your sys admin, for example). Once you create the vault from the notifications screen, you’re in your basic Glacier management console:
Uploading and downloading documents is where it gets technical. Currently, the web-based console above doesn’t have tools for managing archive operations like you’d find with S3. Uploading, downloading, deleting or any other operation will require programming in whichever language for which you’ve downloaded the SDK. You can use the AWS Identity and Access Management (IAM) service to attach user permissions to vaults and manage billing through your Account interface, but everything else happens at the code level. However, there are third-party Glacier consoles out there that can handle much of the development stuff in the background while presenting you with a much simpler management interface, such as CloudBerry Explorer 3.6. We’re not going to run through code samples here, but Amazon has plenty of resources for this off its Sample Code & Libraries site.
On the upside, while programming for Glacier operations is difficult for non-programmers, if you’ve got the skills, it provides a lot of flexibility in designing your own archive and backup processes. You can assign vaults to any of the various backup operations being run by your business and define your own archive schedules. Essentially, that means you can configure a hierarchical storage management (HSM) architecture that natively incorporates AWS.
For example, imagine a typical server farm running in EC2. At the first tier, it’s using EBS for immediate, current data transactions, similar to a hard disk or SAN LUN. When files in your EBS store have been unused for a period of time or if you’ve scheduled them to move at a recurring time (like with server snapshots), those files can be automatically moved to S3. Access between your EC2 servers and S3 isn’t quite as fast as EBS, but it’s still a nearline return on data requests. Once those files have lived on S3 for a time, you can give them a time to live (TTL) parameter after which they are automatically archived on Glacier. It’ll take some programming work, but unlike with standard on-premises archival solutions, which are usually based on a proprietary architecture, using Java or .NET means you can configure your storage management any way you like – for different geographic locations, different departments, different applications, or even different kinds of data.
And this kind of HSM design doesn’t have to be entirely cloud-based. Glacier works just as well with on-premises data, applications, or server management. There is no minimum or maximum amount of data you can archive with Glacier, though individual archives can’t be less than 1 byte or larger than 40 terabytes. To help you observe regulatory compliance issues, Glacier uses secure protocols for data transfer and encrypts all data on the server side using key management and 256-bit encryption.
Pricing is extremely low and simple to calculate. Data stored in Glacier is $0.01 per gigabyte. Upload and retrieval operations run only $0.05 per 1000 requests, and there is a pro-rated charge of $0.03 per gigabyte if you delete objects prior to 90 days of storage. Like everything else in AWS, Glacier is a powerful solution that provides highly customizable functionality for which you only pay for what you use. This service is definitely worth a very close look.
-Caleb Carter, Solutions Architect