1-888-317-7920 info@2ndwatch.com

Batch Computing in the Cloud with Amazon SQS & SWF

Batch computing isn’t necessarily the most difficult thing to design a solution around, but there are a lot of moving parts to manage, and building in elasticity to handle fluctuations in demand certainly cranks up the complexity.  It might not be particularly exciting, but it is one of those things that almost every business has to deal with in some form or another.

The on-demand and ephemeral nature of the Cloud makes batch computing a pretty logical use of the technology, but how do you best architect a solution that will take care of this?  Thankfully, AWS has a number of services geared towards just that.  Amazon SQS (Simple Queue Services) and SWF (Simple Workflow Service) are both very good tools to assist in managing batch processing jobs in the Cloud.  Elastic Transcoder is another tool that is geared specifically around transcoding media files.  If your workload is geared more towards analytics and processing petabyte scale big data, then tools like EMR (Elastic Map Reduce) and Kinesis could be right up your alley (we’ll cover that in another blog).  In addition to not having to manage any of the infrastructure these services ride on, you also benefit from the streamlined integration with other AWS services like IAM for access control, S3, SNS, DynamoDB, etc.

For this article, we’re going to take a closer look at using SQS and SWF to handle typical batch computing demands.

Simple Queue Services (SQS), as the name suggests, is relatively simple.  It provides a queuing system that allows you to reliably populate and consume queues of data.  Queued items in SQS are called messages and are either a string, number, or binary value.  Messages are variable in size but can be no larger than 256KB (at the time of this writing).  If you need to queue data/messages larger than 256KB in size the best practice is to store the data elsewhere (e.g. S3, DynamoDB, Redis, MySQL) and use the message data field as a linker to the actual data.  Messages are stored redundantly by the SQS service, providing fault tolerance and guaranteed delivery.  SQS doesn’t guarantee delivery order or that a message will be delivered only once, which seems like something that could be problematic except that it provides something called Visibility Timeout that ensures once a message has been retrieved it will not be resent for a given period of time.  You (well, your application really) have to tell SQS when you have consumed a message and issue a delete on that message.  The important thing is to make sure you are doing this within the Visibility Timeout, otherwise you may end up processing single messages multiple times.  The reasoning behind not just deleting a message once it has been read from the queue is that SQS has no visibility into your application and whether the message was actually processed completely, or even just successfully read for that matter.

Where SQS is designed to be data-centric and remove the burden of managing a queuing application and infrastructure, Simple Workflow Service (SWF) takes it a step further and allows you to better manage the entire workflow around the data.  While SWF implies simplicity in its name, it is a bit more complex than SQS (though that added complexity buys you a lot).  With SQS you are responsible for managing the state of your workflow and processing of the messages in the queue, but with SWF, the workflow state and much of its management is abstracted away from the infrastructure and application you have to manage.  The initiators, workers, and deciders have to interface with the SWF API to trigger state changes, but the state and logical flow are all stored and managed on the backend by SWF.  SWF is quite flexible too in that you can use it to work with AWS infrastructure, other public and private cloud providers, or even traditional on-premise infrastructure.  SWF supports both sequential and parallel processing of workflow tasks.

Note: if you are familiar with or are already using JMS, you may be interested to know SQS provides a JMS interface through its java messaging library.

One major thing SWF buys you over using SQS is that the execution state of the entire workflow is stored by SWF extracted from the initiators, workers, and deciders.  So not only do you not have to concern yourself with maintaining the workflow execution state, it is completely abstracted away from your infrastructure.  This makes the SWF architecture highly scalable in nature and inherently very fault-tolerant.

There are a number of good SWF examples and use-cases available on the web.  The SWF Developer Guide uses a classic e-commerce customer order workflow (i.e. place order, process payment, ship order, record completed order).  The SWF console also has a built in demo workflow that processes an image and converts it to either grayscale or sepia (requires AWS account login).  Either of these are good examples to walk through to gain a better understanding of how SWF is designed to work.

Contact 2nd Watch today to get started with your batch computing workloads in the cloud.

-Ryan Kennedy, Sr. Cloud Architect

Facebooktwittergoogle_pluslinkedinmailrss

Gartner Critical Capabilities for Public Cloud IaaS

Gartner_logo_svgLearn from Gartner what the Critical Capabilities for Public Cloud Infrastructure as a Service were for 2014.  Gartner evaluates 15 public cloud IaaS service providers, listed in the 2014 Magic Quadrant, against eight critical capabilities and for four common use cases your enterprise manages today.

Gartner takes an in-depth look at the critical capabilities for:

  • Application Development – for the needs of large teams of developers building new applications
  • Batch Computing – including high-performance computing (HPC), data analytics and other one-time (but potentially recurring), short-term, large-scale, scale-out workloads
  • Cloud Native Applications – for applications at any scale, which have been written with the strengths and weaknesses of public cloud IaaS in mind
  • General Business Applications – for applications not designed with the cloud in mind, but that can run comfortably in virtualized environments
Gartner does not endorse any vendor, product or service depicted in its research publications, and does not advise technology users to select only those vendors with the highest ratings or other designation. Gartner research publications consist of the opinions of Gartner’s research organization and should not be construed as statements of fact. Gartner disclaims all warranties, expressed or implied, with respect to this research, including any warranties of merchantability or fitness for a particular purpose.
Facebooktwittergoogle_pluslinkedinmailrss