
Decision Points: Moving Enterprise Workloads to the Cloud

When you’re ready to move to the cloud, it’s truly a transformational time.  Determining your cloud strategy before moving too quickly is paramount.  It is important to make the big, hard decisions first, because you will be in the cloud for many years to come.  This can also be a time to remove years of technical debt.  After all, you want to migrate your workloads, not lift and shift your technical debt along with them.  At the same time, you do not want to experience “analysis paralysis” with all the decisions to be made.  Ultimately, you can have speed, agility and cross-organizational support while providing the proper governance and guardrails.

Determining your migration strategy ahead of time is important for security, change management, and cost containment.  The promise of the cloud is great.  You might want to allow people to build development environments at will.  You have smart and capable people, and they need help to deploy quickly.  Shadow IT results when innovative people are constrained from experimenting.  Often, the intention is that it will be temporary.  However, temporary quickly becomes permanent, undocumented and non-compliant.  The decision points listed in this article are important, but this is by no means a comprehensive list, as other items will likely reveal themselves during the process.

Decisions for Enterprise Cloud Migration – the Business of the Cloud:

  • Discovery – Can you get an accurate list of the application inventory?  What operating systems are in use?  Are all the applications still relevant or can they be retired?  Which applications have dependencies on other applications?  It may be hard to get this list together, as some of these applications are likely many years old.  This can be time consuming, but it will help identify the true scope and cost of moving to the cloud.  In many cases, third-party discovery tools can aid in the discovery.
  • Vision and Education – Are the teams infighting and holding territory?  This is often tied to how well they understand the cloud.  It can be scary, as all transformations are.  Y2K, client/server and the Internet revolution were scary as well.  We survived.  Plans for education to create awareness and capabilities will help.  There are also many misconceptions about the cloud in top management that will probably need to be addressed.

Strategic Decisions for the Cloud:

  • Which cloud providers are you going to use?  Clearly, Amazon Web Services is the leading cloud service provider.  However, a multi-cloud strategy may be important to the company as well.  How are you going to interconnect the cloud providers?
  • What account strategy will you use?  Will applications get their own account?  Or will accounts be aligned by business unit?  There are many different approaches to account strategy.  It will be hard to undo, so it is important to weigh the pros and cons of each strategy to account for billing, security and isolation.
  • What will your networking strategy be in the cloud?  Will you use non-overlapping subnets managed with your on-premises IP address management?  Will you isolate production and non-production environments in separate address ranges in VPCs?  Or will you allow your migrated applications to be accessed only over the public internet instead of VPN?  It could also be a combination of these strategies.  There are many variables that will need to be identified to determine the best strategy.
  • Will you integrate on-premises identity management systems with your cloud infrastructure?  Active Directory is a common technology in most enterprises.  Will you extend your current AD architecture?  What changes need to be made to make it optimal for the cloud?
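The non-overlapping subnet question can be checked mechanically before any VPC is created.  The following is a minimal sketch using only the Python standard library; the network names and CIDR ranges are hypothetical and stand in for whatever your IP address management system holds.

```python
import ipaddress
from itertools import combinations


def find_overlaps(cidr_blocks):
    """Return sorted name pairs whose CIDR ranges overlap."""
    networks = {name: ipaddress.ip_network(cidr) for name, cidr in cidr_blocks.items()}
    return [
        (a, b)
        for a, b in combinations(sorted(networks), 2)
        if networks[a].overlaps(networks[b])
    ]


# Hypothetical address plan: the non-production range was carved out of
# the production range by mistake.
plan = {
    "on-prem-datacenter": "10.0.0.0/16",
    "vpc-prod": "10.1.0.0/16",
    "vpc-nonprod": "10.1.128.0/17",
}
overlaps = find_overlaps(plan)
print(overlaps)
```

An empty result means the plan can be routed over VPN or peering without address conflicts.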

Decisions for Cost, Security and Compliance:

  • How will you tag your cloud assets?  Will it account for billing, security, and compliance?  Getting this right early on will allow for automation to enforce compliance and monitor for violations.
  • How will you manage the cloud costs?  Will you allow developers to provision their own instances?  What will your Reserved Instance strategy be?  How often does it need to be reviewed?  Costs in the cloud can spin out of control if proper guardrails are not established.  Scheduled power on and power off of environments is also another important strategy to further reduce costs.
  • What technologies are approved for cloud deployments?  Will your organization create approved images?  How will they be managed and updated?  Maybe your organization has approved base software that must be installed.  How will you maintain this configuration?  Configuration management and image baking are important processes to identify and define.
  • How will the cloud assets be continuously monitored for compliance?  Once a violation is found, how will it be remediated, with automation or manually?  Between AWS Config, CloudTrail and tagging strategies, much of this task can be accomplished with automation.  However, there still need to be individuals who review and update the process.
  • How will you secure your cloud environments?  WAF, anti-virus, IDS/IPS, and firewalls are just part of the overall security solution.  How will you control egress traffic as well?  How will you isolate your applications from each other and control user access?  We all know security is hard and requires constant care.  Finding the right balance between guarding against real threats and still providing agility is important.
  • Will you secure data at rest?  Will you use built-in AWS services for encryption keys, KMS or CloudHSM?  Or will you bring your own keys?  How will you provide column- or row-based encryption of your databases?  Cloud solutions need to be analyzed against the company standards to determine whether you can use built-in cloud encryption or need to roll your own.
  • How will you provision your certificates for data in transit?  AWS provides the Certificate Manager service to provision SSL certificates.  Or will you continue to use your existing provider?  How will you track expiring certificates and update them?  AWS has many features for SSL including integration with their Elastic Load Balancers.
  • How will you manage your big data?  Will you scale up or out?  Are the workloads transient?  There are many options for cost optimization.  Between Spot Instances and automation, incredibly elegant solutions can be created.
  • What are your Disaster Recovery policies?  Do they need to be adjusted for the cloud?  Most likely they do.  How will you deliver your DR solutions?  Again, there are many solutions for DR in the cloud, from infrastructure as code to automation that replicates critical data and servers between regions.
  • What are your data retention policies?  How will you implement them in the cloud?  How will you ensure that you have met your regulatory compliance?  There are built-in solutions for data life cycles in AWS, but in many cases, it is more complicated than what is available off the shelf.
  • How will you handle OS and application licensing?  Will you use on-demand or bring your own licenses?  There is no one right answer.  ROI needs to be calculated in many cases.
  • What is your single-sign-on (SSO) solution?  How will it integrate into the cloud?  AWS does provide federated authentication for all of its services.
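The tagging question above lends itself to a simple automated check.  The following is a rough sketch, assuming tags arrive in the Key/Value list form that the AWS APIs return; the required tag names are hypothetical and would come from your own tagging standard.

```python
# Hypothetical tagging standard; replace with your organization's own.
REQUIRED_TAGS = {"CostCenter", "Owner", "Environment"}


def missing_tags(resource_tags, required=REQUIRED_TAGS):
    """Return the required tag keys that a resource does not carry."""
    present = {tag["Key"] for tag in resource_tags}
    return sorted(required - present)


instance_tags = [{"Key": "Owner", "Value": "platform-team"}]
print(missing_tags(instance_tags))
```

A check like this can run on a schedule and feed whatever remediation process you settle on, automated or manual.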

This is a long list of questions.  It isn’t intended to scare you away from the cloud, but rather to help you embrace it correctly.  No two enterprises are identical, but most share many of the same challenges.  Starting with this list of questions may help you identify a successful approach to your migration and reach the much-promised benefits of the cloud.

-Ian Willoughby, Principal Architect


AWS Lambda Scheduled Event Function Deep Dive: Part 4

Registering the Lambda Function

Now that we’ve created the Lambda function IAM role, for the sake of simplicity, let’s assume the function itself is already written, packaged, and sitting in an S3 bucket that your Lambda function Role will have access to.  For this example, let’s assume our S3 URL for this backup function is: s3://my-awesome-lambda-functions/ec2_backup_manager.zip

The IAM policy rights required for creating the Lambda function and the Scheduled Event are:

  • lambda:CreateFunction
  • lambda:AddPermission
  • events:PutRule
  • events:PutTargets

Registering the Lambda Function and Scheduled Event via the AWS Lambda API using Python and boto3

Note: Like the previous boto3 example, you must either have your AWS credentials defined in a supported location (e.g. ENV variables, ~/.boto, ~/.aws/credentials, EC2 meta-data) or you must specify credentials when creating your boto3 client (or alternatively ‘session’).  The User/Role associated with the AWS credentials must also have the necessary rights, defined by policy, to perform the required operations against the AWS Lambda API.

A few notes on the create_function function arguments:

  • The Runtime is the language and version being used by our function (e.g. python2.7)
  • The Role is the ARN of the role we created in the previous exercise
  • The Handler is the function within your code that Lambda calls to begin execution. For our python example the value is: {lambda_function_name}.{handler_function_name}
  • The Code is either a base64 encoded zip file or the S3 bucket and key

An Important Warning about IAM Role Replication Delay

In the previous step we created an IAM role that we reference in the code below when creating our Lambda function.  Because IAM is a global, region-independent AWS service, it takes some time, usually less than a minute, for create/update/delete actions against the IAM API to replicate to all AWS regions.  This means that when we perform the create_function operation in the script below, we may very well receive an error from the Lambda API stating that the role we specified does not exist.  This is especially true when scripting an operation where the create_function call happens only milliseconds after the create_role call.  Since there is really no way of querying the Lambda API to see whether the role is available in the region before creating the function, the best option is to use exception handling to catch the specific error raised when the role does not yet exist, and to wrap that exception handler in a retry loop with an exponential back-off algorithm (though sleeping for 10-15 seconds will work just fine too).

Let’s pick up in the python script where we previously left off in the last example:
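The original script is not reproduced on this page, so what follows is only a sketch of the steps described above, not the author’s code.  It assumes you pass in boto3 clients created with boto3.client('lambda') and boto3.client('events').  The rule name, statement ID, target ID and handler name are hypothetical, and the error code checked in role_not_ready is an assumption about how the "role not ready" failure surfaces, so verify it against the error you actually receive.

```python
import json
import time


def create_with_retry(create_call, is_retryable, attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call create_call(), retrying retryable errors with exponential back-off."""
    for attempt in range(attempts):
        try:
            return create_call()
        except Exception as exc:
            if attempt == attempts - 1 or not is_retryable(exc):
                raise
            sleep(base_delay * 2 ** attempt)  # 1s, 2s, 4s, ...


def role_not_ready(exc):
    """Assumption: a botocore ClientError exposes the code in exc.response."""
    error = getattr(exc, "response", {}).get("Error", {})
    return error.get("Code") == "InvalidParameterValueException"


def register_backup_function(lambda_client, events_client, role_arn, runtime,
                             schedule, event_input, sleep=time.sleep):
    """Create the function, the schedule rule, the invoke permission and the target."""
    function = create_with_retry(
        lambda: lambda_client.create_function(
            FunctionName="ec2_backup_manager",
            Runtime=runtime,  # the series used python2.7; use a currently supported runtime
            Role=role_arn,
            Handler="ec2_backup_manager.lambda_handler",  # hypothetical handler name
            Code={"S3Bucket": "my-awesome-lambda-functions",
                  "S3Key": "ec2_backup_manager.zip"},
            Timeout=300,
        ),
        role_not_ready,
        sleep=sleep,
    )
    rule = events_client.put_rule(
        Name="ec2-backup-schedule", ScheduleExpression=schedule, State="ENABLED")
    # Allow CloudWatch Events to invoke the function for this rule only.
    lambda_client.add_permission(
        FunctionName="ec2_backup_manager",
        StatementId="ec2-backup-schedule-invoke",
        Action="lambda:InvokeFunction",
        Principal="events.amazonaws.com",
        SourceArn=rule["RuleArn"],
    )
    events_client.put_targets(
        Rule="ec2-backup-schedule",
        Targets=[{"Id": "1", "Arn": function["FunctionArn"],
                  "Input": json.dumps(event_input)}],
    )
    return function["FunctionArn"], rule["RuleArn"]
```

Passing the clients and the sleep function in as arguments keeps the retry logic easy to exercise without touching a real AWS account.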


Registering the Lambda Function and Scheduled Event using AWS CloudFormation Template

Note: The S3Bucket MUST exist in the same region you are launching your CFT stack in.  To support multi-region templates I generally will have a Lambda S3 bucket for each Lambda region and append a .{REGION_NAME} suffix to the bucket name (e.g. my-awesome-lambda-functions.us-west-2).  Since CloudFormation provides us with a pseudo-parameter of the region you are launching the stack in (AWS::Region), you can utilize that to ensure you are referencing the appropriate bucket (see my example below).

The following block of JSON can be used in conjunction with our previous CloudFormation snippet by being added to the template’s “Resources” section to create the Lambda function, CloudWatch Event, and Input:
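The original JSON block is not reproduced on this page.  As a stand-in, here is a sketch that builds an equivalent set of resources as a Python dictionary and prints it as JSON.  The logical resource names (BackupFunction, BackupFunctionRole, BackupSchedule, BackupSchedulePermission), the handler name and the schedule are assumptions for illustration; the bucket name uses the AWS::Region suffix convention described in the note above.

```python
import json

# Input handed to the function on every scheduled invocation.
event_input = {
    "backup_tag": {"Key": "Environment", "Value": "Prod"},
    "regions": ["us-west-2", "us-west-1", "us-east-1"],
    "support_email": "backup.admin@2ndwatch.com",
}

resources = {
    "BackupFunction": {
        "Type": "AWS::Lambda::Function",
        "Properties": {
            "Code": {
                # Resolves to e.g. my-awesome-lambda-functions.us-west-2
                "S3Bucket": {"Fn::Join": [".", ["my-awesome-lambda-functions",
                                                {"Ref": "AWS::Region"}]]},
                "S3Key": "ec2_backup_manager.zip",
            },
            "Handler": "ec2_backup_manager.lambda_handler",
            # Assumes the role resource from the previous snippet has this logical name.
            "Role": {"Fn::GetAtt": ["BackupFunctionRole", "Arn"]},
            "Runtime": "python2.7",  # the series' runtime; use a currently supported one
            "Timeout": 300,
        },
    },
    "BackupSchedule": {
        "Type": "AWS::Events::Rule",
        "Properties": {
            "ScheduleExpression": "cron(0 23 ? * MON-FRI *)",
            "State": "ENABLED",
            "Targets": [{
                "Id": "BackupFunctionTarget",
                "Arn": {"Fn::GetAtt": ["BackupFunction", "Arn"]},
                "Input": json.dumps(event_input),
            }],
        },
    },
    "BackupSchedulePermission": {
        "Type": "AWS::Lambda::Permission",
        "Properties": {
            "Action": "lambda:InvokeFunction",
            "FunctionName": {"Ref": "BackupFunction"},
            "Principal": "events.amazonaws.com",
            "SourceArn": {"Fn::GetAtt": ["BackupSchedule", "Arn"]},
        },
    },
}

template_fragment = json.dumps(resources, indent=2)
print(template_fragment)
```

The permission resource is what lets CloudWatch Events invoke the function; without it the rule fires but the invocation is denied.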


If you implement your Lambda functions using either of the two examples provided you should be able to reliably create, manage, automate and scale them to whatever extent and whatever schedule you need.  And the best part is you will only be charged for the ACTUAL compute time you use, in 100ms increments.

Now go have fun, automate everything you possibly can, and save your organization thousands of dollars in wasted compute costs!  Oh, and conserve a bunch of energy while you’re at it.

-Ryan Kennedy, Sr Cloud Consultant


AWS Lambda Scheduled Event Function Deep Dive: Part 3

Creating the Lambda Function IAM Role

In our last article, we looked at how to set up scheduled events using either the API (Python and boto3) or CloudFormation, including the required Trusted Entity Policy Document and IAM Role.  This role and policy can be created manually using the AWS web console (not recommended), scripted using the IAM API (e.g. Python and boto3), or created with a templating tool (e.g. CloudFormation, HashiCorp’s Terraform).  For this exercise we will cover both the scripted and the template tool approaches.

The IAM policy rights required for creating the Lambda function role and policy are:

  • iam:CreateRole
  • iam:PutRolePolicy

Creating the Lambda IAM role via the IAM API using Python and boto3

Note: For the following example you must either have your AWS credentials defined in a supported location (e.g. ENV variables, ~/.boto, ~/.aws/credentials, EC2 meta-data) or you must specify credentials when creating your boto3 client (or alternatively ‘session’).  The User/Role associated with the AWS credentials must also have the necessary rights, defined by policy, to perform the required operations against the AWS IAM API.

The following python script will produce our desired IAM role and policy:
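The original script is not reproduced on this page, so here is a minimal sketch of the two IAM calls involved.  It assumes iam_client is a boto3 client created with boto3.client('iam'), and that the trust and permissions policy documents are the ones described in the previous article, passed in as dictionaries.  The inline policy naming scheme is hypothetical.

```python
import json


def create_lambda_role(iam_client, role_name, trust_policy, permissions_policy):
    """Create the role, attach one inline policy, and return the role ARN."""
    role = iam_client.create_role(
        RoleName=role_name,
        AssumeRolePolicyDocument=json.dumps(trust_policy),
    )
    iam_client.put_role_policy(
        RoleName=role_name,
        PolicyName=role_name + "-inline",  # hypothetical naming scheme
        PolicyDocument=json.dumps(permissions_policy),
    )
    return role["Role"]["Arn"]
```

The returned ARN is the value the Lambda create_function call expects for its Role argument.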


That will create the necessary Lambda function role and its inline policy.

Creating the Lambda IAM role using AWS CloudFormation Template

The following block of JSON can be added to a CloudFormation template’s “Resources” section to create the Lambda function role and its inline policy:
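The original JSON is not reproduced on this page.  The sketch below builds an AWS::IAM::Role resource as a Python dictionary from the two policy documents and prints it as JSON.  The logical name BackupFunctionRole and the policy name are assumptions for illustration.

```python
import json


def lambda_role_resource(trust_policy, permissions_policy, policy_name="ec2-backup-manager"):
    """Return a CloudFormation AWS::IAM::Role resource with one inline policy."""
    return {
        "Type": "AWS::IAM::Role",
        "Properties": {
            "Path": "/",
            "AssumeRolePolicyDocument": trust_policy,
            "Policies": [
                {"PolicyName": policy_name, "PolicyDocument": permissions_policy}
            ],
        },
    }


# Placeholder documents; substitute the trust and permissions policies
# from the previous article.
resource = lambda_role_resource({"Version": "2012-10-17", "Statement": []},
                                {"Version": "2012-10-17", "Statement": []})
print(json.dumps({"BackupFunctionRole": resource}, indent=2))
```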


Visit us next week for the final segment of this blog series – Registering the Lambda Function.

-Ryan Kennedy, Sr Cloud Consultant


AWS Lambda Scheduled Event Function Deep Dive: Part 2

Down to the “Nitty-Gritty”

Now that we have an understanding of how AWS Lambda scheduled events can be expressed, we can dive into a real-world scenario and examine how to set that up using either the API (python and boto3) or CloudFormation.  Because what fun would it be doing it in the web console after all?  And this is a deep-dive, so using the web console would be a distasteful choice anyway.  Also…automation.  Enough said.

Suffice it to say, creating the scheduled event for your function in the AWS web console can be done quite easily by selecting “Scheduled Event” from the “Event source type” drop-down list and defining your expression.

The Use Case

Let’s assume we’ve written a nice little Lambda function that will search the EC2 API and find all of our instances running in a region, or multiple regions, and manage EBS snapshot backups for all EBS volumes on any instances with a specific tag.

We could hard-code our parameters in the Lambda function or derive them a number of ways, but let’s assume we’ve done the right thing and have specified them in the Scheduled Event Input parameter.  You might say, “Yeah, but I can just derive the current AWS region during the Lambda function execution, so why bother even providing that input?”  To which I would say:

Lambda is currently available only in the us-east-1, us-west-2, ap-northeast-1, eu-west-1, and eu-central-1 regions.  What if you want to manage EC2 backups in ap-southeast-2 or another region where Lambda isn’t yet available?

Defining our Inputs

backup_tag (dictionary): A single key/value pair used by the backup function to identify which EC2 instances (any with a matching tag) to manage backups on.
Ex. { "Key": "Environment", "Value": "Prod" }

regions (list of strings): A list of region(s) to manage EC2 backups against.
Ex. [ "us-west-2", "us-west-1", "us-east-1" ]

support_email (string): An email address to send backup reports, alerts, etc. to.
Ex. "backup.admin@2ndwatch.com"

Those three inputs would be captured as a JSON string like so:
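The JSON itself is not reproduced on this page.  A small sketch that assembles the three inputs defined above and serializes them looks like this; the values are the examples given in the input definitions.

```python
import json

event_input = {
    "backup_tag": {"Key": "Environment", "Value": "Prod"},
    "regions": ["us-west-2", "us-west-1", "us-east-1"],
    "support_email": "backup.admin@2ndwatch.com",
}

# This string is what goes into the Scheduled Event's Input parameter.
input_json = json.dumps(event_input)
print(input_json)
```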


Lambda Function IAM Role Requirement

Lambda functions require an IAM Role be specified at time of creation.  The Role must have the lambda service added as a “Trusted Entity” so that it can assume the Role.  The Trusted Relationship Policy Document (called AssumeRolePolicyDocument in CloudFormation) should look like this:
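The policy document is not reproduced on this page.  A trust policy that lets the Lambda service assume the role takes the following standard form, written here as a Python dictionary so it can be serialized for the API or embedded in a template.

```python
import json

trust_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            # The Lambda service is the trusted entity.
            "Principal": {"Service": "lambda.amazonaws.com"},
            "Action": "sts:AssumeRole",
        }
    ],
}

print(json.dumps(trust_policy, indent=2))
```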


In addition to the Trusted Entity Policy Document, the Role should have a policy, inline or managed, assigned to it that will allow the Lambda function all of the access it needs to AWS resources and APIs (e.g. EC2 describe instances, create snapshots, delete snapshots).  For our use-case the following policy is a good place to start:
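The original policy is not reproduced on this page.  The sketch below is one plausible starting point rather than the author’s policy: the action list is an assumption based on the use case (describing instances, managing snapshots, writing logs to CloudWatch Logs), and the wildcard resources should be tightened for production use.

```python
import json

permissions_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            # Find tagged instances and manage their EBS snapshots.
            "Effect": "Allow",
            "Action": [
                "ec2:DescribeInstances",
                "ec2:DescribeVolumes",
                "ec2:DescribeSnapshots",
                "ec2:CreateSnapshot",
                "ec2:DeleteSnapshot",
                "ec2:CreateTags",
            ],
            "Resource": "*",
        },
        {
            # Let the function write its own logs.
            "Effect": "Allow",
            "Action": [
                "logs:CreateLogGroup",
                "logs:CreateLogStream",
                "logs:PutLogEvents",
            ],
            "Resource": "arn:aws:logs:*:*:*",
        },
    ],
}

print(json.dumps(permissions_policy, indent=2))
```

If the function also sends the backup report by email, it would need whatever additional permission your mail mechanism requires.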


Come back later this week for part 3 of this blog series – Creating the Lambda Function IAM Role.

-Ryan Kennedy, Sr Cloud Consultant


AWS Lambda Scheduled Event Function Deep Dive: Part 1

My colleague, Ryan Manikowski, recently wrote a great blog giving an overview of AWS Lambda.  His article mentioned that AWS Lambda Functions can be triggered by a number of potential events and services.  In this 4-part blog series I am going to cover the Lambda “Scheduled Event” in-depth.

FYI, here is the official AWS documentation on using Scheduled Events to drive Lambda functions.


Lambda was originally designed to be an “event driven” service, meaning that it was reactive in nature.  Lambda functions were intended to be written to handle some event (e.g. SNS notification, Kinesis, DynamoDB) and perform some unit of work based on that event.  But what if you want to use Lambda in a non-reactive way?  What if you want to invoke your Lambda function at some regular interval to perform whatever task you see fit?  At AWS re:Invent in October 2015, AWS announced (at the same time they announced Python function support. Woohoo!) that they were adding support for Scheduled Events to Lambda.

So you can now create Lambda Scheduled Events, via CloudWatch Events Schedule Rules, using either a fixed rate or a cron expression to define your schedule.  Pretty sweet.

Some Technical Details

Schedule Expressions

As I just mentioned above, you can use either of two methods to define a schedule for invoking a Lambda function – rate and cron expressions.

Rate Expressions

The rate expression uses the form: rate(Value Unit)

  • The Value must be a positive integer, and the Unit must be either minute(s), hour(s), or day(s).
  • The rate frequency must be at least 1 minute.
  • A singular value (e.g. 1) must use the singular form of the unit and, likewise, plural values (e.g. 2 or more) must use the plural form of the unit.

ex: rate(1 hour) would trigger a Lambda function every 1 hour (perpetually)
ex: rate(5 minutes) would trigger a function every 5 minutes (perpetually)
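The three rules above are easy to check before submitting an expression.  This is a small sketch of such a check, written from the rules as stated here rather than from any AWS validation code; the function name is my own.

```python
import re

_RATE = re.compile(r"^rate\((\d+) (minute|minutes|hour|hours|day|days)\)$")


def is_valid_rate(expression):
    """Check a rate expression against the value, unit and singular/plural rules."""
    match = _RATE.match(expression)
    if not match:
        return False
    value, unit = int(match.group(1)), match.group(2)
    if value < 1:
        return False
    # A value of 1 needs the singular unit; anything larger needs the plural.
    return unit.endswith("s") == (value != 1)


for candidate in ("rate(1 hour)", "rate(5 minutes)", "rate(1 hours)", "rate(0 minutes)"):
    print(candidate, is_valid_rate(candidate))
```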

Cron Expressions

The cron expression uses the form: cron(Minutes Hours Day-of-month Month Day-of-week Year)

  • All time is referenced against UTC.
  • All fields are required.
  • One of the day-of-month or day-of-week values must be a question mark (?).

ex: cron(0/15 * * * ? *) would trigger a Lambda function at 0, 15, 30, and 45 minutes past the hour, every hour of every day.
ex: cron(0 23 ? * MON-FRI *) would trigger a Lambda function at 11:00PM UTC Monday through Friday.
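A quick structural check of a cron expression can catch the most common mistakes (a missing field, or forgetting the question mark).  The sketch below only checks the structure described above; it does not validate the values inside each field, and the function name is my own.

```python
FIELD_NAMES = ("minutes", "hours", "day_of_month", "month", "day_of_week", "year")


def cron_fields(expression):
    """Split a cron(...) expression into its six named fields, or raise ValueError."""
    if not (expression.startswith("cron(") and expression.endswith(")")):
        raise ValueError("expression must look like cron(...)")
    fields = expression[5:-1].split()
    if len(fields) != len(FIELD_NAMES):
        raise ValueError("all six fields are required")
    parsed = dict(zip(FIELD_NAMES, fields))
    if "?" not in (parsed["day_of_month"], parsed["day_of_week"]):
        raise ValueError("day-of-month or day-of-week must be '?'")
    return parsed


print(cron_fields("cron(0 23 ? * MON-FRI *)"))
```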


If you need your function to execute at very specific points in time (e.g. every 15 minutes starting at the top of the hour), use the cron schedule expression.  The rate schedule expression will start when you create the Scheduled Event rule and then run at the rate defined thereafter.  This means that if you used rate(15 minutes) as your schedule expression and created that Scheduled Event at 9:53AM, it would start at 9:53 and then kick off every 15 minutes after that (10:08, 10:23, 10:38, …).  That may or may not be an issue depending on your use-case.
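To make the difference concrete, here is a small sketch that computes the run times for both styles from the 9:53AM example.  It only models the arithmetic described above, not the scheduler itself, and the date is arbitrary.

```python
from datetime import datetime, timedelta


def rate_runs(created_at, minutes, count):
    """Run times for rate(N minutes): offsets from the rule's creation time."""
    return [created_at + timedelta(minutes=minutes * i) for i in range(1, count + 1)]


def quarter_hour_runs(after, count):
    """Run times for cron(0/15 * * * ? *): wall-clock quarter hours after `after`."""
    start = after.replace(second=0, microsecond=0)
    start += timedelta(minutes=15 - start.minute % 15)
    return [start + timedelta(minutes=15 * i) for i in range(count)]


created = datetime(2016, 1, 4, 9, 53)
rate_times = [t.strftime("%H:%M") for t in rate_runs(created, 15, 3)]
cron_times = [t.strftime("%H:%M") for t in quarter_hour_runs(created, 3)]
print(rate_times)
print(cron_times)
```

The rate schedule drifts with the creation time, while the cron schedule stays pinned to the clock.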

Complete detailed information and specifications on schedule expressions can be found in the AWS documentation for CloudWatch Events schedule expressions.

Input data (i.e. parameters)

Along with the schedule expression, a scheduled event can also provide a Lambda function with Input data (in a JSON formatted string), which will be handed to the event handler function as a data object (a dictionary in the case of python).  This can be useful if you need to pass your function parameters or arguments, as opposed to “hard-coding” values in your function.  I’ll include an example of this below.  While supplying Input data is not necessarily required, it is something that can and should be used in a number of applications.
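As a rough illustration, a Python handler receives the Input as the event dictionary.  The field names below (backup_tag, regions, support_email) are the ones used for the backup use case in this series; the handler body is a hypothetical sketch that only shows the values being read, not a real backup function.

```python
def lambda_handler(event, context):
    """Read parameters from the scheduled event's Input instead of hard-coding them."""
    tag = event["backup_tag"]
    plan = {
        "regions": event.get("regions", []),
        # Shape used by the EC2 API when filtering instances by tag.
        "instance_filter": {"Name": "tag:" + tag["Key"], "Values": [tag["Value"]]},
        "notify": event.get("support_email"),
    }
    return plan


# Local invocation with a sample event; Lambda supplies a real context object.
sample_event = {
    "backup_tag": {"Key": "Environment", "Value": "Prod"},
    "regions": ["us-west-2"],
    "support_email": "backup.admin@2ndwatch.com",
}
print(lambda_handler(sample_event, None))
```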

Check back next week for part 2 of this blog series – how to set up scheduled events.

-Ryan Kennedy, Sr Cloud Consultant


Migrating to AWS NAT Gateway

Now that AWS has released the new NAT Gateway, you’ll need a plan to migrate off your old legacy NAT (too soon?).  The migration isn’t terribly difficult, but following this guide will help provide an outage-free transition to the new service.


Your NAT Gateway migration plan should start with an audit.  If you take a look at our blog post on NAT Gateway Considerations you’ll see that there are a number of “gotchas” that need to be taken into account before you begin.  Here’s what you’re looking for:

  1. Are your NAT EIPs whitelisted to external vendors? If yes, you’ll need to take a longer outage window for the migration.
  2. Does your NAT perform other functions for scripts or access? If you’re using the NAT for account scripts or bastion access, you’ll need to migrate those scripts and endpoints to another instance.
  3. Do you have inbound NAT security group rules? If yes, you’ll lose this security during the migration and will need to transfer these rules to the outbound rules of the originating security groups.
  4. Do you need high availability across AZs? If yes, you’ll need more than one NAT Gateway.
  5. Is your NAT in an auto scaling group (ASG)? If yes, you’ll need to remove the ASG to clean up.

Check your Routes

Next you’ll want to take a look at your various routes in your private subnets and map them to the NAT instances that are referenced.  Unless you are changing your configuration, you can note the public subnet that the NAT currently exists in and use that one for your NAT Gateway.  That’s not required, but it introduces the least amount of change in your environment.
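If you have many route tables, this mapping can be scripted.  The sketch below assumes route_tables is the "RouteTables" list returned by the boto3 EC2 client’s describe_route_tables call, and picks out every route that targets an instance, which is how a NAT instance route appears.  The IDs in the sample data are made up.

```python
def routes_via_instances(route_tables):
    """Return (route table ID, destination CIDR, instance ID) for instance-targeted routes."""
    found = []
    for table in route_tables:
        for route in table.get("Routes", []):
            if "InstanceId" in route:
                found.append((table["RouteTableId"],
                              route.get("DestinationCidrBlock"),
                              route["InstanceId"]))
    return found


# Hypothetical sample shaped like a describe_route_tables response.
sample_tables = [
    {"RouteTableId": "rtb-11111111", "Routes": [
        {"DestinationCidrBlock": "10.0.0.0/16", "GatewayId": "local"},
        {"DestinationCidrBlock": "0.0.0.0/0", "InstanceId": "i-0abc1234"},
    ]},
    {"RouteTableId": "rtb-22222222", "Routes": [
        {"DestinationCidrBlock": "0.0.0.0/0", "GatewayId": "igw-33333333"},
    ]},
]
print(routes_via_instances(sample_tables))
```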


(Optional) Disassociate EIP from NAT Instance

In the case where the EIP of your NAT is whitelisted to third party providers, you’ll need to remove it from the NAT instance prior to the creation of the replacement gateway.  NOTE: removing the EIP from the NAT instance will begin your “downtime” for the migration, so you’ll want to understand the impact and know if a maintenance window is appropriate.  When you disassociate the EIP, note the EIP allocation id because you’ll need it later.

Find the EIP that is currently attached to the NAT instance and Right Click > Disassociate


Deploy the NAT Gateway(s)

At this point, you should have the EIP allocation id and public subnet id for the NAT that you intend to replace.  If you aren’t moving your NAT EIP, you can generate one during the creation of the NAT Gateway.  In the console, open the VPC service and click NAT Gateways on the left side.  Then click Create NAT Gateway.


Select the public subnet and EIP allocation id or create a new EIP.  Then click Create a NAT Gateway.

Update Routes

Once you’ve created the NAT Gateway, you’ll be prompted to update your route tables.  Click Edit Route Tables.


At this point, you’ll want to go through the route tables that reference the NAT instance you replaced and edit the route to reference the NAT Gateway instead.  Unsurprisingly, NAT Gateway ids start with “nat”.


You’ll repeat this process for every NAT instance and subnet that you’ll be migrating.
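The console steps above can also be scripted.  The following is a sketch under a few assumptions: ec2_client is a boto3 EC2 client created with boto3.client('ec2'), routes is a list of (route table ID, destination CIDR) pairs you collected during the audit, and the polling interval and limit are arbitrary defaults.

```python
import time


def migrate_routes_to_nat_gateway(ec2_client, subnet_id, allocation_id, routes,
                                  poll_seconds=10, max_polls=30, sleep=time.sleep):
    """Create a NAT Gateway, wait until it is available, then repoint the routes."""
    gateway = ec2_client.create_nat_gateway(SubnetId=subnet_id, AllocationId=allocation_id)
    gateway_id = gateway["NatGateway"]["NatGatewayId"]

    # Poll until the gateway leaves the pending state.
    for _ in range(max_polls):
        described = ec2_client.describe_nat_gateways(NatGatewayIds=[gateway_id])
        state = described["NatGateways"][0]["State"]
        if state == "available":
            break
        if state == "failed":
            raise RuntimeError("NAT Gateway %s failed to create" % gateway_id)
        sleep(poll_seconds)
    else:
        raise RuntimeError("timed out waiting for NAT Gateway %s" % gateway_id)

    # Swap each route from the NAT instance to the new gateway.
    for route_table_id, destination in routes:
        ec2_client.replace_route(RouteTableId=route_table_id,
                                 DestinationCidrBlock=destination,
                                 NatGatewayId=gateway_id)
    return gateway_id
```

Waiting for the available state before replacing routes keeps the cut-over window as short as possible.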

Verify Access

Log into at least one instance in each private subnet and verify connectivity based on what is allowed.  On a Linux box, running “curl icanhazip.com” will return the external IP address and quickly confirm that you’re connected to the Internet, and the reply should match the EIP attached to your NAT Gateway.
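If you would rather script the check than eyeball the curl output, a sketch like the one below compares the reported address with the EIP you expect.  It uses only the Python standard library; the EIP shown is a documentation-range placeholder, and the fetch function is passed in so the comparison can be exercised without network access.

```python
import urllib.request


def fetch_external_ip(url="http://icanhazip.com", timeout=5):
    """Return the public IP address this host appears to come from."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read().decode().strip()


def egress_matches(expected_eip, fetch=fetch_external_ip):
    """True when outbound traffic leaves through the expected Elastic IP."""
    return fetch() == expected_eip


# On an instance in a private subnet you would call, for example:
#     egress_matches("203.0.113.10")   # placeholder EIP
```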


Once you’ve migrated to the NAT Gateway and verified everything is working correctly, you’ll likely want to schedule the decommissioning of the NAT instances.  If the instances aren’t in an ASG, you can stop the instances and set a calendar entry to terminate them at a safe time in the future.  If they are in an ASG, you’ll need to set the minimum and maximum to 0 and let the ASG terminate the instance.

-Coin Graham, Senior Cloud Engineer