
CI/CD for Infrastructure as Code with Terraform and Atlantis

In this post, we’ll go over a complete workflow for continuous integration (CI) and continuous delivery (CD) for infrastructure as code (IaC) with just two tools: Terraform and Atlantis.

What is Terraform?

So what is Terraform? According to the Terraform website:

Terraform is a tool for building, changing, and versioning infrastructure safely and efficiently. Terraform can manage existing and popular service providers as well as custom in-house solutions.

In practice, this means that Terraform allows you to declare what you want your infrastructure to look like – in any cloud provider – and will automatically determine the changes necessary to make it so. Because of its simple syntax and cross-cloud compatibility, it’s 2nd Watch’s choice for infrastructure as code.

Pain You May Be Experiencing Working With Terraform

When you have multiple collaborators (individuals, teams, etc.) working on a Terraform codebase, some common problems are likely to emerge:

  1. Enforcing peer review becomes difficult. In any codebase, you’ll want to ensure that your code is peer reviewed in order to ensure better quality, in accordance with The Second Way of DevOps: Feedback. The role of peer review in IaC codebases is even more important. IaC is a powerful tool, but that power cuts both ways: it clearly makes us more productive, but it also means that a simple typo could cause a major issue with production infrastructure. To minimize the potential for bad code to be deployed, you should require peer review on all proposed changes to a codebase (e.g. GitHub Pull Requests with at least one reviewer required). Terraform’s open source offering has no facility to enforce this rule.
  2. Terraform plan output is not easily integrated into code reviews. In all code reviews, you must examine the source code to ensure that your standards are followed, that the code is readable, that it’s reasonably optimized, etc. In this respect, reviewing Terraform code is like reviewing any other code. However, Terraform code has the unique requirement that you must also examine the effect the change will have on your infrastructure (i.e. you must also review the output of a terraform plan command). When you potentially have multiple feature branches in review, it becomes critical to be assured that the terraform plan output is what will actually be executed when you run terraform apply. If the state of your infrastructure changes between a run of terraform plan and a run of terraform apply, the effect of that difference could range from inconvenient (the apply fails) to catastrophic (a significant production outage). Terraform itself offers locking capabilities, but its open source product does not provide an easy way to integrate locking into a peer review process.
  3. Too many sets of privileged credentials. Highly privileged credentials are often required to perform Terraform actions, and the greater the number of principals with privileged access, the larger your attack surface becomes. From a security standpoint, we’d therefore like to have fewer sets of admin credentials that can potentially be compromised.

What is Atlantis?

And what is Atlantis? Atlantis is an open source tool that enables safe collaboration on Terraform projects by ensuring that proposed changes are reviewed and that the proposed change is exactly the change that will be executed on your infrastructure. Atlantis is compatible (at the time of writing) with GitHub and GitLab, so if you’re not using either of these Git hosting systems, you won’t be able to use Atlantis.

How Atlantis Works With Terraform

Atlantis is deployed as a single binary executable with no system-wide dependencies. An operator adds a GitHub or GitLab token for a repository containing Terraform code. The Atlantis installation process then adds hooks to the repository which allow communication with the Atlantis server during the pull request process.

You can run Atlantis in a container or a small virtual machine – the only requirement is that the Atlantis server can communicate with both your version control system (e.g. GitHub) and the infrastructure (e.g. AWS) you’re changing. Once Atlantis is configured for a repository, the typical workflow is:

  1. A developer creates a feature branch in git, makes some changes, and creates a Pull Request (GitHub) or Merge Request (GitLab).
  2. The developer enters atlantis plan in a PR comment.
  3. Via the installed webhooks, Atlantis runs terraform plan locally. If no other pull request holds a lock on the affected project or workspace, Atlantis adds the resulting plan as a comment on the pull request.
    • If another pull request holds the lock, the command fails, because Atlantis can’t ensure that the plan will still be valid once applied.
  4. The developer ensures the plan looks good and adds reviewers to the pull request.
  5. Once the PR has been approved, the developer enters atlantis apply in a PR comment. This triggers Atlantis to run terraform apply, and the changes are deployed to your infrastructure.
    • The command will fail if the pull request has not been approved.

The following sequence diagram illustrates the workflow described above:

Atlantis sequence diagram

We can see how our pain points in Terraform collaboration are addressed by Atlantis:

  1. To enforce code review, you can launch Atlantis with the --require-approval flag: https://github.com/runatlantis/atlantis#approvals
  2. To ensure that your terraform plan accurately reflects the change that will be made to your infrastructure when you run terraform apply, Atlantis performs locking on a project or workspace basis: https://github.com/runatlantis/atlantis#locking
  3. To avoid creating multiple sets of privileged credentials, you can deploy Atlantis on an EC2 instance with a privileged IAM role in its instance profile (e.g. in AWS). In this way, all of your Terraform commands run through a single set of privileged credentials, obviating the need to distribute multiple sets: https://github.com/runatlantis/atlantis#aws-credentials

Conclusion

You can see that with minimal additional infrastructure you can establish a safe and reliable CI/CD pipeline for your infrastructure as code, enabling you to get more done safely! To find out how you can deploy a CI/CD pipeline in less than 60 days, download our datasheet.

-Josh Kodroff, Associate Cloud Consultant


Utilizing AWS Systems Manager to ensure systems are securely configured and maintained

Compliance is a constant challenge today. Keeping our system images in a healthy and trusted state of compliance requires time and effort. There are countless tools and technologies on the market to help customers maintain compliance and state, so where do you start?

Amazon has built a rich set of core technologies into the AWS console. Systems Manager is a fantastic operations management platform that can assist you with setting up and maintaining configuration and state management.

One of the first things we must focus on when we build out our core images in the cloud is the configuration of those images. What is the role of the image, what operating system am I going to utilize, and what applications and/or core services do I need to enable, configure, and maintain? In the datacenter, we call these Gold Images. The same applies in the cloud.

We define these roles for our images and place them in different functional areas – Infrastructure, Web Services, Applications. We may have many core image templates for our enterprise workloads. By building these base images and maintaining them continuously, we set in motion a solid foundation for core security and core compliance of our cloud environment.

AWS Systems Manager looks across your cloud environment and brings together all the key information about your operating resources in the cloud. It centralizes the gathering of core baseline information for your resources in one place. In the past, you would have had to look at your Amazon CloudWatch information in one area, your AWS CloudTrail information in another, and your configuration information in yet another. Centralizing this information lets you see the holistic state of your cloud environment baselines in one console.

AWS Systems Manager provides built-in insights and dashboards that let you look across your entire cloud environment and see into and act upon your cloud resources. It shows the configuration compliance of all your resources as well as the state management and associations across them. It also provides a rich ability to customize configuration and state management for your workloads, applications, and resource types, and to scan and analyze continuously to ensure those configurations and states are maintained. You can even create your own custom compliance types that map to your organization’s baseline and business requirements, and then constantly scan and analyze against those baselines to maintain the desired operational configuration and state.
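As a concrete illustration, here is a minimal, hypothetical sketch (Python with boto3; the region, instance ID, compliance type name, and check details are placeholder assumptions) of recording a custom compliance result against a managed instance:

import datetime

import boto3

ssm = boto3.client("ssm", region_name="us-east-1")  # region is an assumption

# Record a custom compliance result against a managed instance.
# "Custom:CorporateBaseline" and the instance ID are placeholders.
ssm.put_compliance_items(
    ResourceId="i-0123456789abcdef0",
    ResourceType="ManagedInstance",
    ComplianceType="Custom:CorporateBaseline",
    ExecutionSummary={"ExecutionTime": datetime.datetime.now(datetime.timezone.utc)},
    Items=[
        {
            "Id": "baseline-001",
            "Title": "CloudTrail logging enabled",
            "Severity": "HIGH",
            "Status": "COMPLIANT",
        }
    ],
)

Results recorded this way show up alongside the built-in compliance data in the Systems Manager compliance views, so your organizational checks and the AWS-provided ones can be reported on together.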

We can analyze and report on the current state and quickly determine, centrally, whether our cloud services and resources are in or out of compliance. We can create baseline reports on our compliance position at any time and, with that knowledge, set remediation in motion to return our services and resources to a compliant state and configuration.

With AWS Systems Manager we can scan all resources for patch state, determine which patches are missing, and remediate them manually, on a schedule, or automatically to maintain patch management compliance.
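For example, a short boto3 sketch along these lines (the instance IDs and region are placeholders) could pull the patch state for a set of managed instances and flag any with missing or failed patches:

import boto3

ssm = boto3.client("ssm", region_name="us-east-1")  # region is an assumption

# Placeholder instance IDs; in practice you might discover managed instances
# via describe_instance_information() instead of hard-coding them.
instance_ids = ["i-0123456789abcdef0", "i-0fedcba9876543210"]

response = ssm.describe_instance_patch_states(InstanceIds=instance_ids)

for state in response["InstancePatchStates"]:
    missing = state["MissingCount"]
    failed = state["FailedCount"]
    if missing or failed:
        print(f"{state['InstanceId']}: {missing} missing, {failed} failed patches")
    else:
        print(f"{state['InstanceId']}: patch compliant")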

AWS Systems Manager also integrates with Chef InSpec, allowing you to leverage Chef InSpec to operate in a continuous compliance framework for your cloud resources.

On the road to compliance, it is important to leverage the tools and capabilities of your cloud provider. AWS gives us a rich set of systems management capabilities across configuration, state management, patch management and remediation, as well as reporting. AWS Systems Manager is provided at no cost to AWS customers and will help you along your journey to realizing continuous compliance of your cloud environment across AWS and the hybrid cloud. To learn more about using AWS Systems Manager or your systems’ compliance, contact us.

-Peter Meister, Sr Director of Product Management


Creating a simple Alexa Skill

Why do it?

Alexa gets a lot of use in our house, and it is very apparent to me that the future is not a touch screen or a mouse, but voice. You can learn the basics of creating an Alexa skill by watching videos and tutorials, but actually building a skill is a great way to understand the ins and outs of the process and what the backend systems (like AWS Lambda) are capable of.

First you need a problem

To get started, you need a problem to solve.  Once you have the problem, you’ll need to think about the solution before you write a line of code.  What will your skill do?  You need to define the requirements.  For my skill, I wanted to ask Alexa to “park my cloud” and have her stop all EC2 instances or RDS databases in my environment.

Building a solution one word at a time

Now that I’ve defined the problem and have an idea of the requirements for the solution, it’s time to start building the skill. The first thing you’ll notice is that the Alexa Skills portal is not in the standard AWS console. You need to go to developer.amazon.com/Alexa, create a developer account, and sign in there. Once inside, there is a lot of good information and videos on creating Alexa skills that are worth reviewing. Click the “Create Skill” button to get started. In my example, I’m building a custom skill.

Build

The process for building a skill is broken into major sections: Build, Test, Launch, Measure. In each one you’ll have a number of things to complete before moving on to the next section. The major areas of each section are broken down on the left-hand side of the console. On the initial dashboard you’re also presented with the “Skill builder checklist” on the right as a visual reminder of what you need to do before moving on.

Interaction model

This is the first area you’ll work on in the Build phase of your Alexa skill. Here you set up how your users will interact with your skill.

Invocation

Invocation sets up how your users will launch your skill. For simplicity’s sake, this is often just the name of the skill. The common patterns will be “Alexa, ask [my skill] [some request],” or “Alexa, launch [my skill].” You’ll want to make sure the invocation for your skill sounds natural to a native speaker.

Intents

I think of intents as the “functions” or “methods” for my Alexa skill.  There are a number of built-in intents that should always be included (Cancel, Help, Stop) as well as your custom intents that will compose the main functionality of your skill.  Here my intent is called “park” since that will have the logic for parking my AWS systems.  The name here will only be exposed to your own code, so it isn’t necessarily important what it is.

Utterances

Utterances are the defined patterns of how people will use your skill. You’ll want to focus on natural language and normal patterns of speech for native speakers in your target audience. I would recommend doing some research and speaking to a diverse group of people to get a good cross-section of utterances for your skill. More is better.

Slots

Amazon also provides the option to use slots (variables) in your utterances.  This allows your skill to do things that are dynamic in nature.  When you create a variable in an utterance you also need to create a slot and give it a slot type.  This is like providing a type to a variable in a programming language (Number, String, etc.) and will allow Amazon to understand what to expect when hearing the utterance.  In our simple example, we don’t need any slots.

Interfaces

Interfaces allow you to connect your skill with other services to provide audio, display, or video options. These aren’t needed for a simple skill, so you can skip them.

Endpoint

Here’s where you’ll connect your Alexa skill to the endpoint you want to handle the logic for your skill. The easiest setup is to use AWS Lambda. There are lots of example Lambda blueprints using different programming languages and doing different things. Use those to get started, because the JSON response formatting can be difficult otherwise. If you don’t have an Alexa skill ID here, you’ll need to Save and Build your skill first. Then a skill ID will be generated, and you can use it when configuring your Lambda triggers.

AWS Account Lambda

Assuming you already have an AWS account, you’ll want to deploy a new Lambda from a blueprint that looks somewhat similar to what you’re trying to accomplish with your skill (deployed in us-east-1). Even if nothing matches well, pick any one of them, as they have the JSON return formatting set up so you can reuse it in your code. This will save you a lot of time and effort. Take a look at the information here and here for more information about how to set up and deploy Lambda for Alexa skills. You’ll want to configure your Alexa skill as the trigger for the Lambda in the configuration, and here’s where you’ll copy in your skill ID from the developer console “Endpoints” area of the Build phase.

While the actual coding of the Lambda isn’t the purpose of this article, I will include a couple of highlights that are worth mentioning. Below is the part of the code from the AWS template that blocks the Lambda from being run by any Alexa skill other than my own. While the chances of this are rare, there’s no reason for my Lambda to be open to everyone. Here’s what that code looks like in Python:

if event['session']['application']['applicationId'] != "amzn1.ask.skill.000000000000000000000000":
    raise ValueError("Invalid Application ID")

Quite simply, if the Alexa application ID passed in the session doesn’t match my known Alexa skill ID, raise an error. The other piece of advice I’d give about the Lambda is to create different methods for each intent to keep the logic separated and easy to follow. Make sure you remove any response language from your code that came from the original blueprint. If your responses are inconsistent, Amazon will fail your skill (this happened to me multiple times because I borrowed from the “Color Picker” Lambda blueprint and had some generic responses left in the code). Also, you’ll want to handle your Cancel, Help, and Stop requests correctly. Lastly, as is best practice in all code, add copious logging to CloudWatch so you can diagnose issues. Note the ARN of your Lambda function, as you’ll need it when configuring the endpoint in the developer portal.
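To make the structure concrete, here is a minimal, hypothetical sketch of what such a Lambda handler could look like (it is not the exact code from my skill). It routes each intent to its own method, returns a consistently formatted response, and stops running EC2 instances for the “park” intent; RDS handling and logging are omitted for brevity, and the skill ID, region, and response wording are placeholder assumptions:

import boto3

SKILL_ID = "amzn1.ask.skill.000000000000000000000000"  # placeholder


def build_response(speech, end_session=True):
    # Minimal Alexa response envelope with plain-text speech.
    return {
        "version": "1.0",
        "response": {
            "outputSpeech": {"type": "PlainText", "text": speech},
            "shouldEndSession": end_session,
        },
    }


def park_intent():
    # Stop every running EC2 instance in the region (pagination omitted for brevity).
    ec2 = boto3.client("ec2", region_name="us-east-1")  # region is an assumption
    reservations = ec2.describe_instances(
        Filters=[{"Name": "instance-state-name", "Values": ["running"]}]
    )["Reservations"]
    instance_ids = [i["InstanceId"] for r in reservations for i in r["Instances"]]
    if instance_ids:
        ec2.stop_instances(InstanceIds=instance_ids)
    return build_response(f"Parking your cloud. Stopping {len(instance_ids)} instances.")


def help_intent():
    return build_response("Say park my cloud to stop your instances.", end_session=False)


def lambda_handler(event, context):
    # Reject requests from any skill other than this one.
    if event["session"]["application"]["applicationId"] != SKILL_ID:
        raise ValueError("Invalid Application ID")

    request = event["request"]
    if request["type"] == "LaunchRequest":
        return help_intent()
    if request["type"] == "IntentRequest":
        intent_name = request["intent"]["name"]
        if intent_name == "park":
            return park_intent()
        if intent_name == "AMAZON.HelpIntent":
            return help_intent()
        if intent_name in ("AMAZON.CancelIntent", "AMAZON.StopIntent"):
            return build_response("Goodbye.")
    return build_response("Sorry, I didn't understand that.")

Keeping the response building in one helper makes it much easier to keep your speech output consistent across intents, which matters because inconsistent responses can fail certification.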

Test

Once your Lambda is deployed in AWS, you can go back into the developer portal and begin testing the skill. First, put your Lambda function ARN into the endpoint configuration for your skill. Next, click over to the Test phase at the top and choose “Alexa Simulator.” You can try recording your voice with your computer microphone or typing in the request. I recommend you do both to get a sense of how Alexa will interpret what you say and respond. Note that I’ve found the actual Alexa is better at natural language processing than the test options using a microphone on my laptop. When you run a test, the console will show you the JSON input and output. You can copy the contents of the INPUT pane to build a test event for your Lambda function. If you need to do a lot of work on your Lambda, it’s a lot easier to test from there than to flip back and forth. Pay special attention to your utterances. You’ll learn quickly that your proposed utterances weren’t as natural as you thought. Make updates to the utterances and Lambda as needed and keep testing.
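For reference, a stripped-down IntentRequest test event for this kind of skill looks roughly like the following (the IDs are placeholders, and a real event from the simulator includes additional fields such as context and locale):

# Stripped-down Alexa IntentRequest event for use as a Lambda test event.
test_event = {
    "version": "1.0",
    "session": {
        "new": True,
        "sessionId": "amzn1.echo-api.session.placeholder",
        "application": {"applicationId": "amzn1.ask.skill.000000000000000000000000"},
        "user": {"userId": "amzn1.ask.account.placeholder"},
    },
    "request": {
        "type": "IntentRequest",
        "requestId": "amzn1.echo-api.request.placeholder",
        "intent": {"name": "park", "confirmationStatus": "NONE"},
    },
}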

Launch

Once you have your skill in a place where it’s viable, it’s time to launch. Under “Skill Preview,” you’ll pick names, icons, and a category for your skill. You’ll also create “cards” that people will see in the Alexa app, explaining how to use your skill and the kinds of things you can say. Lastly, you’ll need a URL to your “Privacy Policy” and “Terms of Use.” Under “Privacy and Compliance,” you answer more questions about what kind of information your skill collects, whether it targets children or contains advertising, whether it’s ok to export, etc. Note that if you categorize your skill as a “Game,” it will automatically be considered to be directed at children, so you may want to avoid that with your first skill if you collect any identifiable information. Under “Availability,” you’ll decide whether the skill should be public and which countries should be able to access it. The last section is “Submission,” where you hopefully get the green checkmark and are allowed to submit.

Skill Certification

Now you wait.  Amazon seems to have a number of automated processes that catch glaring issues, but you will likely end up with some back and forth between yourself and an Amazon employee regarding some part of your skill that needs to be updated.  It took about a week to get my final approval and my skill posted.

Conclusion

Creating your own simple Alexa skill is a fun and easy way to get some experience creating applications that respond to voice and understand what’s possible on the platform.  Good luck!

-Coin Graham, Senior Cloud Consultant
