1-888-317-7920 info@2ndwatch.com

Continuous Compliance – Automatically Detect and Report Vulnerabilities in your Cloud Enterprise

Customers are wrangling with many challenges in managing security at scale across the enterprise. As customers embrace more and more cloud capabilities across more providers, it becomes daunting to manage compliance.

The landscape of tools and providers is endless, and customers are utilizing a mix of traditional enterprise tools from the past along with cloud tools to try to achieve security baselines within their enterprise.

At 2nd Watch we have a strong partnership with Palo Alto Networks, which provides truly enterprise-grade security to our customers across a very diverse enterprise landscape – datacenter, private cloud, public cloud and hybrid – across AWS, Azure and Google Cloud Platform.

Palo Alto Networks acquired a brilliant company recently – Evident.io. Evident.io is well known for providing monitoring, compliance and security posture management to organizations across the globe. Evident.io provides continuous compliance across AWS and Azure and brings strong compliance vehicles around HIPAA, ISO 27001, NIST 800-53, NIST 900-171, PCI and SOC 2.

The key to continuous compliance lies in the ability to centralize monitoring and reporting as well as insight into one console dashboard where you can see, in real time, the core health and state of your cloud enterprise.

This starts with gaining core knowledge of your environment’s current health state. You must audit, assess and report on where you currently stand in terms of scope of health. Knowing current state will allow you to see the areas where you need to correct and will also open insight into compliance challenges. Evident.io automates this process and allows for automated, continuous visibility and control of infrastructure security while allowing for customized workflow and orchestration, which allows clients to tune the solution to fit specific organizational needs and requirements easily and effectively.

After achieving the core insight of current state of compliance, you must now work on ways to remediate and efficiently maintain compliance moving forward. Evident.io provides a rich set of real-time alerting and workflow functionality that allows clients to achieve automated alerting, automated remediation and automated enforcement. Evident.io employs continuous security monitoring and stores the data collected in the evident security platform, which allows our clients to eliminate manual review and build rich reporting and insight into current state and future state. Evident.io employs a rich set of reporting capabilities out of the box, across a broad range of compliance areas, which helps to report compliance quickly and address existing gaps and reduce and mitigate risk moving forward.

Evident.io works through API on AWS and Azure in a read-only posture. This provides a non-intrusive and effective approach to core system and resource insight without the burden of heavy agent deployment and configuration. Evident Security Platform acquires this data through API securely and analyzes it against core compliance baselines and security best practices to ensure gaps in enterprise security are corrected and risk is reduced.

Continuous Compliance requires continuous delivery. As clients embrace the cloud and the capabilities the cloud providers provide, it becomes more important then ever before that we institute solutions that help us manage against continuous software utilization and delivery. The speed of the cloud requires a new approach for core security and compliance, one that provides automation, orchestration and rich reporting to reduce the overall day-to-day burden of managing towards compliance at scale in your cloud enterprise.

If you are not familiar with Evident.io, check them out at http://evident.io, and reach out to us at 2nd Watch for help realizing your potential for continuous compliance in your organization.

-Peter Meister, Sr Director of Product Management

Facebooktwittergoogle_pluslinkedinmailrss

Logging and Monitoring in the era of Serverless – Part 1

Figuring out monitoring in a holistic sense is a challenge for many companies still, whether it is with conventional infrastructure or new platforms like serverless or containers.

In most applications there are two aspects of monitoring an application:

  • System Metrics such as errors, invocations, latency, memory and cpu usage
  • Business Analytics such as number of signups, number of emails sent, transactions processed, etc

The former is fairly universal and generally applicable in any stack to a varying degree. This is what I would call the undifferentiated aspect of monitoring an application. The abilities to perform error detection and track performance metrics are absolutely necessary to operate an application.

Everything that is old is new again. I am huge fan of the Twelve-Factor App. If you aren’t familiar, I highly suggest taking a look at it. Drafted in 2011 by developers at Heroku, the Twelve-Factor App is a methodology and set best practices designed to enable applications to be built with portability and resiliency when deployed to the web.

In the Twelve-Factor App manifesto, it is stated that applications should produce “logs as event streams” and leave it up to the execution environment to aggregate them. If we are to gather information from our application, why not make that present in the logs? We can use our event stream (i.e. application log) to create time-series metrics. Time-series metrics are just datapoints that have been sampled and aggregated over time, which enable developers and engineers to track performance. They allow us to make correlations with events at a specific time.

AWS Lambda works almost exactly in this way by default, aggregating its logs via AWS CloudWatch. CloudWatch organizes logs based on function, version, and containers while Lambda adds metadata for each invocation. And it is up to the developer to add application-specific logging to their function. CloudWatch, however, will only get you so far. If we want to track more information than just invocation, latency, or memory utilization, we need to analyze the logs deeper. This is where something like Splunk, Kibana, or other tools come into play.

In order to get to the meat of our application and the value it is delivering we need to ensure that we have additional information (telemetry) going to the logs as well:

e.g. – Timeouts – Configuration Failures – Stack traces – Event objects

Logging out these types of events or information will enable those other tools with rich query languages to create a dashboard with just about anything we want on them.

For instance, let’s say we added the following line of code to our application to track an event that was happening from a specific invocation and pull out additional information about execution:

log.Println(fmt.Sprintf(“-metrics.%s.blob.%s”, environment, method))

In a system that tracks time-series metrics in logs (e.g. SumoLogic), we could build a query like this:

“-metrics.prod.blob.” | parse “-metrics.prod.blob.*” as method | timeslice 5m | count(method) group by  _timeslice, method | transpose row _timeslice column method

This would give us a nice breakdown of the different methods used in a CRUD or RESTful service and can then be visualized in the very same tool.

While visualization is nice, particularly when taking a closer look at a problem, it might not be immediately apparent where there is a problem. For that we need some way to grab the attention of the engineers or developers working on the application. Many of the tools mentioned here support some level of monitoring and alerting.

In the next installment of this series we will talk about increasing visibility into your operations and battling dashboard apathy! Check back next week.

-Lars Cromley, Director, Cloud Advocacy and Innovation

Facebooktwittergoogle_pluslinkedinmailrss

10 Ways Migrating to the Cloud Can Improve Your IT Organization

While at 2nd Watch, I’ve had the opportunity to work with a plethora of CIOs on their journey to the cloud.  Some focused on application-specific migrations, while others focused on building a foundation. Regardless of where they started, their journey began out of a need for greater agility, flexibility, extensibility and standardization.

Moving to the cloud not only provides you with agility, flexibility and extensibility – it actually improves your IT organization. How? In this post I will outline 10 ways migrating to the cloud will improve your IT organization.

  1. CI/CD: IT organizations require speed and agility when responding to development and infrastructure requests. Today’s development processes encourage continuous integration, summarized as continuously releasing code utilizing release automation. Using these processes, an IT organization is able to continually produce minimally viable products – faster.
  2. Organizational Streamlining: In order to implement continuous integration, an organization’s processes must be connected and streamlined – from resource provisioning to coding productivity. Moving to the cloud enables the IT organization to create sustainable processes; processes that track requests for resources, the provisioning of those resources, streamlined communication and facilitates the business unit chargeback in addition to the general benefit of working more efficiently. For example, the provisioning process of one customer took 15 days – from requirement gathering to approval to finally provisioning resources. By working with the 2nd Watch team we were able to automate the entire provisioning process, including several approval gates. The new automated process now deploys the requested systems in minutes compared to days.
  3. Work More Efficiently: Moving to the cloud returns the IT organization’s focus to where it belongs so your team can focus on the jobs they were hired to do. No longer focused on resource provisioning, patching and configuration, they are now working on the core functions of their role, such as aligning new IT service offerings to business needs.
  4. New Capabilities: IT organizations can focus on developing new capabilities and capitalize on new opportunities for the business. More importantly, IT departments can focus on projects that more closely align to business strategy.
  5. An actual Dev/Test: Organizations can now create true Dev/Test environments in the cloud that enables self-service provisioning and de-provisioning of testing servers with significantly lower cost and overhead. Something that was previously expensive, inefficient and hard to maintain on-prem can now be deployed in a way that is easy, flexible and cost efficient.
  6. Dedicated CIO Leadership: Moving to and operating within a cloud-based environment requires strong IT leadership. Now the CIO is more easily able to focus on key strategic initiatives that deliver value to the business. With fewer distractions, this ability to define and drive the overall strategy and planning of the organization, IT policy, capacity planning, compliance and security enables the CIO to lead the charge with innovation when working with business.
  7. Foster Stronger IT and Business Relationships: Moving to the cloud creates stronger relationships between IT and the business. No longer is IT relegated to just determining requirements, selecting services and implementing the chosen solution. They can now participate in collaborative discussions with the business to help define what is Moving to the cloud fosters collaboration between IT and business leaders to promote a cohesive and inclusive cloud strategy that meets IT’s governance requirements but also enables the agility needed by the business to stay competitive.
  8. Creation of a CCoE: Migrating to the cloud offers the IT organization an opportunity to create a Cloud Center of Excellence. Ideally, the CCoE should be designed to be a custom turn-key operation embedded with your enterprise’s existing IT engineers as part of its core level of expertise. This team will consist of an IT team dedicated to creating, evangelizing, and institutionalizing best practices, frameworks, and governance for evolving technology operations, which are increasingly implemented using the cloud. The CCoE develops a point of view for how cloud technology is implemented at scale for an organization. Moreover, by creating a CCoE it can help with breaking down silos and creating a single pane of glass view when it comes to cloud technology, from creating a standard for machine images through infrastructure builds to managing cloud costs.
  9. New Training Opportunities: Evolving the technical breadth already present in the organization and working through the cultural changes required to bring the skeptics along is a great opportunity to bring your team closer together while simultaneously expanding your capabilities. The more knowledge your teams have on cloud technologies, the smoother the transition will be for the organization. As a result, you will develop more internal evangelists and ease the fear, uncertainty and doubt often felt by IT professionals when making the transition to the cloud. The importance of investing in training and growth of employees cannot be stressed enough as, based on our experience, there is a strong correlation between investments in training and successful moves to the cloud. Continued education is part of the “Cloud Way” that pays off while preserving much of the tribal knowledge that exists within your organization.
  10. Flexibility, Elasticity and Functionality: Cloud computing allows your IT organization to adapt more quickly with flexibility that is not available when working with on-prem solutions. Moving to a cloud platform enables quick response to internal capacity demands. No more over-provisioning! With cloud computing, you can pay as you go – spin up what you need when you need it, and spin it down when demand drops.

As a whole, IT organizations need to be prepared to set aside the old and welcome new approaches to delivering cloud services. The journey to the cloud not only brings efficiencies but also fosters more collaboration within your organization and enhances your IT organization to becoming a well-oiled machine that develops best practices and quickly responds to your business cloud needs. Ready to get started on your cloud journey? Contact us to get started with a Cloud Readiness Assessment.

-Yvette Schmitter, Senior Manager

 

Facebooktwittergoogle_pluslinkedinmailrss

2nd Watch Earns AWS Certification Distinction for Achieving 200 Certifications

Today we are excited to announce we have earned an AWS Certification Distinction for achieving more than 200 active AWS Certifications! We have invested significant time and resources in education to validate the technical expertise of our staff with AWS certifications, enabling our cloud experts to better serve our clients across the US. This distinction assures our customers that they are working with a well-qualified AWS partner, since AWS Certifications recognize IT professionals with the technical skills and expertise to design, deploy, and operate applications and infrastructure on AWS.

2nd Watch’s value of achieving AWS Certifications is passed on to our customers, beyond passing an exam, as our technical teams have nearly a decade of hands-on, practical experience surpassing many providers. This designation is more than just an AWS Certification in that it highlights our company’s culture and commitment to providing highly-qualified individuals for every project to provide customers a high-quality and consistent customer experience in every engagement.

“Being an APN Partner provides us with the right resources, accreditation and training we need to serve our customers, and these certifications validate our AWS expertise and customer obsession,” says 2nd Watch co-founder and EVP of Marketing and Business Development, Jeff Aden. “AWS Certifications benefit our team and extend to our customers in a high-quality experience. We’ll continue to invest in our team’s education and training, to ensure we’re at the forefront of AWS’ innovation.”

AWS Certifications are recognized industry-wide as a credential that shows expertise in AWS cloud infrastructure, and has been recognized as one of the top 10 Cloud Certifications for partners. Historically, AWS Certifications have been an individual achievement, but APN Certification Distinctions now showcase APN Partners that have achieved 50 or more AWS Certifications.

-Nicole Maus, Marketing Manager

Facebooktwittergoogle_pluslinkedinmailrss

Fully Coded And Automated CI/CD Pipelines: The Weeds

The Why

In my last post we went over why we’d want to go the CI/CD/Automated route and the more cultural reasons of why it is so beneficial. In this post, we’re going to delve a little bit deeper and examine the technical side of tooling. Remember, a primary point of doing a release is mitigating risk. CI/CD is all about mitigating risk… fast.

There’s a Process

The previous article noted that you can’t do CI/CD without building on a set of steps, and I’m going to take this approach here as well. Unsurprisingly, we’ll follow the steps we laid out in the “Why” article, and tackle each in turn.

Step I: Automated Testing

You must automate your testing. There is no other way to describe this. In this particular step however, we can concentrate on unit testing: Testing the small chunks of code you produce (usually functions or methods). There’s some chatter about TDD (Test Driven Development) vs BDD (Behavior Driven Development) in the development community, but I don’t think it really matters, just so long as you are writing test code along side your production code. On our team, we prefer the BDD style testing paradigm. I’ve always liked the symantically descriptive nature of BDD testing over strictly code-driven ones. However, it should be said that both are effective and any is better than none, so this is more of a personal preference. On our team we’ve been coding in golang, and our BDD framework of choice is the Ginkgo/Gomega combo.

Here’s a snippet of one of our tests that’s not entirely simple:

Describe("IsValidFormat", func() {
  for _, check := range AvailableFormats {
    Context("when checking "+check, func() {
      It("should return true", func() {
        Ω(IsValidFormat(check)).To(BeTrue())
      })
    })
  }
 
  Context("when checking foo", func() {
    It("should return false", func() {
      Ω(IsValidFormat("foo")).To(BeFalse())
    })
  })
)

So as you can see, the Ginkgo (ie: BDD) formatting is pretty descriptive about what’s happening. I can instantly understand what’s expected. The function IsValidFormat, should return true given the range (list) of AvailableFormats. A format of foo (which is not a valid format) should return false. It’s both tested and understandable to the future change agent (me or someone else).

Step II: Continuous Integration

Continuous Integration takes Step 1 further, in that it brings all the changes to your codebase to a singular point, and building an object for deployment. This means you’ll need an external system to automatically handle merges / pushes. We use Jenkins as our automation server, running it in Kubernetes using the Pipeline style of job description. I’ll get into the way we do our builds using Make in a bit, but the fact we can include our build code in with our projects is a huge win.

Here’s a (modified) Jenkinsfile we use for one of our CI jobs:

def notifyFailed() {
  slackSend (color: '#FF0000', message: "FAILED: '${env.JOB_NAME} [${env.BUILD_NUMBER}]' (${env.BUILD_URL})")
}
 
podTemplate(
  label: 'fooProject-build',
  containers: [
    containerTemplate(
      name: 'jnlp',
      image: 'some.link.to.a.container:latest',
      args: '${computer.jnlpmac} ${computer.name}',
      alwaysPullImage: true,
    ),
    containerTemplate(
      name: 'image-builder',
      image: 'some.link.to.another.container:latest',
      ttyEnabled: true,
      alwaysPullImage: true,
      command: 'cat'
    ),
  ],
  volumes: [
    hostPathVolume(
      hostPath: '/var/run/docker.sock',
      mountPath: '/var/run/docker.sock'
    ),
    hostPathVolume(
      hostPath: '/home/jenkins/workspace/fooProject',
      mountPath: '/home/jenkins/workspace/fooProject'
    ),
    secretVolume(
      secretName: 'jenkins-creds-for-aws',
      mountPath: '/home/jenkins/.aws-jenkins'
    ),
    hostPathVolume(
      hostPath: '/home/jenkins/.aws',
      mountPath: '/home/jenkins/.aws'
    )
  ]
)
{
  node ('fooProject-build') {
    try {
      checkout scm
 
      wrap([$class: 'AnsiColorBuildWrapper', 'colorMapName': 'XTerm']) {
        container('image-builder'){
          stage('Prep') {
            sh '''
              cp /home/jenkins/.aws-jenkins/config /home/jenkins/.aws/.
              cp /home/jenkins/.aws-jenkins/credentials /home/jenkins/.aws/.
              make get_images
            '''
          }
 
          stage('Unit Test'){
            sh '''
              make test
              make profile
            '''
          }
 
          step([
            $class:              'CoberturaPublisher',
            autoUpdateHealth:    false,
            autoUpdateStability: false,
            coberturaReportFile: 'report.xml',
            failUnhealthy:       false,
            failUnstable:        false,
            maxNumberOfBuilds:   0,
            sourceEncoding:      'ASCII',
            zoomCoverageChart:   false
          ])
 
          stage('Build and Push Container'){
            sh '''
              make push
            '''
          }
        }
      }
 
      stage('Integration'){
        container('image-builder') {
          sh '''
            make deploy_integration
            make toggle_integration_service
          '''
        }
        try {
          wrap([$class: 'AnsiColorBuildWrapper', 'colorMapName': 'XTerm']) {
            container('image-builder') {
              sh '''
                sleep 45
                export KUBE_INTEGRATION=https://fooProject-integration
                export SKIP_TEST_SERVER=true
                make integration
              '''
            }
          }
        } catch(e) {
          container('image-builder'){
            sh '''
              make clean
            '''
          }
          throw(e)
        }
      }
 
      stage('Deploy to Production'){
        container('image-builder') {
          sh '''
            make clean
            make deploy_dev
          '''
        }
      }
    } catch(e) {
      container('image-builder'){
        sh '''
          make clean
        '''
      }
      currentBuild.result = 'FAILED'
      notifyFailed()
      throw(e)
    }
  }
}

There’s a lot going on here, but the important part to notice is that I grabbed this from the project repo. The build instructions are included with the project itself. It’s creating an artifact, running our tests, etc. But it’s all part of our project code base. It’s checked into git. It’s code like all the other code we mess with. The steps are somewhat inconsequential for this level of topic, but it works. We also have it setup to run when there’s a push to github (AND nightly). This ensures that we are continuously running this build and integrating everything that’s happened to the repo in a day. It helps us keep on top of all the possible changes to the repo as well as our environment.

Hey… what’s all that make_ crap?_

Make

Our team uses a lot of tools. We ascribe to the maxim: Use what’s best for the particular situation. I can’t remember every tool we use. Neither can my teammates. Neither can 90% of the people that “do the devops.” I’ve heard a lot of folks say, “No! We must solidify on our toolset!” Let your teams use what they need to get the job done the right way. Now, the fear of experiencing tool “overload” seems like a legitimate one in this scenario, but the problem isn’t the number of tools… it’s how you manage and use use them.

Enter Makefiles! (aka: make)

Make has been a mainstay in the UNIX world for a long time (especially in the C world). It is a build tool that’s utilized to help satisfy dependencies, create system-specific configurations, and compile code from various sources independent of platform. This is fantastic, except, we couldn’t care less about that in the context of our CI/CD Pipelines. We use it because it’s great at running “buildy” commands.

Make is our unifier. It links our Jenkins CI/CD build functionality with our Dev functionality. Specifically, opening up the docker port here in the Jenkinsfile:

volumes: [
  hostPathVolume(
    hostPath: '/var/run/docker.sock',
    mountPath: '/var/run/docker.sock'
  ),

…allows us to run THE SAME COMMANDS WHEN WE’RE DEVELOPING AS WE DO IN OUR CI/CD PROCESS. This socket allows us to run containers from containers, and since Jenkins is running on a container, this allows us to run our toolset containers in Jenkins, using the same commands we’d use in our local dev environment. On our local dev machines, we use docker nearly exclusively as a wrapper to our tools. This ensures we have library, version, and platform consistency on all of our dev environments as well as our build system. We use containers for our prod microservices so production is part of that “chain of consistency” as well. It ensures that we see consistent behavior across the horizon of application development through production. It’s a beautiful thing! We use the Makefile as the means to consistently interface with the docker “tool” across differing environments.

Ok, I know your interest is peaked at this point. (Or at least I really hope it is!)
So here’s a generic makefile we use for many of our projects:

CONTAINER=$(shell basename $$PWD | sed -E 's/^ia-image-//')
.PHONY: install install_exe install_test_exe deploy test
 
install:
    docker pull sweet.path.to.a.repo/$(CONTAINER)
    docker tag sweet.path.to.a.repo/$(CONTAINER):latest $(CONTAINER):latest
 
install_exe:
    if [[ ! -d $(HOME)/bin ]]; then mkdir -p $(HOME)/bin; fi
    echo "docker run -itP -v \$$PWD:/root $(CONTAINER) \"\$$@\"" > $(HOME)/bin/$(CONTAINER)
    chmod u+x $(HOME)/bin/$(CONTAINER)
 
install_test_exe:
    if [[ ! -d $(HOME)/bin ]]; then mkdir -p $(HOME)/bin; fi
    echo "docker run -itP -v \$$PWD:/root $(CONTAINER)-test \"\$$@\"" > $(HOME)/bin/$(CONTAINER)
    chmod u+x $(HOME)/bin/$(CONTAINER)
 
test:
    docker build -t $(CONTAINER)-test .
 
deploy:
    captain push

This is a Makefile we use to build our tooling images. It’s much simpler than our project Makefiles, but I think this illustrates how you can use Make to wrap EVERYTHING you use in your development workflow. This also allows us to settle on similar/consistent terminology between different projects. %> make test? That’ll run the tests regardless if we are working on a golang project or a python lambda project, or in this case, building a test container, and tagging it as whatever-test. Make unifies “all the things.”

This also codifies how to execute the commands. ie: what arguments to pass, what inputs etc. If I can’t even remember the name of the command, I’m not going to remember the arguments. To remedy, I just open up the Makefile, and I can instantly see.

Step III: Continuous Deployment

After the last post (you read it right?), some might have noticed that I skipped the “Delivery” portion of the “CD” pipeline. As far as I’m concerned, there is no “Delivery” in a “Deployment” pipeline. The “Delivery” is the actual deployment of your artifact. Since the ultimate goal should be Depoloyment, I’ve just skipped over that intermediate step.

Okay, sure, if you want to hold off on deploying automatically to Prod, then have that gate. But Dev, Int, QA, etc? Deployment to those non-prod environments should be automated just like the rest of your code.

If you guessed we use make to deploy our code, you’d be right! We put all our deployment code with the project itself, just like the rest of the code concerning that particular object. For services, we use a Dockerfile that describes the service container and several yaml files (e.g. deployment_<env>.yaml) that describe the configurations (e.g. ingress, services, deployments) we use to configure and deploy to our Kubernetes cluster.

Here’s an example:

apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  labels:
    app: sweet-aws-service
    stage: dev
  name: sweet-aws-service-dev
  namespace: sweet-service-namespace
spec:
  replicas: 1
  template:
    metadata:
      labels:
        app: sweet-aws-service
      name: sweet-aws-service
    spec:
      containers:
      - name: sweet-aws-service
        image: path.to.repo.for/sweet-aws-service:latest
        imagePullPolicy: Always
        env:
          - name: PORT
            value: "50000"
          - name: TLS_KEY
            valueFrom:
              secretKeyRef:
                name: grpc-tls
                key: key
          - name: TLS_CERT
            valueFrom:
              secretKeyRef:
                name: grpc-tls
                key: cert

This is an example of a deployment into Kubernetes for dev. That %> make deploy_dev from the Jenkinsfile above? That’s pushing this to our Kubernetes cluster.

Conclusion

There is a lot of information to take in here, but there are two points to really take home:

  1. It is totally possible.
  2. Use a unifying tool to… unify your tools. (“one tool to rule them all”)

For us, Point 1 is moot… it’s what we do. For Point 2, we use Make, and we use Make THROUGH THE ENTIRE PROCESS. I use Make locally in dev and on our build server. It ensures we’re using the same commands, the same containers, the same tools to do the same things. Test, integrate (test), and deploy. It’s not just about writing functional code anymore. It’s about writing a functional process to get that code, that value, to your customers!

And remember, as with anything, this stuff get’s easier with practice. So once you start doing it you will get the hang of it and life becomes easier and better. If you’d like some help getting started, download our datasheet to learn about our Modern CI/CD Pipeline.

-Craig Monson, Sr Automation Architect

 

Facebooktwittergoogle_pluslinkedinmailrss

How We Organize Terraform Code at 2nd Watch

When IT organizations adopt infrastructure as code (IaC), the benefits in productivity, quality, and ability to function at scale are manifold. However, the first few steps on the journey to full automation and immutable infrastructure bliss can be a major disruption to a more traditional IT operations team’s established ways of working. One of the common problems faced in adopting infrastructure as code is how to structure the files within a repository in a consistent, intuitive, and scaleable manner. Even IT operations teams whose members have development skills will still face this anxiety-inducing challenge simply because adopting IaC involves new tools whose conventions differ somewhat from more familiar languages and frameworks.

In this blog post, we’ll go over how we structure our IaC repositories within 2nd Watch professional services and managed services engagements with a particular focus on Terraform, an open-source tool by Hashicorp for provisioning infrastructure across multiple cloud providers with a single interface.

First Things First: README.md and .gitignore

The task in any new repository is to create a README file. Many git repositories (especially on Github) have adopted Markdown as a de facto standard format for README files. A good README file will include the following information:

  1. Overview: A brief description of the infrastructure the repo builds. A high-level diagram is often an effective method of expressing this information. 2nd Watch uses LucidChart for general diagrams (exported to PNG or a similar format) and mscgen_js for sequence diagrams.
  2. Pre-requisites: Installation instructions (or links thereto) for any software that must be installed before building or changing the code.
  3. Building The Code: What commands to run in order to build the infrastructure and/or run the tests when applicable. 2nd Watch uses Make in order to provide a single tool with a consistent interface to build all codebases, regardless of language or toolset. If using Make in Windows environments, Windows Subsystem for Linux is recommended for Windows 10 in order to avoid having to write two sets of commands in Makefiles: Bash, and PowerShell.

It’s important that you do not neglect this basic documentation for two reasons (even if you think you’re the only one who will work on the codebase):

  1. The obvious: Writing this critical information down in an easily viewable place makes it easier for other members of your organization to onboard onto your project and will prevent the need for a panicked knowledge transfer when projects change hands.
  2. The not-so-obvious: The act of writing a description of the design clarifies your intent to yourself and will result in a cleaner design and a more coherent repository.

All repositories should also include a .gitignore file with the appropriate settings for Terraform. GitHub’s default Terraform .gitignore is a decent starting point, but in most cases you will not want to ignore .tfvars files because they often contain environment-specific parameters that allow for greater code reuse as we will see later.

Terraform Roots and Multiple Environments

A Terraform root is the unit of work for a single terraform apply command. We group our infrastructure into multiple terraform roots in order to limit our “blast radius” (the amount of damage a single errant terraform apply can cause).

  • Repositories with multiple roots should contain a roots/ directory with a subdirectory for each root (e.g. VPC, one per-application) tf file as the primary entry point.
  • Note that the roots/ directory is optional for repositories that only contain a single root, e.g. infrastructure for an application team which includes only a few resources which should be deployed in concert. In this case, modules/ may be placed in the same directory as tf.
  • Roots which are deployed into multiple environments should include an env/ subdirectory at the same level as tf. Each environment corresponds to a tfvars file under env/ named after the environment, e.g. staging.tfvars. Each .tfvars file contains parameters appropriate for each environment, e.g. EC2 instance sizes.

Here’s what our roots directory might look like for a sample with a VPC and 2 application stacks, and 3 environments (QA, Staging, and Production):

Terraform modules

Terraform modules are self-contained packages of Terraform configurations that are managed as a group. Modules are used to create reusable components, improve organization, and to treat pieces of infrastructure as a black box. In short, they are the Terraform equivalent of functions or reusable code libraries.

Terraform modules come in two flavors:

  1. Internal modules, whose source code is consumed by roots that live in the same repository as the module.
  2. External modules, whose source code is consumed by roots in multiple repositories. The source code for external modules lives in its own repository, separate from any consumers and separate from other modules to ensure we can version the module correctly.

In this post, we’ll only be covering internal modules.

  • Each internal module should be placed within a subdirectory under modules/.
  • Module subdirectories/repositories should follow the standard module structure per the Terraform docs.
  • External modules should always be pinned at a version: a git revision or a version number. This practice allows for reliable and repeatable builds. Failing to pin module versions may cause a module to be updated between builds by breaking the build without any obvious changes in our code. Even worse, failing to pin our module versions might cause a plan to be generated with changes we did not anticipate.

Here’s what our modules directory might look like:

Terraform and Other Tools

Terraform is often used alongside other automation tools within the same repository. Some frequent collaborators include Ansible for configuration management and Packer for compiling identical machine images across multiple virtualization platforms or cloud providers. When using Terraform in conjunction with other tools within the same repo, 2nd Watch creates a directory per tool from the root of the repo:

Putting it all together

The following illustrates a sample Terraform repository structure with all of the concepts outlined above:

Conclusion

There’s no single repository format that’s optimal, but we’ve found that this standard works for the majority of our use cases in our extensive use of Terraform on dozens of projects. That said, if you find a tweak that works better for your organization – go for it! The structure described in this post will give you a solid and battle-tested starting point to keep your Terraform code organized so your team can stay productive.

Additional resources

  • The Terraform Book by James Turnbull provides an excellent introduction to Terraform all the way through repository structure and collaboration techniques.
  • The Hashicorp AWS VPC Module is one of the most popular modules in the Terraform Registry and is an excellent example of a well-written Terraform module.
  • The source code for James Nugent’s Hashidays NYC 2017 talk code is an exemplary Terraform repository. Although it’s based on an older version of Terraform (before providers were broken out from the main Terraform executable), the code structure, formatting, and use of Makefiles is still current.

For help getting started adopting Infrastructure as Code, contact us.

-Josh Kodroff, Associate Cloud Consultant

Facebooktwittergoogle_pluslinkedinmailrss

Assessing your cloud environment and maintaining a baseline security posture

The cloud is an exciting place, where the realization of instant implementation and execution of enterprise applications and workloads at scale is achieved. As we move to utilize the cloud we tend to utilize already existing applications and operating systems that are baked into the cloud service provider’s frame. Baseline images and cloud virtual machines are readily available, and we quickly utilize these images to put our infrastructure in motion in the cloud. This presents some challenges in terms of security, as we tend to deploy first and secure second, which is not ideal from a proactive security stance.

Continuous deployment is becoming the normality. We are literally updating and changing our systems many times a day now in the cloud. New features become available and new ISV applications and services roll out on a continuous basis, and we leverage them rapidly.

To maintain a healthy cloud state for our infrastructure and applications, we need to focus our attention on managing towards a baseline configuration state and desired state configuration for our systems. We need to monitor, maintain and continuously be aware of our systems’ state of health and state of configuration at any given time.

Clients are evolving so quickly in the cloud that they end up  playing catchup on the security side of their cloud environment.  One way to support a proactive level of security for your cloud environment is to perform continuous vulnerability, assessment and remediation activities on a regular basis to ensure all areas of your cloud environment meet a baseline of security.

The cloud service providers are doing a lot of heavy lifting to support you to achieve a proactive security architecture, and they provide many different layers of tools and technologies to help you achieve this.

The first area to focus on is your machine library. Maintaining a secure baseline image scope across the catalog of your infrastructure and applications will ensure you can track and maintain those image states per role of operation. This helps to ensure that when the cataloged healthy state image takes an un-approved change, you become instantly aware of it and can roll back to a healthy image state, undo any compromises to image integrity and return to a secure and compliant operating state.

AWS has a great tool that helps us maintain configuration state management. AWS Config allows you to audit, evaluate and assess configuration baselines of your cloud resources continuously. It provides strong monitoring of your AWS resource configurations and support evaluating the state against your secure baseline and ensuring your desired state is maintain and recovered. It gives you the ability to quickly review changes to configuration state and how that state changes across the broad set of resources that make up your infrastructure or application.

It provides deep insight and details around resources, resource historical records and configuration and supports strong capabilities around compliance, specifically for your organization and specified resource guidelines.

Utilizing tools such as AWS Config provides you a strong first step in the path to achieving continuous security baseline protection and managing against vulnerabilities and unwarranted changes to your resources in production. Outside of AWS Config, AWS provides a strong suite of tools and technologies to help us achieve a more comprehensive security baseline. Check out Amazon System Manager, which supports a common agent topology across Windows and Linux with the SSM agent to support clients scanning and maintaining resource health and information continuously – not to mention it’s a great overall agent to maintain core IT operations management in AWS. We can maintain strong operational health of our cloud resources in AWS with Amazon CloudWatch.

One recent addition for security is Amazon GuardDuty, which provides intelligent threat detection and continuously monitors to protect your AWS accounts and workloads. Amazon GuardDuty is directly accessible through the amazon console and, with a few clicks, can be fully implemented to help support your workload protection proactively.

If you need assistance in achieving strong continuous security for your AWS cloud environment, 2nd Watch offers services to help our clients realize the security posture within their cloud environment today. We can also help create and customize a vulnerability, assessment and remediation strategy that will have long standing benefits in continuously achieving strong security defense in depth in the cloud.

-Peter Meister, Sr Director of Product Management

Facebooktwittergoogle_pluslinkedinmailrss

So You Think You Can DevOps?

We recently took a DevOps poll of 1,000 IT professionals to get a pulse for where the industry sits regarding the adoption and completeness of vision around DevOps.  The results were pretty interesting, and overall we are able to deduce that a large majority of the organizations who answered the survey are not truly practicing DevOps.  Part of this may be due to the lack of clarity on what DevOps really is.  I’ll take a second to summarize it as succinctly as possible here.

DevOps is the practice of operations and development engineers participating together in the entire service lifecycle, from design through the development process to production support. This includes, but is not limited to, the culture, tools, organization, and practices required to accomplish this amalgamated methodology of delivering IT services.

credit: https://theagileadmin.com/what-is-devops/

In order to practice DevOps you must be in a DevOps state of mind and embrace its values and mantras unwaveringly.

The first thing that jumped out at me from our survey was the responses to the question “Within your organization, do separate teams manage infrastructure/operations and application development?”  78.2% of respondents answered “Yes” to that question.  Truly practicing DevOps requires that the infrastructure and applications are managed within the context of the same team, so we can deduce that at least 78.2% of the respondents’ companies are not truly practicing DevOps.  Perhaps they are using some infrastructure-as-code tools, some forms of automation, or even have CI/CD pipelines in place, but those things alone do not define DevOps.

Speaking of infrastructure-as-code… Another question, “How is your infrastructure deployed and managed?” had nearly 60% of respondents answering that they were utilizing infrastructure-as-code tools (e.g. Terraform, Configuration Management, Kubernetes) to manage their infrastructure, which is positive, but shows the disconnect between the use of DevOps tools and actually practicing DevOps (as noted in the previous paragraph). On the other hand, just over 38% of respondents indicated that they are managing infrastructure manually (e.g. through the console), which means not only are they not practicing DevOps they aren’t even managing their infrastructure in a way that will ever be compatible with DevOps… yikes.  The good news is that tools like Terraform allow you to import existing manually deployed infrastructure where it can then be managed as code and handled as “immutable infrastructure.”  Manually deploying anything is a DevOps anti-pattern and must be avoided at all costs.

Aside from infrastructure we had several questions around application development and deployment as it pertains to DevOps.  Testing code appears to be an area where a majority of respondents are staying proactive in a way that would be beneficial to a DevOps practice.  The question “What is your approach to writing tests?” had the following breakdown on its answers:

  • We don’t really test:  10.90%
  • We get to it if/when we have time:  15.20%
  • We require some percentage of code to be covered by tests before it is ready for production:  32.10%
  • We require comprehensive unit and integration testing before code is pushed to production:  31.10%
  • Rigid TDD/BDD/ATDD/STDD approach – write tests first & develop code to meet those test requirements:  10.70%

We can see that around 75% of respondents are doing some form of consistent testing, which will go a long way in helping build out a DevOps practice, but a staggering 25% of respondents have little or no testing of code in place today (ouch!).  Another question “How is application code deployed and managed?” shows that around 30% of respondents are using a completely manual process for application deployment and the remaining 70% are using some form of an automated pipeline.  Again, the 70% is a positive sign for those wanting to embrace DevOps, but there is still a massive chunk at 30% who will have to build out automation around testing, building, and deploying code.

Another important factor in managing services the DevOps way is to have all your environments mirror each other.  In response to the question “How well do your environments (e.g. dev, test, prod) mirror one another?” around 28% of respondents indicated that their environments are managed completely independently of each other.  Another 47% indicated that “they share some portion of code but are not managed through identical code bases and processes,” and the remaining 25% are doing it properly by “managed identically using same code & processes employing variables to differentiate environments.”  Lots of room for improvement in this area when organizations decide they are ready to embrace the DevOps way.

Our last question in the survey was “How are you notified when an application/process/system fails?” and I found the answers a bit staggering.  Over 21% of respondents indicated that they are notified of outages by the end user.  It’s pretty surprising to see that large of a percentage utilizing such a reactionary method of service monitoring.  Another 32% responded that “someone in operation is watching a dashboard,” which isn’t as surprising but will definitely be something that needs to be addressed when shifting to a DevOps approach.  Another 23% are using third-party tools like NewRelic and Pingdom to monitor their apps.  Once again, we have that savvy ~25% group who are currently operating in a way that bodes well for DevOps adoption by answering “Monitoring is built into the pipeline, apps and infrastructure. Notifications are sent immediately.”  The twenty-five-percenters are definitely on the right path if they aren’t already practicing DevOps today.

In summary, we have been able to deduce from our survey that, at best, around 25% of the respondents are actually engaging in a DevOps practice today. For more details on the results of our survey, download our infographic.

-Ryan Kennedy, Principal Cloud Automation Architect

Facebooktwittergoogle_pluslinkedinmailrss

Governance, Risk and Compliance – Drive Change Across the Organization

Governance, Risk and Compliance (GRC) is a standard framework that helps to drive organizations towards a common set of goals and principals. The overarching theme is strategically focused on how technology utilization and operations tie directly back to an organization’s business goals and, in many cases, aspirations.

There are many facets to GRC. In the cloud it means the same thing as it did in the datacenter. We need to ensure IT organizes around the business, and we need to make sure risk is minimized and compliance is maintained.

At 2nd Watch we work with clients across all areas of GRC. Clients take various levels of focus in each area, and some areas are more important based on the vertical the client is operating in.

The cloud extends beyond the physical bounds of an organization, and with that institutes new challenges and requires a shared cloud responsibility model. The CSP is responsible for the underlying infrastructure setup and physical maintenance of their cloud infrastructure. We work with our cloud ISV and providers’ tools, technologies and best practices to help maintain strong governance and lower risk while meeting compliance.

The landscape of software, tools and solutions to support governance, risk and compliance is daunting in the cloud marketplace. 2nd Watch focuses on providing a holistic support to our clients around GRC. We believe there are fantastic capabilities directly inside the cloud management portals to help customers along the journey to strong GRC framework and institution.

In Microsoft Azure we can utilize Compliance Manager. Compliance Manager is a workflow-based assessment tool that enables organizations to track, assign and verify regulatory compliance procedures and activities in support of Microsoft Cloud technologies – including Office 365 and Dynamics. It supports ISO 27001, IS0 27018 and NIST and supports regulatory compliance around HIPAA and GDPR.  It is a foundational tool to utilize within Microsoft Azure to help you along the path to achieving strong governance, risk and compliance around Microsoft Cloud technologies.

With Amazon Web Services we have a complete set of core cloud operations management tools to utilize within the AWS console to help us bolster governance and security and reduce risk. Amazon provides resources with a full prescriptive set of compliance quick reference guides, which provide an overview of how to maintain a cloud compliant environment through strong security and controls validation, and insight and monitoring for activity and security assurance.

Amazon has a complete Cloud Compliance Center where clients can tap into an abundant set of resources to help along the way.

Beyond the tools, both Microsoft Azure and AWS provide strategic support with partners around compliance. There are many accelerators and programs that organizations can request from and Amazon and Microsoft to help them achieve and maintain GRC specifically tuned to the cloud.

GRC is unique to each organization. Cloud providers bring a substantial set of resources and technologies, along with great prescriptive guidance and best practices to help and guide you in achieving a strategic GRC framework and set of processes and procedures in your organization.

Take advantage of these built-in capabilities as you start to look at other tools and technologies to complete your holistic approach to governance, risk and compliance, and please reach out to 2nd Watch to find out how we can support you along the way.

-Peter Meister, Sr Director of Product Management

Facebooktwittergoogle_pluslinkedinmailrss

How to Use AWS IAM with STS for access to AWS resources

With increased focus on security and governance in today’s digital economy, I want to highlight a simple but important use case that demonstrates how to use AWS Identity and Access Management (IAM) with Security Token Service (STS) to give trusted AWS accounts access to resources that you control and manage.

Security Token Service is an extension of IAM and is one of several web services offered by AWS that does not incur any costs to use.  But, unlike IAM, there is no user interface on the AWS console to manage and interact with STS. Rather all interaction is done entirely through one of several extensive SDKs or directly using common HTTP protocol.  I will be using Terraform to create some simple resources in my sandbox account and .NET Core SDK to demonstrate how to interact with STS.

The main purpose and function of STS is to issue temporary security credentials for AWS resources to trusted and authenticated entities.  These credentials operate identically to the long-term keys that typical IAM users have, with a couple of special characteristics:

  • They automatically expire and become unusable after a short and defined period of time elapses
  • They are issued dynamically

These characteristics offer several advantages in terms of application security and development and are useful for cross-account delegation and access.  STS solves two problems for owners of AWS resources:

  • Meets the IAM best-practices requirement to regularly rotate access keys
  • You do not need to distribute access keys to external entities or store them within an application

One common scenario where STS is useful involves sharing resources between AWS accounts.  Let’s say, for example, that your organization captures and processes data in S3, and one of your clients would like to push large amounts of data from resources in their AWS account to an S3 bucket in your account in an automated and secure fashion.

While you could create an IAM user for your client, your corporate data policy requires that you rotate access keys on a regular basis, and this introduces challenges for automated processes.  Additionally, you would like to limit the distribution of access keys to your resources to external entities.  Let’s use STS to solve this!

To get started, let’s create some resources in your AWS cloud.  Do you even Terraform, bro?

Let’s create a new S3 bucket and set the bucket ACL to be private, meaning nobody but the bucket owner (that’s you!) has access.  Remember that bucket names must be unique across all existing buckets, and they should comply with DNS naming conventions.  Here is the Terraform HCL syntax to do this:

Great! We now have a bucket… but for now, only the owner can access it.  This is a good start from a security perspective (i.e. “least permissive” access).

What an empty bucket may look like

Let’s create an IAM role that, once assumed, will allow IAM users with access to this role to have permissions to put objects into our bucket.  Roles are a secure way to grant trusted entities access to your resources.  You can think about roles in terms of a jacket that an IAM user can wear for a short period of time, and while wearing this jacket, the user has privileges that they wouldn’t normally have when they aren’t wearing it.  Kind of like a bright yellow Event Staff windbreaker!

For this role, we will specify that users from our client’s AWS account are the only ones that can wear the jacket. This is done by including the client’s AWS account ID in the Principal statement.  AWS Account IDs are not considered to be secret, so your client can share this with you without compromising their security.  If you don’t have a client but still want to try this stuff out, put your own AWS account ID here instead.


Great, now we have a role that our trusted client can wear.  But, right now our client can’t do anything except wear the jacket.  Let’s give the jacket some special powers, such that anyone wearing it can put objects into our S3 bucket.  We will do this by creating a security policy for this role.  This policy will specify what exactly can be done to S3 buckets that it is attached to. Then we will attach it to the bucket we want our client to use.  Here is the Terraform syntax to accomplish this:

A couple things to note about this snippet – First, we are using Terraform interpolation to inject values from previous terraform statements into a couple of places in the policy – specifically the ARN from the role and bucket we created previously.  Second, we are specifying a condition for the s3 policy – one that requires a specific object ACL for the action s3:PutObject, which is accomplished by including the HTTP request header x-amz-acl to have a value of bucket-owner-full-control with the PUT object request.  By default, objects PUT in S3 are owned by the account that created them, even if it is stored in someone else’s bucket.  For our scenario, this condition will require your client to explicitly grant ownership of objects placed in your bucket to you, otherwise the PUT request will fail.

So, now we have a bucket, a policy in place on our bucket, and a role that assumes that policy.  Now your client needs to get to work writing some code that will allow them to assume the role (wear the jacket) and start putting objects into your bucket.  Your client will need to know a couple of things from you before they get started:

  1. The bucket name and the region it was created in (the example above created a bucket named d4h2123b9-xaccount-bucket in us-west-2)
  2. The ARN for the role (Terraform can output this for you). It will look something like this but will have your actual AWS Account ID: arn:aws:iam::123456789012:role/sts-delegate-role

They will also need to create an IAM User in their account and attach a policy allowing the user to assume roles via STS.  The policy will look similar to this:


Let’s help your client out a bit and provide some C# code snippets for .NET Core 2.0 (available for Windows, macOS and LinuxTo get started, install the .NET SDK for your OS, then fire up a command prompt in a favorite directory and run these commands:

The first command will create a new console app in the subdirectory s3cli.  Then switch context to that directory and import the AWS SDK for .NET Core, and then add packages for SecurityToken and S3 services.
Once you have the libraries in place, fire up your favorite IDE or text editor (I use Visual Studio Code), then open Program.cs and add some code:

This snippet sends a request to STS for temporary credentials using the specified ARN.  Note that the client must provide IAM user credentials to call STS, and that IAM user must have a policy applied that allows it to assume a role from STS.

This next snippet takes the STS credentials, bucket name, and region name, and then uploads the Program.cs file that you’re editing and assigns it a random key/name.  Also note that it explicitly applies the Canned ACL that is required by the sts-delegate-role:

So, to put this all together, run this code block and make the magic happen!  Of course, you will have to define and provide proper variable values for your environment, including  securely storing your credentials.

Try it out from the command prompt:

If all goes well, you will have a copy of Program.cs in the bucket. Not very useful itself, but it illustrates how to accomplish the task.

What a bucket with something in it may look like

Here is a high-level document of what we put together:

Putting it all together

Steps:

  1. Your client uses their IAM user to call AWS STS and requests the role ARN you gave them
  2. STS authenticates the client’s IAM user and verifies the policy for the ARN role, then issues a temporary credential to the client.
  3. The client can use the temporary credentials to access your S3 bucket (they will expire soon), and since they are now wearing the Event Staff jacket, they can successfully PUT stuff in your bucket!

There are many other use-cases for STS. This is just one very simplistic example. However, with this brief introduction to the concepts, you should now have a decent idea of how STS works with IAM roles and policies, and how you can use STS to give access to your AWS resources for trusted entities. For more tips like this, contact us.

-Jonathan Eropkin, Cloud Consultant

Facebooktwittergoogle_pluslinkedinmailrss