
So You Think You Can DevOps?

We recently took a DevOps poll of 1,000 IT professionals to get a pulse on where the industry sits regarding the adoption and completeness of vision around DevOps. The results were pretty interesting, and overall we can deduce that a large majority of the organizations that answered the survey are not truly practicing DevOps. Part of this may be due to a lack of clarity on what DevOps really is, so I’ll take a second to summarize it as succinctly as possible here.

DevOps is the practice of operations and development engineers participating together in the entire service lifecycle, from design through the development process to production support. This includes, but is not limited to, the culture, tools, organization, and practices required to accomplish this amalgamated methodology of delivering IT services.

credit: https://theagileadmin.com/what-is-devops/

In order to practice DevOps you must be in a DevOps state of mind and embrace its values and mantras unwaveringly.

The first thing that jumped out at me from our survey was the responses to the question “Within your organization, do separate teams manage infrastructure/operations and application development?”  78.2% of respondents answered “Yes” to that question.  Truly practicing DevOps requires that the infrastructure and applications are managed within the context of the same team, so we can deduce that at least 78.2% of the respondents’ companies are not truly practicing DevOps.  Perhaps they are using some infrastructure-as-code tools, some forms of automation, or even have CI/CD pipelines in place, but those things alone do not define DevOps.

Speaking of infrastructure-as-code… Another question, “How is your infrastructure deployed and managed?” had nearly 60% of respondents answering that they utilize infrastructure-as-code tools (e.g. Terraform, configuration management, Kubernetes) to manage their infrastructure. That is positive, but it also shows the disconnect between the use of DevOps tools and actually practicing DevOps (as noted in the previous paragraph). On the other hand, just over 38% of respondents indicated that they are managing infrastructure manually (e.g. through the console), which means that not only are they not practicing DevOps, they aren’t even managing their infrastructure in a way that will ever be compatible with DevOps… yikes. The good news is that tools like Terraform allow you to import existing manually deployed infrastructure so it can then be managed as code and handled as “immutable infrastructure.” Manually deploying anything is a DevOps anti-pattern and must be avoided at all costs.

Aside from infrastructure we had several questions around application development and deployment as it pertains to DevOps.  Testing code appears to be an area where a majority of respondents are staying proactive in a way that would be beneficial to a DevOps practice.  The question “What is your approach to writing tests?” had the following breakdown on its answers:

  • We don’t really test:  10.90%
  • We get to it if/when we have time:  15.20%
  • We require some percentage of code to be covered by tests before it is ready for production:  32.10%
  • We require comprehensive unit and integration testing before code is pushed to production:  31.10%
  • Rigid TDD/BDD/ATDD/STDD approach – write tests first & develop code to meet those test requirements:  10.70%

We can see that around 75% of respondents are doing some form of consistent testing, which will go a long way in helping build out a DevOps practice, but a staggering 25% of respondents have little or no testing of code in place today (ouch!).  Another question “How is application code deployed and managed?” shows that around 30% of respondents are using a completely manual process for application deployment and the remaining 70% are using some form of an automated pipeline.  Again, the 70% is a positive sign for those wanting to embrace DevOps, but there is still a massive chunk at 30% who will have to build out automation around testing, building, and deploying code.

Another important factor in managing services the DevOps way is to have all your environments mirror each other. In response to the question “How well do your environments (e.g. dev, test, prod) mirror one another?” around 28% of respondents indicated that their environments are managed completely independently of each other. Another 47% indicated that they “share some portion of code but are not managed through identical code bases and processes,” and the remaining 25% are doing it properly, with environments “managed identically using same code & processes employing variables to differentiate environments.” There is lots of room for improvement in this area when organizations decide they are ready to embrace the DevOps way.

Our last question in the survey was “How are you notified when an application/process/system fails?” and I found the answers a bit staggering. Over 21% of respondents indicated that they are notified of outages by the end user. It’s pretty surprising to see that large of a percentage utilizing such a reactive method of service monitoring. Another 32% responded that “someone in operations is watching a dashboard,” which isn’t as surprising but will definitely be something that needs to be addressed when shifting to a DevOps approach. Another 23% are using third-party tools like New Relic and Pingdom to monitor their apps. Once again, we have that savvy ~25% group who are currently operating in a way that bodes well for DevOps adoption by answering “Monitoring is built into the pipeline, apps and infrastructure. Notifications are sent immediately.” The twenty-five-percenters are definitely on the right path if they aren’t already practicing DevOps today.

In summary, we have been able to deduce from our survey that, at best, around 25% of the respondents are actually engaging in a DevOps practice today. For more details on the results of our survey, download our infographic.

-Ryan Kennedy, Principal Cloud Automation Architect


Governance, Risk and Compliance – Drive Change Across the Organization

Governance, Risk and Compliance (GRC) is a standard framework that helps to drive organizations towards a common set of goals and principles. The overarching theme is strategically focused on how technology utilization and operations tie directly back to an organization’s business goals and, in many cases, aspirations.

There are many facets to GRC. In the cloud it means the same thing as it did in the datacenter. We need to ensure IT organizes around the business, and we need to make sure risk is minimized and compliance is maintained.

At 2nd Watch we work with clients across all areas of GRC. Clients take various levels of focus in each area, and some areas are more important based on the vertical the client is operating in.

The cloud extends beyond the physical bounds of an organization, and with that come new challenges and a shared cloud responsibility model. The CSP is responsible for the underlying infrastructure setup and physical maintenance of its cloud infrastructure. We work with our cloud ISV and provider partners’ tools, technologies and best practices to help maintain strong governance and lower risk while meeting compliance.

The landscape of software, tools and solutions to support governance, risk and compliance in the cloud marketplace is daunting. 2nd Watch focuses on providing holistic support to our clients around GRC. We believe there are fantastic capabilities directly inside the cloud management portals to help customers along the journey to a strong GRC framework and practice.

In Microsoft Azure we can utilize Compliance Manager. Compliance Manager is a workflow-based assessment tool that enables organizations to track, assign and verify regulatory compliance procedures and activities in support of Microsoft Cloud technologies – including Office 365 and Dynamics. It supports ISO 27001, ISO 27018 and NIST, as well as regulatory compliance around HIPAA and GDPR. It is a foundational tool to utilize within Microsoft Azure to help you along the path to achieving strong governance, risk and compliance around Microsoft Cloud technologies.

With Amazon Web Services we have a complete set of core cloud operations management tools to utilize within the AWS console to help us bolster governance and security and reduce risk. Amazon provides resources with a full prescriptive set of compliance quick reference guides, which provide an overview of how to maintain a cloud compliant environment through strong security and controls validation, and insight and monitoring for activity and security assurance.

Amazon has a complete Cloud Compliance Center where clients can tap into an abundant set of resources to help along the way.

Beyond the tools, both Microsoft Azure and AWS provide strategic support with partners around compliance. There are many accelerators and programs that organizations can request from Amazon and Microsoft to help them achieve and maintain GRC specifically tuned to the cloud.

GRC is unique to each organization. Cloud providers bring a substantial set of resources and technologies, along with great prescriptive guidance and best practices to help and guide you in achieving a strategic GRC framework and set of processes and procedures in your organization.

Take advantage of these built-in capabilities as you start to look at other tools and technologies to complete your holistic approach to governance, risk and compliance, and please reach out to 2nd Watch to find out how we can support you along the way.

-Peter Meister, Sr Director of Product Management


How to Use AWS IAM with STS for access to AWS resources

With increased focus on security and governance in today’s digital economy, I want to highlight a simple but important use case that demonstrates how to use AWS Identity and Access Management (IAM) with Security Token Service (STS) to give trusted AWS accounts access to resources that you control and manage.

Security Token Service is an extension of IAM and is one of several web services offered by AWS that does not incur any costs to use. But, unlike IAM, there is no user interface in the AWS console to manage and interact with STS. Rather, all interaction is done entirely through one of several extensive SDKs or directly using common HTTP protocol. I will be using Terraform to create some simple resources in my sandbox account and the .NET Core SDK to demonstrate how to interact with STS.

The main purpose and function of STS is to issue temporary security credentials for AWS resources to trusted and authenticated entities.  These credentials operate identically to the long-term keys that typical IAM users have, with a couple of special characteristics:

  • They automatically expire and become unusable after a short and defined period of time elapses
  • They are issued dynamically

These characteristics offer several advantages in terms of application security and development and are useful for cross-account delegation and access.  STS solves two problems for owners of AWS resources:

  • It meets the IAM best-practices requirement to regularly rotate access keys
  • It removes the need to distribute access keys to external entities or store them within an application

One common scenario where STS is useful involves sharing resources between AWS accounts.  Let’s say, for example, that your organization captures and processes data in S3, and one of your clients would like to push large amounts of data from resources in their AWS account to an S3 bucket in your account in an automated and secure fashion.

While you could create an IAM user for your client, your corporate data policy requires that you rotate access keys on a regular basis, and this introduces challenges for automated processes.  Additionally, you would like to limit the distribution of access keys to your resources to external entities.  Let’s use STS to solve this!

To get started, let’s create some resources in your AWS cloud.  Do you even Terraform, bro?

Let’s create a new S3 bucket and set the bucket ACL to be private, meaning nobody but the bucket owner (that’s you!) has access.  Remember that bucket names must be unique across all existing buckets, and they should comply with DNS naming conventions.  Here is the Terraform HCL syntax to do this:
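A minimal sketch of that HCL, assuming an illustrative resource label of xaccount_bucket and using the example bucket name and region referenced later in this post:

provider "aws" {
  region = "us-west-2"
}

resource "aws_s3_bucket" "xaccount_bucket" {
  bucket = "d4h2123b9-xaccount-bucket"
  acl    = "private"
}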

Great! We now have a bucket… but for now, only the owner can access it.  This is a good start from a security perspective (i.e. “least permissive” access).

What an empty bucket may look like

Let’s create an IAM role that, once assumed, will allow IAM users with access to this role to have permissions to put objects into our bucket.  Roles are a secure way to grant trusted entities access to your resources.  You can think about roles in terms of a jacket that an IAM user can wear for a short period of time, and while wearing this jacket, the user has privileges that they wouldn’t normally have when they aren’t wearing it.  Kind of like a bright yellow Event Staff windbreaker!

For this role, we will specify that users from our client’s AWS account are the only ones that can wear the jacket. This is done by including the client’s AWS account ID in the Principal statement.  AWS Account IDs are not considered to be secret, so your client can share this with you without compromising their security.  If you don’t have a client but still want to try this stuff out, put your own AWS account ID here instead.
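A sketch of the role definition, assuming a placeholder client account ID of 111111111111 and the role name referenced later in this post:

resource "aws_iam_role" "sts_delegate_role" {
  name = "sts-delegate-role"

  assume_role_policy = <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::111111111111:root"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}
EOF
}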


Great, now we have a role that our trusted client can wear.  But, right now our client can’t do anything except wear the jacket.  Let’s give the jacket some special powers, such that anyone wearing it can put objects into our S3 bucket.  We will do this by creating a security policy for this role.  This policy will specify what exactly can be done to S3 buckets that it is attached to. Then we will attach it to the bucket we want our client to use.  Here is the Terraform syntax to accomplish this:
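A sketch of one way to write it, expressed here as a bucket policy that grants the role s3:PutObject and enforces the required object ACL (resource labels follow the earlier sketches):

resource "aws_s3_bucket_policy" "xaccount_bucket_policy" {
  bucket = "${aws_s3_bucket.xaccount_bucket.id}"

  policy = <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "AllowDelegatedPut",
      "Effect": "Allow",
      "Principal": {
        "AWS": "${aws_iam_role.sts_delegate_role.arn}"
      },
      "Action": "s3:PutObject",
      "Resource": "${aws_s3_bucket.xaccount_bucket.arn}/*",
      "Condition": {
        "StringEquals": {
          "s3:x-amz-acl": "bucket-owner-full-control"
        }
      }
    }
  ]
}
EOF
}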

A couple of things to note about this snippet – First, we are using Terraform interpolation to inject values from previous Terraform statements into a couple of places in the policy – specifically the ARNs of the role and bucket we created previously. Second, we are specifying a condition for the S3 policy – one that requires a specific object ACL for the action s3:PutObject, which is accomplished by requiring the HTTP request header x-amz-acl to have a value of bucket-owner-full-control on the PUT object request. By default, objects PUT in S3 are owned by the account that created them, even if they are stored in someone else’s bucket. For our scenario, this condition requires your client to explicitly grant ownership of objects placed in your bucket to you; otherwise the PUT request will fail.

So, now we have a bucket, a policy in place on our bucket, and a role that our client can assume in order to use it. Now your client needs to get to work writing some code that will allow them to assume the role (wear the jacket) and start putting objects into your bucket. Your client will need to know a couple of things from you before they get started:

  1. The bucket name and the region it was created in (the example above created a bucket named d4h2123b9-xaccount-bucket in us-west-2)
  2. The ARN for the role (Terraform can output this for you). It will look something like this but will have your actual AWS Account ID: arn:aws:iam::123456789012:role/sts-delegate-role

They will also need to create an IAM User in their account and attach a policy allowing the user to assume roles via STS.  The policy will look similar to this:
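A minimal sketch of such a policy, using the example role ARN from above:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "sts:AssumeRole",
      "Resource": "arn:aws:iam::123456789012:role/sts-delegate-role"
    }
  ]
}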


Let’s help your client out a bit and provide some C# code snippets for .NET Core 2.0 (available for Windows, macOS and Linux). To get started, install the .NET SDK for your OS, then fire up a command prompt in a favorite directory and run these commands:
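Something along these lines:

dotnet new console -o s3cli
cd s3cli
dotnet add package AWSSDK.Core
dotnet add package AWSSDK.SecurityToken
dotnet add package AWSSDK.S3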

The first command will create a new console app in the subdirectory s3cli.  Then switch context to that directory and import the AWS SDK for .NET Core, and then add packages for SecurityToken and S3 services.
Once you have the libraries in place, fire up your favorite IDE or text editor (I use Visual Studio Code), then open Program.cs and add some code:
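A sketch of the STS call, with placeholder credentials and session name and the example role ARN from above:

using Amazon.SecurityToken;
using Amazon.SecurityToken.Model;

// ... inside Main():

// IAM user credentials from the client's account; this user must be allowed to call sts:AssumeRole
var stsClient = new AmazonSecurityTokenServiceClient("CLIENT_ACCESS_KEY_ID", "CLIENT_SECRET_ACCESS_KEY");

var assumeRoleRequest = new AssumeRoleRequest
{
    RoleArn = "arn:aws:iam::123456789012:role/sts-delegate-role", // the role ARN you provided
    RoleSessionName = "xaccount-put-session"                      // any descriptive session name
};

// Ask STS for temporary credentials (the .NET Core SDK exposes async methods)
var assumeRoleResponse = stsClient.AssumeRoleAsync(assumeRoleRequest).GetAwaiter().GetResult();
var stsCredentials = assumeRoleResponse.Credentials;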

This snippet sends a request to STS for temporary credentials using the specified ARN.  Note that the client must provide IAM user credentials to call STS, and that IAM user must have a policy applied that allows it to assume a role from STS.

This next snippet takes the STS credentials, bucket name, and region name, and then uploads the Program.cs file that you’re editing and assigns it a random key/name.  Also note that it explicitly applies the Canned ACL that is required by the sts-delegate-role:
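A sketch of the upload, reusing stsCredentials from the previous snippet along with the example bucket name and region:

using System;
using Amazon;
using Amazon.S3;
using Amazon.S3.Model;

// ... continuing inside Main():

// Build an S3 client from the temporary credentials issued by STS
var s3Client = new AmazonS3Client(
    stsCredentials.AccessKeyId,
    stsCredentials.SecretAccessKey,
    stsCredentials.SessionToken,
    RegionEndpoint.USWest2); // the region the bucket was created in

var putRequest = new PutObjectRequest
{
    BucketName = "d4h2123b9-xaccount-bucket",      // the bucket name you provided
    Key = Guid.NewGuid().ToString(),               // a random key/name for the object
    FilePath = "Program.cs",                       // upload this source file as a test
    CannedACL = S3CannedACL.BucketOwnerFullControl // the canned ACL required by the policy condition
};

var putResponse = s3Client.PutObjectAsync(putRequest).GetAwaiter().GetResult();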

So, to put this all together, run this code block and make the magic happen!  Of course, you will have to define and provide proper variable values for your environment, including  securely storing your credentials.

Try it out from the command prompt:
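Assuming the project builds cleanly, that is simply:

dotnet run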

If all goes well, you will have a copy of Program.cs in the bucket. Not very useful itself, but it illustrates how to accomplish the task.

What a bucket with something in it may look like

Here is a high-level diagram of what we put together:

Putting it all together

Steps:

  1. Your client uses their IAM user to call AWS STS and requests to assume the role ARN you gave them
  2. STS authenticates the client’s IAM user and verifies the policy for the ARN role, then issues a temporary credential to the client.
  3. The client can use the temporary credentials to access your S3 bucket (they will expire soon), and since they are now wearing the Event Staff jacket, they can successfully PUT stuff in your bucket!

There are many other use-cases for STS. This is just one very simplistic example. However, with this brief introduction to the concepts, you should now have a decent idea of how STS works with IAM roles and policies, and how you can use STS to give access to your AWS resources for trusted entities. For more tips like this, contact us.

-Jonathan Eropkin, Cloud Consultant


Managing Azure Cloud Governance with Resource Policies

I love an all-you-can-eat buffet. You get a ton of value and a lot to choose from, and you can eat as much, or as little, as you want, all for a fixed price.

In the same regard, I love the freedom and vast array of technologies that the cloud allows you. A technological all-you-can-eat buffet, if you will. However, there is no fixed price when it comes to the cloud. You pay for every resource! And as you can imagine, it can become quite costly if you are not mindful.

So, how do organizations govern and ensure that their cloud spend is managed efficiently? Well, in Microsoft’s Azure cloud you can mitigate this issue using Azure resource policies.

Azure resource policies allow you to define what, where or how resources are provisioned, thus allowing an organization to set restrictions and enable some granular control over their cloud spend.

Azure resource policies allow an organization to control things like:

  • Where resources are deployed – Azure has more than 20 regions all over the world. Resource policies can dictate what regions their deployments should remain within.
  • Virtual Machine SKUs – Resource policies can define only the VM sizes that the organization allows.
  • Azure resources – Resource policies can define the specific resources that are within an organization’s supportable technologies and restrict others that are outside those standards. For instance, if your organization supports SQL and Oracle databases but not Cosmos DB or MySQL, resource policies can enforce these standards.
  • OS types – Resource policies can define which OS flavors and versions are deployable in an organization’s environment. No longer support Windows Server 2008, or want to limit the Linux distros to a small handful? Resource policies can assist.

Azure resource policies are applied at the resource group or the subscription level. This allows granular control of the policy assignments. For instance, in a non-prod subscription you may want to allow non-standard and non-supported resources to allow the development teams the ability to test and vet new technologies, without hampering innovation. But in a production environment standards and supportability are of the utmost importance, and deployments should be highly controlled. Policies can also be excluded from a scope. For instance, an application that requires a non-standard resource can be excluded at the resource level from the subscription policy to allow the exception.

A number of pre-defined Azure resource policies are available for your use, including:

  • Allowed locations – Used to enforce geo-location requirements by restricting which regions resources can be deployed in.
  • Allowed virtual machine SKUs – Restricts the virtual machine sizes/SKUs that can be deployed to a predefined set of SKUs. Useful for controlling the cost of virtual machine resources.
  • Enforce tag and its value – Requires resources to be tagged. This is useful for tracking resource costs for purposes of department chargebacks.
  • Not allowed resource types – Identifies resource types that cannot be deployed. For example, you may want to prevent a costly HDInsight cluster deployment if you know your group would never need it.

Azure also allows custom resource policies when you need a restriction that is not covered by a built-in policy. A policy definition is described using JSON and includes a policy rule.

This JSON example denies the creation of a storage account that does not have blob encryption enabled:

{
  "if": {
    "allOf": [
      {
        "field": "type",
        "equals": "Microsoft.Storage/storageAccounts"
      },
      {
        "field": "Microsoft.Storage/storageAccounts/enableBlobEncryption",
        "equals": "false"
      }
    ]
  },
  "then": {
    "effect": "deny"
  }
}
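To put a custom definition like this to work, you can create and assign it with the Azure CLI. A rough sketch, assuming the rule above is saved as policy-rule.json and using placeholder names and scope:

az policy definition create --name denyUnencryptedStorage \
    --display-name "Deny storage accounts without blob encryption" \
    --rules policy-rule.json

az policy assignment create --name denyUnencryptedStorage \
    --policy denyUnencryptedStorage \
    --scope "/subscriptions/<subscription-id>/resourceGroups/<resource-group>"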

The use of Azure Resource Policies can go a long way in assisting you to ensure that your organization’s Azure deployments meet your governance and compliance goals. For more information on Azure Resource Policies visit https://docs.microsoft.com/en-us/azure/azure-policy/azure-policy-introduction.

For help in getting started with Azure resource policies, contact us.

-David Muxo, Sr Cloud Consultant


Corrupted Stolen CPU Time

There is a feature in the Linux kernel, relevant to VMs hosted on Xen servers, called the “steal percentage.”  When the guest OS requests CPU time from the host and the host CPU is currently tied up with another VM, the Xen server sends an increment to the guest Linux instance, which increases the steal percentage.  This is a great feature, as it shows exactly how busy the host system is, and it is available on many AWS instance types because AWS hosts them on Xen.  It is said that Netflix will terminate an AWS instance when the steal percentage crosses a certain threshold and start it up again, causing the instance to spin up on a new host server, as a proactive step to ensure their systems are utilizing their resources to the fullest.

What I wanted to discuss here is that it turns out there is a bug in the Linux kernel versions 4.8, 4.9 and 4.10 where the steal percentage can be corrupted during a live migration on the physical Xen server, which causes the CPU utilization to be reported as 100% by the agent.

When looking at Top you will see something like this:
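(Mock-up for illustration; the exact numbers will vary.)

%Cpu(s):  2.0 us,  1.3 sy,  0.0 ni,  0.0 id,  0.0 wa,  0.0 hi,  0.0 si, 42949504.9 st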

As you can see, the %st metric on the CPU(s) line shows an obviously incorrect number.

During a live migration on the physical Xen server, the steal time gets a little out of sync and ends up being decremented.  If the time was already at or close to zero, this causes the value to become negative and, due to type conversions in the code, it overflows.

CloudWatch’s CPU Utilization monitor calculates that utilization by adding the System and User percentages together.  However, this only gives a partial view into your system.  With our agent, we can see what the OS sees.

That is the Steal percentage spiking due to that corruption.  Normally this metric could be monitored and actioned as desired, but with this bug it causes noise and false positives.  If Steal were legitimately high, then the applications on that instance would be running much slower.

There is some discussion online about how to fix this issue, and there are some kernel patches to say “if the steal time is less than zero, just make it zero.”  Eventually this fix will make it through the Linux releases and into the latest OS version, but until then it needs to be dealt with.

We have found that a reboot will clear the corrupted percentage.  The other option is to patch the kernel… which also requires a reboot.  If a reboot is just not possible at the time, the only impact to the system is that it makes monitoring the steal percentage impossible until the number is reset.

It is not a very common issue, but due to the large number of instances we monitor here at 2nd Watch, it is something that we’ve come across frequently enough to investigate in detail and develop a process around.

If you have any questions as to whether or not your servers hosted in the cloud might be affected by this issue, please contact us to discuss how we might be able to help.

-James Brookes, Product Manager


Full Disclosure Vlog: Dispelling the Myths of DevOps

Welcome to Full Disclosure, our new video blog series with expert technical tips and tricks for navigating the world of the cloud.

In our first vlog we discuss what DevOps is, what the origins of the method and term are and how you can start. At 2nd Watch, we define DevOps as a cultural change. Learn more about the myths of DevOps with Lars Cromley, 2nd Watch Director of Engineering.


CI/CD for Infrastructure as Code with Terraform and Atlantis

In this post, we’ll go over a complete workflow for continuous integration (CI) and continuous delivery (CD) for infrastructure as code (IaC) with just two tools: Terraform and Atlantis.

What is Terraform?

So what is Terraform? According to the Terraform website:

Terraform is a tool for building, changing, and versioning infrastructure safely and efficiently. Terraform can manage existing and popular service providers as well as custom in-house solutions.

In practice, this means that Terraform allows you to declare what you want your infrastructure to look like – in any cloud provider – and will automatically determine the changes necessary to make it so. Because of its simple syntax and cross-cloud compatibility, it’s 2nd Watch’s choice for infrastructure as code.

Pain You May Be Experiencing Working With Terraform

When you have multiple collaborators (individuals, teams, etc.) working on a Terraform codebase, some common problems are likely to emerge:

  1. Enforcing peer review becomes difficult. In any codebase, you’ll want to ensure that your code is peer reviewed in order to ensure better quality in accordance with The Second Way of DevOps: Feedback. The role of peer review in IaC codebases is even more important. IaC is a powerful tool, but that tool is double-edged – we are clearly more productive for using it, but that increased productivity also means that a simple typo could potentially cause a major issue with production infrastructure. In order to minimize the potential for bad code to be deployed, you should require peer review on all proposed changes to a codebase (e.g. GitHub Pull Requests with at least one reviewer required). Terraform’s open source offering has no facility to enforce this rule.
  2. Terraform plan output is not easily integrated in code reviews. In all code reviews, you must examine the source code to ensure that your standards are followed, that the code is readable, that it’s reasonably optimized, etc. In this aspect, reviewing Terraform code is like reviewing any other code. However, Terraform code has the unique requirement that you must also examine the effect the code change will have upon your infrastructure (i.e. you must also review the output of a terraform plan command). When you potentially have multiple feature branches in the review process, it becomes critical that you are assured that the terraform plan output is what will be executed when you run terraform apply. If the state of infrastructure changes between a run of terraform plan and a run of terraform apply, the effect of this difference in state could range from inconvenient (the apply fails) to catastrophic (a significant production outage). Terraform itself offers locking capabilities but does not provide an easy way to integrate locking into a peer review process in its open source product.
  3. Too many sets of privileged credentials. Highly privileged credentials are often required to perform Terraform actions, and the greater the number of principals you have with privileged access, the larger your attack surface area becomes. Therefore, from a security standpoint, we’d like to have fewer sets of admin credentials which can potentially be compromised.

What is Atlantis?

And what is Atlantis? Atlantis is an open source tool that allows safe collaboration on Terraform projects by making sure that proposed changes are reviewed and that the proposed change is the actual change that will be executed on your infrastructure. Atlantis is compatible (at the time of writing) with GitHub and GitLab, so if you’re not using either of these Git hosting systems, you won’t be able to use Atlantis.

How Atlantis Works With Terraform

Atlantis is deployed as a single binary executable with no system-wide dependencies. An operator adds a GitHub or GitLab token for a repository containing Terraform code. The Atlantis installation process then adds hooks to the repository which allows communication to the Atlantis server during the Pull Request process.
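As a rough sketch, launching the server for a GitHub-hosted repository looks something like this, with placeholder values for the URL, user, token, webhook secret and repository:

atlantis server \
  --atlantis-url="https://atlantis.example.com" \
  --gh-user="your-github-user" \
  --gh-token="$GITHUB_TOKEN" \
  --gh-webhook-secret="$WEBHOOK_SECRET" \
  --repo-whitelist="github.com/your-org/your-terraform-repo"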

You can run Atlantis in a container or a small virtual machine – the only requirement is that the Atlantis instance can communicate with both the version control system (e.g. GitHub) and the infrastructure (e.g. AWS) you’re changing. Once Atlantis is configured for a repository, the typical workflow is:

  1. A developer creates a feature branch in git, makes some changes, and creates a Pull Request (GitHub) or Merge Request (GitLab).
  2. The developer enters atlantis plan in a PR comment.
  3. Via the installed web hooks, Atlantis locally runs terraform plan. If there are no other Pull Requests in progress, Atlantis adds the resulting plan as a comment to the Merge Request.
    • If there are other Pull Requests in progress, the command fails because we can’t ensure that the plan will be valid once applied.
  4. The developer ensures the plan looks good and adds reviewers to the Merge Request.
  5. Once the PR has been approved, the developer enters atlantis apply in a PR comment. This will trigger Atlantis to run terraform apply and the changes will be deployed to your infrastructure.
    • The command will fail if the Merge Request has not been approved.

The following sequence diagram illustrates the sequence of actions described above:

Atlantis sequence diagram

We can see how our pain points in Terraform collaboration are addressed by Atlantis:

  1. In order to enforce code review, you can launch Atlantis with the --require-approval flag: https://github.com/runatlantis/atlantis#approvals
  2. In order to ensure that your terraform plan accurately reflects the change to your infrastructure that will be made when you run terraform apply, Atlantis performs locking on a project or workspace basis: https://github.com/runatlantis/atlantis#locking
  3. In order to prevent creating multiple sets of privileged credentials, you can deploy Atlantis to run on an EC2 instance with a privileged IAM role in its instance profile (e.g. in AWS). In this way, all of your Terraform commands run through a single set of privileged credentials and obviate the need to distribute multiple sets of privileged credentials: https://github.com/runatlantis/atlantis#aws-credentials

Conclusion

You can see that with minimal additional infrastructure you can establish a safe and reliable CI/CD pipeline for your infrastructure as code, enabling you to get more done safely! To find out how you can deploy a CI/CD pipeline in less than 60 days, Contact Us.

-Josh Kodroff, Associate Cloud Consultant


Utilizing Amazon Systems Manager to ensure systems are securely configured and maintained

Compliance is a constant challenge today. Keeping our system images in a healthy and trusted state of compliance requires time and effort. There are millions of tools and technologies on the market to help customers maintain compliance and state, so where do I start?

Amazon has built a rich set of core technologies within the Amazon Web Services console. Systems Manager is a fantastic operations management platform tool that can assist you with setting up and maintaining configuration and state management.

One of the first things we must focus on when we build out our core images in the cloud is the configuration of those images. What is the role of the image, what operating system am I going to utilize and what applications and/or core services do I need to enable, configure and maintain?  In the datacenter, we call these Gold Images. The same applies in the cloud.

We define these roles for our images and place them in different functional areas – Infrastructure, Web Services, Applications. We may have many core image templates for our enterprise workloads. By building these base images and maintaining them continuously, we set in motion a solid foundation for core security and core compliance of our cloud environment.

Amazon Systems Manager looks across my cloud environment and allows me to bring together all the key information around all my operating resources in the cloud. It allows me to centralize the gathering of all core baseline information for my resources in one place. In the past I would have had to look at my AWS CloudWatch information in one area, my AWS CloudTrail information in another area and my configuration information in yet another area. Centralizing this information in one console allows you to see the holistic state of your cloud environment baselines in one console.

AWS Systems Manager provides built-in Insights and Dashboards that allow you to look across your entire cloud environment and see into and act upon your cloud resources. AWS Systems Manager allows you to see the configuration compliance of all your resources as well as the state management and associations across your resources. It provides a rich ability to customize configuration and state management for your workloads, applications and resource types, and to scan and analyze continuously to ensure those configurations and states are maintained. With AWS Systems Manager you can also create your own custom compliance types to match your organization’s baseline business requirements. With that in place, you can constantly scan and analyze against these compliance baselines to maintain the desired operational configuration and state.

We analyze and report on the current state and quickly determine compliance or out of compliance state centrally for our cloud services and resources. We can create base reports around our compliance position at any time, and with this knowledge, we can set in motion remediation to return our services and resources back to a compliant state and configuration.

With Amazon Systems Manager we can scan all resources for patch state, determine which patches are missing, and remediate those patches manually, on a schedule, or automatically to maintain patch management compliance.
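For example, assuming the AWS CLI is configured, a quick compliance and patch-state view is available from the command line (the instance ID below is a placeholder):

# Summarize compliance status (including patch compliance) across managed instances
aws ssm list-compliance-summaries

# Show the detailed patch state for a specific managed instance
aws ssm describe-instance-patch-states --instance-ids i-0123456789abcdef0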

Amazon Systems Manager also integrates with Chef InSpec, allowing you to leverage Chef InSpec to operate in a continuous compliance framework for your cloud resources.

On the road to compliance it is important to flex the tools and capabilities of your Cloud Provider. Amazon gives us a rich set of Systems Management capabilities across configuration, state management, patch management and remediation, as well as reporting. Amazon Systems Manager is provided at no cost to Amazon customers and will help you along your Journey to realizing continuous compliance of your cloud environment across the Amazon Cloud and the Hybrid Cloud. To learn more about using Amazon Systems Manager or your systems’ compliance, contact us.

-Peter Meister, Sr Director of Product Management
