
AWS re:Invent Keynote Recap – Wednesday

I have been looking forward to Andy Jassy’s keynote since I arrived in Las Vegas. Like the rest of the nearly 50k cloud-geeks in attendance, I couldn’t wait to learn about all of the cool new services and feature enhancements that would be unleashed to solve problems for our clients or inspire us to challenge convention in new ways.

Ok, I’ll admit it. I also look forward to the drama of the now obligatory jabs at Oracle, too!

Andy’s 2017 keynote was no exception to the legacy of previous re:Invents on those counts, but my takeaway from this year is that AWS has been able to parlay its flywheel momentum of IaaS growth into a wide range of higher-level managed services. The thrill I once got from new EC2 instance type releases has given way to my excitement for Lambda and event-based computing, edge computing and IoT, and of course AI/ML!

AWS Knows AI/ML

Of all the topics covered in the keynote, the theme that continues to resonate throughout this conference for me is that AWS wants people to know that they are the leader in AI and machine learning. As an attendee, I received an online survey from Amazon prior to the conference asking for my opinion on AWS’s position as a leader in the AI/ML space. While I have no doubts that Amazon has unmatched compute and storage capacity, and certainly has access to a wealth of information to train models, how does one actually measure a cloud provider’s AI/ML competency? Am I even qualified to answer without an advanced math degree?

That survey sure makes a lot more sense to me following the keynote as I now have a better idea of what “heavy lifting” a cloud provider can offload from the traditional process.

Amazon has introduced SageMaker, a fully managed service that enables data scientists and developers to quickly and easily build, train, and deploy machine learning models at any scale. It integrates with S3, and with RDS, DynamoDB, and Redshift by way of AWS Glue. It provides managed Jupyter notebooks and even comes supercharged with several common ML algorithms that have been tuned for “10x” performance!

In addition to SageMaker, we were introduced to Amazon Comprehend, a natural language processing (NLP) service that uses machine learning to analyze text. I personally am excited to integrate this into future chatbot projects, but the applications I see for this service are numerous.

After you’ve built and trained your models, you can run them in the cloud, or with the help of AWS Greengrass and its new machine learning inference feature, you can bring those beauties to the edge!

What is a practical application for running ML inference at the edge, you might ask?

Dr. Matt Wood demoed for the audience a new hardware device called DeepLens that does just that! DeepLens is a deep-learning enabled wireless video camera specifically designed to help developers of all skill levels grow their machine learning skills through hands-on computer vision tutorials. Not only is this an incredibly cool device to get to hack around with, but it signals Amazon’s dedication to raising the bar when it comes to AI and machine learning by focusing on the wet-ware: hungry minds looking to take their first steps.

Andy’s keynote included much more than just AI/ML, but to me, the latest AI/ML services announced in the keynote signal Amazon’s shift toward higher-level services, a shift that will keep it the dominant cloud provider for years to come.

 

–Joe Conlin, Solutions Architect, 2nd Watch


2nd Watch Enterprise Cloud Expertise On Display at AWS re:Invent 2017

AWS re:Invent is less than twenty days away and 2nd Watch is proud to be a 2017 Platinum Sponsor for the sixth consecutive year.  As an Amazon Web Services (AWS) Partner Network Premier Consulting Partner, we look forward to attending and demonstrating the strength of our cloud design, migration, and managed services offerings for enterprise organizations at AWS re:Invent 2017 in Las Vegas, Nevada.

About AWS re:Invent

Designed for AWS customers, enthusiasts and even cloud computing newcomers, the nearly week-long conference is a great source of information and education for attendees of all skill levels. AWS re:Invent is THE place to connect, engage, and discuss current AWS products and services via breakout sessions ranging from introductory to advanced and expert, and to hear the latest news and announcements from key AWS executives, partners, and customers. This year’s agenda offers a full additional day of content for even more learning opportunities, more than 1,000 breakout sessions, an expanded campus, hackathons, boot camps, hands-on labs, workshops, expanded Expo hours, and the always popular Amazonian events featuring broomball, the Tatonka Challenge, fitness activities, and the attendee welcome party known as re:Play.

2nd Watch at re:Invent 2017

2nd Watch has been a Premier Consulting Partner in the AWS Partner Network (APN) since 2012 and was recently named a leader in Gartner’s Magic Quadrant for Public Cloud Infrastructure Managed Service Providers, Worldwide (March 2017). We hold AWS Competencies in Financial Services, Migration, DevOps, Marketing and Commerce, Life Sciences, and Microsoft Workloads, and have recently completed the AWS Managed Service Provider (MSP) Partner Program Audit for the third year in a row. Over the past decade, 2nd Watch has migrated and managed AWS deployments for companies such as Crate & Barrel, Condé Nast, Lenovo, Motorola, and Yamaha.

The 2nd Watch breakout session—Continuous Compliance on AWS at Scale—will be led by cloud security experts Peter Meister and Lars Cromley. The session will focus on the need for continuous security and compliance in cloud migrations, and attendees will learn how a managed cloud provider can use automation and cloud expertise to successfully control these issues at scale in a constantly changing cloud environment. Registered re:Invent Full Conference Pass holders can add the session to their agendas here.

In addition to our breakout session, 2nd Watch will be showcasing our customers’ successes in the Expo Hall located in the Sands Convention Center (between The Venetian and The Palazzo hotels).  We invite you to stop by booth #1104 where you can explore 2nd Watch’s Managed Cloud Solutions, pick up a coveted 2nd Watch t-shirt and find out how you can win one of our daily contest giveaways—a totally custom 2nd Watch skateboard!

Want to make sure you get time with one of 2nd Watch’s Cloud Journey Masters while at re:Invent?  Plan ahead and schedule a meeting with one of 2nd Watch’s AWS Professional Certified Architects, DevOps, or Engineers.  Last but not least, 2nd Watch will be hosting its annual re:Invent after party on Wednesday, November 29. If you haven’t RSVP’d for THE AWS re:Invent Partner Party, click here to request your invitation (Event has passed)

AWS re:Invent is sure to be a week full of great technical learning, networking, and social opportunities.  We know you will have a packed schedule but look forward to seeing you there!  Be on the lookout for my list of “What to Avoid at re:Invent 2017” in the coming days…it’s sure to help you plan for your trip and get the most out of your AWS re:Invent experience.

 

–Katie Laas-Ellis, Marketing Manager, 2nd Watch

 

Gartner Disclaimer

Gartner does not endorse any vendor, product or service depicted in its research publications, and does not advise technology users to select only those vendors with the highest ratings or other designation. Gartner research publications consist of the opinions of Gartner’s research organization and should not be construed as statements of fact. Gartner disclaims all warranties, expressed or implied, with respect to this research, including any warranties of merchantability or fitness for a particular purpose.

About 2nd Watch

2nd Watch is an AWS Premier tier Partner in the AWS Partner Network (APN) providing managed cloud to enterprises. The company’s subject matter experts, software-enabled services and cutting-edge solutions provide companies with tested, proven, and trusted solutions, allowing them to fully leverage the power of the cloud. 2nd Watch solutions are high performing and robust; they increase operational excellence, decrease time to market, accelerate growth, and lower risk. Its patent-pending, proprietary tools automate everyday workload management processes for big data analytics, digital marketing, line-of-business and cloud native workloads. 2nd Watch is a new breed of business that helps enterprises design, deploy and manage cloud solutions and monitors business critical workloads 24×7. 2nd Watch has more than 400 enterprise workloads under its management and more than 200,000 instances in its managed public cloud. The venture-backed company is headquartered in Seattle, Washington. To learn more about 2nd Watch, visit www.2ndwatch.com or call 888-317-7920.


Best Practices: Developing a Tag Strategy for AWS Billing

Tag Strategy is key to Cost Allocation for Cloud Applications.

Have you ever been out to dinner with a group of friends, and at the end of the dinner the waiter comes back with one bill?  Most of us have experienced this.  Depending on the group of friends, it’s not a big deal and everyone drops in a credit card so the bill can be split evenly.  Other times, someone invites Harry and Sally, and they scrutinize the bill line by line.  Inevitably, they protest that Harry only had one glass of wine and Sally only had the salad.  You recall that Sally was a little ‘handsy’ with the sampler platter, but you sit quietly.  It’s in that moment you remember why the group didn’t invite Harry and Sally to last year’s New Year’s dinner.  No need to start the new year with an audit, am I right?

This situation can be eerily similar in many ways to cloud billing in a large enterprise.  It is evident that Amazon Web Services (AWS) has changed the way organizations use computing resources.  However, AWS has also delivered on the promise of truly enabling ‘chargeback’ or ‘showback’ in the enterprise, so that the business units themselves are stakeholders in what was traditionally siloed in an IT Department budget.

Now multiple stakeholders across the organization have a stake in the cost and usage of an app that resides in AWS.  Luckily, there are tools like 2nd Watch’s Cloud Management Platform (CMP) that can easily provide visibility into what an app, or even the entire infrastructure, is costing at the click of a button.

2nd Watch’s CMP tools are great for showing an organization’s costs and can even be used to set budget notifications so that a business unit doesn’t inadvertently spend more than is budgeted on an environment.  CMP is a powerful tool that delivers insights to your business, and it becomes even more powerful when backed by a thorough tagging strategy.

Tag, you’re it…

We live in a world of tags and hashtags.  Seemingly overnight, tags have made their way into everyday language.  This is not by accident: interactions with Facebook and Twitter have become so commonplace that they have altered the world’s language.

Beyond their emergence in our everyday vernacular, tags have a key function.  In AWS, applying tags to cloud resources like EC2 and RDS is key to accurate accounting when allocating charges.  Our team of experts at 2nd Watch can work with you to ensure that your tagging strategy is implemented in the most effective manner for your organization.  After all, a tagging strategy can and will vary by organization.  It depends on you and how you want to be able to report on your resources.  Do you want to report on resources by cost center, application, environment type (like dev or prod), owner, department, geographic area, or by whether a resource is managed by a managed service provider like 2nd Watch?

Without a well-thought-out tagging strategy, your invoicing discussions will sound much like the fictional dinner described above.  Who pays for what, and why?
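To make this concrete, here is a minimal sketch of applying cost-allocation tags with the AWS CLI. The resource IDs, tag keys, and values (CostCenter, Application, Environment, Owner) are hypothetical examples rather than a prescribed schema, and user-defined tags generally must also be activated as cost allocation tags in the Billing console before they show up in cost reports.

# Tag an EC2 instance with example cost-allocation tags (keys and values are case sensitive)
aws ec2 create-tags \
  --resources i-0123456789abcdef0 \
  --tags Key=CostCenter,Value=mktg Key=Application,Value=webstore Key=Environment,Value=prod Key=Owner,Value=jane.doe

# Tag an RDS instance with the same example keys so its charges roll up consistently
aws rds add-tags-to-resource \
  --resource-name arn:aws:rds:us-west-2:123456789012:db:webstore-db \
  --tags Key=CostCenter,Value=mktg Key=Application,Value=webstore Key=Environment,Value=prod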

Tag Strategy and Hygiene…

Implementing a sound tagging strategy at the outset, when a resource or environment is deployed, is the first step.  At the inception, it’s important to know some “gotchas” that can derail a tagging implementation.  One of these is that tags are case sensitive.  For example, mktg will report separately from Mktg.  Also keep in mind that in today’s ever-changing business environment, organizations are forced to adjust and reorganize themselves to stay competitive.

Revisiting your tagging strategy from time to time will be necessary to ensure tags remain relevant.  If a stakeholder moves out of a role, gets promoted, or retires from the organization altogether, you will need to stay on top of the tagging for their environment to be sure that it is still relevant to the new organization.

What about the un-taggables?

Standardization and a tag plan work great for AWS resources like EC2 and RDS, as explained before.  But what about untaggable resources, such as network transfer charges and items like a NAT gateway or a VPC endpoint?  There will be shared resources like these in your application’s environment. It is best to review these shared, untagged resources early on and decide where best to allocate that cost.

At 2nd Watch, we have these very discussions with our clients on a regular basis. We can easily guide them through the resources associated with the app and where to allocate each cost.  With a tool like CMP we can configure a client’s cost allocation hierarchy so they can view their ongoing costs in real time.

For its part, Amazon does a great job of providing an up-to-date user guide covering which resources can be tagged.  Click here for reference documentation to help while you develop your tag strategy.

Rinse and repeat as necessary

Your tagging strategy can’t be a ‘fire and forget’ pronouncement.  To be effective, your organization will need to enforce it on a consistent basis. For instance, as new DevOps personnel are brought into the organization, onboarding them to the tagging standard will be key to ensuring it stays under control.

These are the types of discussions that 2nd Watch adds a lot of value to.  Our AWS cloud expertise for large enterprises will ensure that you are able to precisely account for your cloud infrastructure spend at the click of a button through CMP.

After all, we all want to enjoy our meal and move on with the next activity. Stop by and visit us at re:Invent booth #1104 for more help.

— Paul Wells, Account Manager, 2nd Watch


Migrating to Terraform v0.10.x

When it comes to managing cloud-based resources, it’s hard to find a better tool than Hashicorp’s Terraform. Terraform is an ‘infrastructure as code’ application, marrying configuration files with backing APIs to provide a nearly seamless layer over your various cloud environments. It allows you to declaratively define your environments and their resources through a process that is structured, controlled, and collaborative.

One key advantage Terraform provides over other tools (like AWS CloudFormation) is having a rapid development and release cycle fueled by the open source community. This has some major benefits: features and bug fixes are readily available, new products from resource providers are quickly incorporated, and you’re able to submit your own changes to fulfill your own requirements.

Hashicorp recently released v0.10.0 of Terraform, introducing some fundamental changes in the application’s architecture and functionality. We’ll review the three most notable of these changes and how to incorporate them into your existing Terraform projects when migrating to Terraform v0.10.x.

  1. Terraform Providers are no longer distributed as part of the main Terraform distribution
  2. New auto-approve flag for terraform apply
  3. Existing terraform env commands replaced by terraform workspace

A brief note on Terraform versions:

Even though Terraform uses a style of semantic versioning, its ‘minor’ versions should be treated as ‘major’ versions.

1. Terraform Providers are no longer distributed as part of the main Terraform distribution

The biggest change in this version is the removal of provider code from the core Terraform application.

Terraform Providers are responsible for understanding API interactions and exposing resources for a particular platform (AWS, Azure, etc). They know how to initialize and call their applications or CLIs, handle authentication and errors, and convert HCL into the appropriate underlying API calls.

It was a logical move to split the providers out into their own distributions. The core Terraform application can now add features and release bug fixes at a faster pace, new providers can be added without affecting the existing core application, and new features can be incorporated and released to existing providers without as much effort. Having split providers also allows you to update your provider distribution and access new resources without necessarily needing to update Terraform itself. One downside of this change is that you have to keep up to date with features, issues, and releases of more projects.

The provider repos can be accessed via the Terraform Providers organization in GitHub. For example, the AWS provider can be found here.

Custom Providers

An extremely valuable side-effect of having separate Terraform Providers is the ability to create your own, custom providers. A custom provider allows you to specify new or modified attributes for existing resources in existing providers, add new or unsupported resources in existing providers, or generate your own resources for your own platform or application.

You can find more information on creating a custom provider from the Terraform Provider Plugin documentation.

1.1 Configuration

The nicest part of this change is that it doesn’t really require any additional modifications to your existing Terraform code if you were already using a Provider block.

If you don’t already have a provider block defined, you can find their configurations from the Terraform Providers documentation.

You simply need to call the terraform init command before you can perform any other action. If you fail to do so, you’ll receive an error informing you of the required actions (img 1a).
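For reference, a minimal sketch of that reinitialization step, assuming it is run from the root of an existing Terraform project:

# Download and install the provider plugins referenced by this project
terraform init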

After successfully reinitializing your project, you will be provided with the list of providers that were installed as well as the versions requested (img 1b).

You’ll notice that Terraform suggests versions for the providers we are using – this is because we did not specify any specific versions of our providers in code. Since providers are now independently released entities, we have to tell Terraform what code it should download and use to run our project.

(Image 1a: Notice of required reinitialization)
(Image 1b: Response from successful reinitialization)
Providers are released separately from Terraform itself, and maintain their own version numbers.

You can specify the version(s) you want to target in your existing provider blocks by adding the version property (code block 1). These versions should follow the semantic versioning specification (similar to node’s package.json or python’s requirements.txt).

For production use, it is recommended to limit the acceptable provider versions to ensure that new versions with breaking changes are not automatically installed.

(Code Block 1: Provider Config)

provider "aws" {
  version = "0.1.4"
  allowed_account_ids = ["1234567890"]
  region = "us-west-2"
}
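As a hedged example, the version property also accepts constraint operators; the range below is illustrative rather than a recommendation for any specific provider release.

provider "aws" {
  # Accept patch releases within the 0.1.x series (at least 0.1.4), but never a new minor version automatically
  version             = "~> 0.1.4"
  allowed_account_ids = ["1234567890"]
  region              = "us-west-2"
}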

(Image 1c: Currently defined provider configuration)
2. New auto-approve flag for terraform apply

In previous versions, running terraform apply would immediately apply any changes between your project and saved state.

Your normal workflow would likely be:
run terraform plan followed by terraform apply and hope nothing changed in between.

This version introduced a new auto-approve flag which will control the behavior of terraform apply.

Deprecation Notice

This flag currently defaults to true to maintain backwards compatibility, but the default will change to false in a future release.

2.1 auto-approve=true (current default)

When set to true, terraform apply will work like it has in previous versions.

If you want to maintain this functionality, you should update your scripts, build systems, etc. now, as this default value will change in a future Terraform release.

(Code Block 2: Apply with default behavior)

# Apply changes immediately without plan file
terraform apply --auto-approve=true

2.2 auto-approve=false

When set to false, Terraform will present the user with the execution plan and pause for interactive confirmation (img 2a).

If the user provides any response other than yes, terraform will exit without applying any changes.

If the user confirms the execution plan with a yes response, Terraform will then apply the planned changes (and only those changes).

If you are trying to automate your Terraform scripts, you might want to consider producing a plan file for review, then providing explicit approval to apply the changes from the plan file.

(Code Block 3: Apply plan with explicit approval)

# Create plan file
terraform plan -out=tfplan

# Apply the approved plan (flags go before the plan file argument)
terraform apply --auto-approve=true tfplan

(Image 2a: Terraform apply with execution plan)
3. Existing terraform env commands replaced by terraform workspace

The terraform env family of commands was replaced with terraform workspace to help alleviate some confusion about their functionality. Workspaces are very useful and can do much more than just split up environment state (which isn’t necessarily what they should be used for). I recommend checking them out and seeing if they can improve your projects.

There is not much to do here other than switch the command invocation, as shown in the sketch below; the previous commands still work for now, but they are deprecated.
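A minimal sketch of the new commands (the workspace name staging is just an example):

# List existing workspaces (the current one is marked with an asterisk)
terraform workspace list

# Create and switch to a new workspace
terraform workspace new staging

# Switch back to the default workspace
terraform workspace select default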

 


— Steve Byerly, Principal SDE (IV), Cloud, 2nd Watch


Current IoT Security Threat Landscape

By Paul Fletcher, Alert Logic

The “Internet of Things” (IoT) is a broadly accepted term that basically describes any Internet-connected device (usually via Wi-Fi) that isn’t a traditional computer system.  These connected IoT devices offer many conveniences for everyday life.  It’s difficult to remember how life was before you could check email and the weather or stream live video using a smart TV.  It’s now considered commonplace for a smart refrigerator to send you a text every morning with an updated shopping list.  We can monitor and manage the lights, thermostat, doors, locks and web cameras from wherever we may roam, thanks to smartphone apps and the proliferation of our connected devices.

With this added convenience comes a larger digital footprint, which makes for a larger target for attackers looking to discover other systems on your network, steal data or seize control of your DVR.  The hacker community is just getting warmed up when it comes to attacking IoT devices.  There are a lot of fun things hackers can do with vulnerable connected devices and/or “smart” homes.  The early attacks were just about exploring: hackers would simulate ghosts by having all the lights in the house go on and off in a pattern, turn the heater on during the summer and the air conditioning on in the winter, or make the food inside the fridge go bad by changing a few temperature levels.

The current IoT security threat landscape has grown more sophisticated recently, and we’ve seen some significant attacks.  The most impactful IoT-based cyber attack happened on Oct. 21, 2016, when a hacker group activated 10% of their IoT botnet, infected with malware called “Mirai.”  Approximately 50,000 web cameras and DVR systems launched a massive DDoS attack on the Dyn DNS service, disrupting Internet services for companies like Spotify, Twitter, GitHub and others for more than 8 hours.  The attackers used only 10% of the 500,000 DVRs and web cameras infected by the malware, but still caused monetary damage to customers of the Dyn DNS service.  A few months later, attackers launched a new IoT-specific malware called “Persirai” that infected over 100,000 web cameras.  This new malware comes complete with a sleek detection-avoidance feature: once the malware executes on the web cam, it runs only in RAM and deletes the original infection file, making it extremely difficult to detect.

The plain, cold truth is that most IoT manufacturers use stripped-down versions of the Linux (and possibly Android) operating system, because the OS requires minimal system resources to operate.  ALL IoT devices have some version of an operating system and are therefore “lightweight” computers.  Since most IoT devices are running some form of Linux or Android operating system, they have vulnerabilities that are researched and discovered on an ongoing basis.  So, yes, it’s possible that you may have to install a security patch for your refrigerator or coffee maker.

Special-purpose computer systems with customized versions of operating systems have been around for decades.  The best examples of this are old-school arcade games and early gaming consoles.  The difference today is that these devices now come with fast, easy connectivity to your internal network and the Internet.  Most IoT manufacturers don’t protect the underlying operating system on their “smart” devices, and consumers shouldn’t assume it’s safe to connect a new device to their network.  Both Mirai and Persirai compromised IoT devices using simple methods like default usernames and passwords.  Some manufacturers feel that their devices are so “lightweight” that their limited computing resources (hard drive, RAM, etc.) wouldn’t be worth hacking, because they wouldn’t provide much firepower for an attacker.  The hacking community has repeatedly proven that it is interested in ANY resource (regardless of capacity) it can leverage.

When an IoT device is first connected to your network (either home or office), it will usually try to “call home” for software updates and/or security patches.  It’s highly recommended that all IoT devices be placed on an isolated network segment and blocked from the enterprise or high-value home computer systems.  It’s also recommended to monitor all outbound Internet traffic from your IoT network segment to establish a baseline of “normal” behavior.  This helps you better understand the network traffic generated by your IoT devices, so that any “abnormal” behavior can help you discover a potential attack.

Remember, “hackers gonna hack,” meaning the threat is 24/7. IoT devices need good computer security hygiene, just like your laptop, smartphone and tablet.  Make sure you use unique but easily remembered passwords, and rotate all passwords regularly.  Confirm that all of your systems are using the latest patches and upgrades for better functionality and security.  After patches are applied, validate that your security settings haven’t been changed back to the defaults.

IoT devices are very convenient, and manufacturers are getting better at security, but with the ever-changing IoT threat landscape we can expect to see more impactful and sophisticated attacks in the near future.  The daily work of maintaining operational security for an organization or household is no easy task, and IoT devices are just one of the many threats that require ongoing monitoring.  It’s highly recommended that IoT cyber threats be incorporated into a defense-in-depth strategy as part of a holistic approach to cyber security.

Learn more about 2nd Watch Managed Cloud Security and how our partnership with Alert Logic can ensure your environment’s security.

Blog Contributed by 2nd Watch Cloud Security Partner, Alert Logic



Introduction to High Performance Computing (HPC)

When we talk about high performance computing we are typically trying to solve some type of problem. These problems will generally fall into one of four types:

  • Compute Intensive – A single problem requiring a large amount of computation.
  • Memory Intensive – A single problem requiring a large amount of memory.
  • Data Intensive – A single problem operating on a large data set.
  • High Throughput – Many unrelated problems that are computed in bulk.

 

In this post, I will provide a detailed introduction to High Performance Computing (HPC) that can help organizations solve the common issues listed above.

Compute Intensive Workloads

First, let us take a look at compute intensive problems. The goal is to distribute the work for a single problem across multiple CPUs to reduce the execution time as much as possible. In order for us to do this, we need to execute steps of the problem in parallel. Each process—or thread—takes a portion of the work and performs the computations concurrently. The CPUs typically need to exchange information rapidly, requiring specialized communication hardware. Examples of these types of problems can be found when analyzing data for tasks like financial modeling and risk exposure, in both traditional business and healthcare use cases. This is probably the largest portion of HPC problem sets and is the traditional domain of HPC.

When attempting to solve compute intensive problems, we may think that adding more CPUs will reduce our execution time. This is not always true. Most parallel code bases have what we call a “scaling limit”. This is in no small part due to the system overhead of managing more copies, but also to more basic constraints.

CAUTION: NERD ALERT

This is summed up brilliantly in Amdahl’s law.

In computer architecture, Amdahl’s law is a formula which gives the theoretical speedup in latency of the execution of a task at fixed workload that can be expected of a system whose resources are improved. It is named after computer scientist Gene Amdahl, and was presented at the AFIPS Spring Joint Computer Conference in 1967.

Amdahl’s law is often used in parallel computing to predict the theoretical speedup when using multiple processors. For example, if a program needs 20 hours using a single processor core, and a particular part of the program which takes one hour to execute cannot be parallelized, while the remaining 19 hours (p = 0.95) of execution time can be parallelized, then regardless of how many processors are devoted to a parallelized execution of this program, the minimum execution time cannot be less than that critical one hour. Hence, the theoretical speedup is limited to at most 20 times (1/(1 − p) = 20). For this reason, parallel computing with many processors is useful only for very parallelizable programs.

– Wikipedia

Amdahl’s law can be formulated the following way:

S_latency(s) = 1 / ((1 − p) + p/s)

where

  • S_latency is the theoretical speedup of the execution of the whole task;
  • s is the speedup of the part of the task that benefits from improved system resources;
  • p is the proportion of execution time that the part benefiting from improved resources originally occupied.

 

Furthermore, as the chart example illustrates: if 95% of the program can be parallelized, the theoretical maximum speedup using parallel computing would be 20 times.

Bottom line: As you create more sections of your problem that are able to run concurrently, you can split the work between more processors and thus, achieve more benefits. However, due to complexity and overhead, eventually using more CPUs becomes detrimental instead of actually helping.

There are libraries that help with parallelization, like OpenMP or Open MPI, but before moving to these libraries, we should strive to optimize performance on a single CPU, then make p as large as possible.

Memory Intensive Workloads

Memory intensive workloads require large pools of memory rather than multiple CPUs. In my opinion, these are some of the hardest problems to solve and typically require great care when building machines for your system. Coding and porting is easier because memory will appear seamless, allowing for a single system image.  Optimization becomes harder, however, the further we get from the machines’ original creation date, because component uniformity degrades. Traditionally, in the data center, you don’t replace every single server every three years. If we want more resources in our cluster and we want performance to be uniform, non-uniform memory access produces real latency. We also have to think about the interconnect between the CPU and the memory.

Nowadays, many of these concerns have been eliminated by commodity servers. We can ask for thousands of the same instance type with the same specs and hardware, and companies like Amazon Web Services are happy to let us use them.

Data Intensive Workloads

This is probably the most common workload we find today, and probably the type with the most buzz. These are known as “Big Data” workloads. Data intensive workloads are the type of workloads suitable for software packages like Hadoop or MapReduce. We distribute the data for a single problem across multiple CPUs to reduce the overall execution time. The same work may be done on each data segment, though that is not always the case. This is essentially the inverse of a memory intensive workload in that rapid movement of data to and from disk is more important than the interconnect. The types of problems being solved in these workloads tend to be in the life sciences (genomics) on the academic side, and they have a wide reach in commercial applications, particularly around user data and interactions.

High Throughput Workloads

Batch processing jobs (jobs with almost trivial operations to perform in parallel as well as jobs with little to no inter-CPU communication) are considered high throughput workloads. In high throughput workloads, we place an emphasis on throughput over a period of time rather than on performance on any single problem. We distribute multiple problems independently across multiple CPUs to reduce overall execution time. These workloads should:

  • Break up naturally into independent pieces
  • Have little or no inter-CPU communication
  • Be performed in separate processes or threads on a separate CPU (concurrently)

 

Workloads that are compute intensive can often be broken into high throughput jobs; however, high throughput jobs are not necessarily CPU intensive.

HPC On Amazon Web Services

Amazon Web Services (AWS) provides on-demand scalability and elasticity for a wide variety of computational and data-intensive workloads, including workloads that represent many of the world’s most challenging computing problems: engineering simulations, financial risk analyses, molecular dynamics, weather prediction, and many more.   

– AWS: An Introduction to High Performance Computing on AWS

Amazon literally has everything you could possibly want in an HPC platform. For every type of workload listed here, AWS has one or more instance classes to match and numerous sizes in each class, allowing you to get very granular in the provisioning of your clusters.

Speaking of provisioning, there is even a tool called CfnCluster which creates clusters for HPC use. CfnCluster is a tool used to build and manage High Performance Computing (HPC) clusters on AWS. Once created, you can log into your cluster via the master node where you will have access to standard HPC tools such as schedulers, shared storage, and an MPI environment.
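As an illustrative sketch only (exact commands and prompts may differ by CfnCluster version, and the cluster name mycluster is hypothetical), a typical flow looks like this:

# Install the CfnCluster CLI (assumes Python and pip are available)
pip install cfncluster

# Interactively configure AWS credentials, region, key pair, and VPC settings
cfncluster configure

# Create a cluster and wait for the master node information to be printed
cfncluster create mycluster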

For data intensive workloads, there are a number of options to help get your data closer to your compute resources:

  • S3
  • Redshift
  • DynamoDB
  • RDS

 

EBS is even a viable option for creating large-scale parallel file systems to meet the high-volume, high-performance, and high-throughput requirements of workloads.

HPC Workloads & 2nd Watch

2nd Watch can help you solve complex science, engineering, and business problems using applications that require high bandwidth, enhanced networking, and very high compute capabilities.

Increase the speed of research by running high performance computing in the cloud and reduce costs by paying for only the resources that you use, without large capital investments. With 2nd Watch, you have access to a full-bisection, high bandwidth network for tightly-coupled, IO-intensive workloads, which enables you to scale out across thousands of cores for throughput-oriented applications. Contact us today to learn more.

2nd Watch Customer Success

Celgene is an American biotechnology company that manufactures drug therapies for cancer and inflammatory disorders. Read more about their cloud journey and how they went from research jobs that previously took weeks or months to completing them in just hours. Read the case study.

We have also helped a global finance & insurance firm prove their liquidity time and time again in the aftermath of the 2008 recession. By leveraging the batch computing solution that we provided for them, they are now able to scale out their computations across 120,000 cores while validating their liquidity with no CAPEX investment. Read the case study.

 

– Lars Cromley, Director of Engineering, Automation, 2nd Watch
