When we talk about high performance computing we are typically trying to solve some type of problem. These problems will generally fall into one of four types:
- Compute Intensive – A single problem requiring a large amount of computation.
- Memory Intensive – A single problem requiring a large amount of memory.
- Data Intensive – A single problem operating on a large data set.
- High Throughput – Many unrelated problems that are computed in bulk.
In this post, I will provide a detailed introduction to High Performance Computing (HPC) that can help organizations solve the common issues listed above.
Compute Intensive Workloads
First, let us take a look at compute intensive problems. The goal is to distribute the work for a single problem across multiple CPUs to reduce the execution time as much as possible. To do this, we need to execute steps of the problem in parallel. Each process (or thread) takes a portion of the work and performs its computations concurrently. The CPUs typically need to exchange information rapidly, which requires specialized communication hardware. Examples include financial modeling and risk exposure analysis in both traditional business and healthcare use cases. Compute intensive problems are probably the largest portion of HPC problem sets and are the traditional domain of HPC.
When attempting to solve compute intensive problems, we may think that adding more CPUs will reduce our execution time. This is not always true. Most parallel code bases have what we call a “scaling limit”. This is in no small part due to the system overhead of managing more copies, but also to more basic constraints.
CAUTION: NERD ALERT
This is summed up brilliantly in Amdahl’s law.
In computer architecture, Amdahl’s law is a formula which gives the theoretical speedup in latency of the execution of a task at fixed workload that can be expected of a system whose resources are improved. It is named after computer scientist Gene Amdahl, and was presented at the AFIPS Spring Joint Computer Conference in 1967.
Amdahl’s law is often used in parallel computing to predict the theoretical speedup when using multiple processors. For example, if a program needs 20 hours using a single processor core, and a particular part of the program which takes one hour to execute cannot be parallelized, while the remaining 19 hours (p = 0.95) of execution time can be parallelized, then regardless of how many processors are devoted to a parallelized execution of this program, the minimum execution time cannot be less than that critical one hour. Hence, the theoretical speedup is limited to at most 20 times (1/(1 − p) = 20). For this reason, parallel computing with many processors is useful only for very parallelizable programs.
Amdahl’s law can be formulated the following way:
S_latency(s) = 1 / ((1 − p) + p/s)
where:
- S_latency is the theoretical speedup of the execution of the whole task;
- s is the speedup of the part of the task that benefits from improved system resources;
- p is the proportion of execution time that the part benefiting from improved resources originally occupied.
Chart Example: If 95% of the program can be parallelized, the theoretical maximum speedup using parallel computing would be 20 times.
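To make the numbers above concrete, here is a minimal Python sketch of Amdahl's law. The function names are my own, and the sketch assumes that s equals the number of cores, meaning the parallel part scales perfectly with no overhead.

```python
def amdahl_speedup(p: float, s: float) -> float:
    """Theoretical whole-task speedup when a fraction p of the run time is sped up by a factor s."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be between 0 and 1")
    if s <= 0:
        raise ValueError("s must be positive")
    return 1.0 / ((1.0 - p) + p / s)


def amdahl_limit(p: float) -> float:
    """Upper bound on the speedup as s grows without limit: 1 / (1 - p)."""
    if not 0.0 <= p < 1.0:
        raise ValueError("p must be at least 0 and strictly less than 1")
    return 1.0 / (1.0 - p)


if __name__ == "__main__":
    p = 0.95  # 19 of the 20 hours can be parallelized
    # Assumption: s is simply the core count (perfect scaling of the parallel part).
    for cores in (1, 2, 8, 64, 1024, 65536):
        print(f"{cores:>6} cores -> {amdahl_speedup(p, cores):6.2f}x")
    print(f" limit       -> {amdahl_limit(p):6.2f}x")
```

However many cores you add, the speedup approaches 20x but never reaches it, because the one serial hour is always left over.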
Bottom line: As you create more sections of your problem that are able to run concurrently, you can split the work between more processors and thus, achieve more benefits. However, due to complexity and overhead, eventually using more CPUs becomes detrimental instead of actually helping.
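Amdahl's law on its own only flattens out; it never gets worse. To illustrate how overhead can make extra CPUs actively harmful, here is a toy model. It is an assumption of mine and not part of Amdahl's law: it charges a fixed coordination cost for every CPU added, and the value of 0.01 hours per CPU is purely hypothetical.

```python
def run_time(cpus: int,
             serial_hours: float = 1.0,
             parallel_hours: float = 19.0,
             overhead_hours_per_cpu: float = 0.01) -> float:
    """Toy model: serial part + evenly split parallel part + a linear overhead term.

    The linear overhead term is an illustrative assumption, not a measured cost.
    """
    if cpus < 1:
        raise ValueError("cpus must be at least 1")
    return serial_hours + parallel_hours / cpus + overhead_hours_per_cpu * cpus


def best_cpu_count(max_cpus: int = 512) -> int:
    """CPU count in 1..max_cpus that minimizes run_time under the default parameters."""
    return min(range(1, max_cpus + 1), key=run_time)


if __name__ == "__main__":
    best = best_cpu_count()
    print(f"fastest at {best} CPUs: {run_time(best):.2f} hours")
    for cpus in (1, 8, best, 200, 512):
        print(f"{cpus:>4} CPUs -> {run_time(cpus):5.2f} hours")
```

Under these made-up parameters the run time bottoms out in the mid-forties of CPUs and climbs again after that. Real overhead curves differ by code base, so the scaling limit of a real job has to be measured.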
There are libraries that help with parallelization, like OpenMP or Open MPI, but before moving to these libraries, we should strive to optimize performance on a single CPU, then make p as large as possible.
Memory Intensive Workloads
Memory intensive workloads require large pools of memory rather than multiple CPUs. In my opinion, these are some of the hardest problems to solve and typically require great care when building machines for your system. Coding and porting is easier because memory appears seamless, allowing for a single system image. Optimization becomes harder, however, as we get further away from the original creation date of the machines, because the components stop being uniform. Traditionally, in the data center, you don’t replace every single server every three years. If we want more resources in our cluster and we want performance to be uniform, we have to account for the fact that non-uniform memory introduces real latency. We also have to think about the interconnect between the CPU and the memory.
Nowadays, many of these concerns have been eliminated by commodity servers. We can ask for thousands of the same instance type with the same specs and hardware, and companies like Amazon Web Services are happy to let us use them.
Data Intensive Workloads
This is probably the most common workload we find today, and probably the type with the most buzz. These are known as “Big Data” workloads. Data intensive workloads are suited to frameworks like Hadoop, which implements the MapReduce programming model. We distribute the data for a single problem across multiple CPUs to reduce the overall execution time. The same work is usually done on each data segment, though not always. This is essentially the inverse of a memory intensive workload in that rapid movement of data to and from disk is more important than the interconnect. The types of problems solved in these workloads tend to be life science (genomics) problems in the academic field, and they have a wide reach in commercial applications, particularly around user data and interactions.
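The "same work on each data segment" idea can be sketched in a few lines of plain Python. This is a single-machine illustration of the map and reduce steps, not Hadoop itself. The word-count task and all function names are assumptions chosen for the example.

```python
from collections import Counter
from functools import reduce
from typing import Iterable, List


def split_into_segments(records: List[str], segments: int) -> List[List[str]]:
    """Deal the records into `segments` roughly equal chunks."""
    return [records[i::segments] for i in range(segments)]


def map_segment(segment: Iterable[str]) -> Counter:
    """Map step: the same work (word counting) applied to one data segment."""
    counts: Counter = Counter()
    for line in segment:
        counts.update(line.lower().split())
    return counts


def reduce_counts(partials: Iterable[Counter]) -> Counter:
    """Reduce step: merge the per-segment results into one answer."""
    return reduce(lambda left, right: left + right, partials, Counter())


if __name__ == "__main__":
    records = ["big data", "Big compute", "data data"]
    # In a real cluster each segment would live on, and be processed by, a different node.
    partials = [map_segment(seg) for seg in split_into_segments(records, 2)]
    print(reduce_counts(partials))
```

Because each segment is processed without reference to the others, the map step can run wherever the data already sits. That is why disk throughput matters more than the interconnect for this class of workload.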
High Throughput Workloads
Batch processing jobs (jobs with almost trivial operations to perform in parallel as well as jobs with little to no inter-CPU communication) are considered High Throughput workloads. In high throughput workloads, we emphasize throughput over a period of time rather than performance on any single problem. We distribute multiple problems independently across multiple CPUs to reduce overall execution time. These workloads should:
- Break up naturally into independent pieces
- Have little or no inter-CPU communication
- Be performed in separate processes or threads on a separate CPU (concurrently)
Compute intensive jobs can likely be broken into high throughput jobs; however, high throughput jobs are not necessarily CPU intensive.
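As a rough sketch of the three criteria above, the following Python uses the standard library's concurrent.futures to fan independent jobs out across worker processes. The simulate_job function is a hypothetical stand-in for one real problem, and the names and default worker count are assumptions.

```python
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor
from typing import Iterable, List


def simulate_job(seed: int) -> int:
    """Hypothetical stand-in for one independent problem; it shares no state with other jobs."""
    total = 0
    for i in range(1, 10_001):
        total = (total + seed * i) % 1_000_003
    return total


def run_batch(seeds: Iterable[int], workers: int = 4,
              pool_cls=ProcessPoolExecutor) -> List[int]:
    """Run every job independently and return the results in input order.

    pool_cls defaults to separate processes (one per CPU for CPU-bound work);
    ThreadPoolExecutor can be passed instead for I/O-bound jobs.
    """
    with pool_cls(max_workers=workers) as pool:
        return list(pool.map(simulate_job, seeds))
```

When using the default process pool, call run_batch(range(100)) from under an `if __name__ == "__main__":` guard so that worker processes can import the module safely. Since no job waits on another, throughput should grow with the worker count until you run out of cores.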
HPC On Amazon Web Services
Amazon Web Services (AWS) provides on-demand scalability and elasticity for a wide variety of computational and data-intensive workloads, including workloads that represent many of the world’s most challenging computing problems: engineering simulations, financial risk analyses, molecular dynamics, weather prediction, and many more.
– AWS: An Introduction to High Performance Computing on AWS
AWS covers every type of workload listed here with one or more instance classes, each available in numerous sizes, allowing you to get very granular in the provisioning of your clusters.
Speaking of provisioning, there is even a tool called CfnCluster, which builds and manages High Performance Computing (HPC) clusters on AWS. Once the cluster is created, you can log into it via the master node, where you will have access to standard HPC tools such as schedulers, shared storage, and an MPI environment.
For data intensive workloads, there are a number of options to help get your data closer to your compute resources.
EBS is even a viable option for creating large scale parallel file systems to meet high-volume, high-performance, and throughput requirements of workloads.
HPC Workloads & 2nd Watch
2nd Watch can help you solve complex science, engineering, and business problems using applications that require high bandwidth, enhanced networking, and very high compute capabilities.
Increase the speed of research by running high performance computing in the cloud and reduce costs by paying for only the resources that you use, without large capital investments. With 2nd Watch, you have access to a full-bisection, high bandwidth network for tightly-coupled, IO-intensive workloads, which enables you to scale out across thousands of cores for throughput-oriented applications. Contact us today to learn more.
2nd Watch Customer Success
Celgene is an American biotechnology company that manufactures drug therapies for cancer and inflammatory disorders. Read more about their cloud journey and how they went from doing research jobs that previously took weeks or months, to just hours. Read the case study.
We have also helped a global finance & insurance firm prove their liquidity time and time again in the aftermath of the 2008 recession. By leveraging the batch computing solution that we provided for them, they are now able to scale out their computations across 120,000 cores while validating their liquidity with no CAPEX investment. Read the case study.
– Lars Cromley, Director of Engineering, Automation, 2nd Watch
The exponential growth of big data is pushing companies to process massive amounts of information as quickly as possible, which is often not realistic, practical, or even achievable on standard CPUs. In a nutshell, High Performance Computing (HPC) allows you to scale performance to process and report on the data more quickly, and it can be the solution to many of your big data problems.
However, this still relies on your cluster capabilities. By using AWS for your HPC needs, you no longer have to worry about designing and adjusting your job to meet the capabilities of your cluster. Instead, you can quickly design and change your cluster to meet the needs of your jobs. There are several tools and services available to help you do this, like the AWS Marketplace, AWS APIs, or AWS CloudFormation Templates.
Today, I’d like to focus on one aspect of running an HPC cluster in AWS that people tend to forget about – placement groups.
Placement groups are a logical grouping of instances in a single availability zone. This allows you to take full advantage of a low-latency 10 Gbps network, which in turn allows you to transfer up to 4 TB of data per hour between nodes. However, because of the low-latency 10 Gbps network, placement groups cannot span multiple availability zones. This may scare some people away from using them, but it shouldn’t. You can create multiple placement groups in different availability zones as a work-around, and with enhanced networking you can still connect the different HPC clusters.
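The transfer figure above is easy to sanity-check. The sketch below reads the network speed as 10 gigabits per second (my assumption about the unit) and uses decimal units, so 1 TB is 10^12 bytes. The function names and the efficiency parameter are illustrative.

```python
def terabytes_per_hour(link_gbps: float, efficiency: float = 1.0) -> float:
    """Data moved in one hour over a link of link_gbps gigabits per second.

    efficiency is the fraction of raw bandwidth actually achieved (1.0 = theoretical ceiling).
    """
    if link_gbps <= 0 or not 0.0 < efficiency <= 1.0:
        raise ValueError("link_gbps must be positive and efficiency must be in (0, 1]")
    bytes_per_second = link_gbps * 1e9 / 8 * efficiency
    return bytes_per_second * 3600 / 1e12


def hours_to_transfer(data_tb: float, link_gbps: float, efficiency: float = 1.0) -> float:
    """Hours needed to move data_tb terabytes between two nodes."""
    return data_tb / terabytes_per_hour(link_gbps, efficiency)


if __name__ == "__main__":
    print(f"ceiling on a 10 Gbps link: {terabytes_per_hour(10):.2f} TB/hour")
    print(f"time to move 40 TB at that rate: {hours_to_transfer(40, 10):.1f} hours")
```

The theoretical ceiling works out to 4.5 TB per hour, so a quoted figure of up to 4 TB per hour leaves some room for protocol overhead.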
One of the great benefits of AWS HPC is that you can run your High Performance Computing clusters with no up-front costs and scale out to hundreds of thousands of cores within minutes to meet your computing needs. Learn more about Big Data and HPC solutions on AWS or Contact Us to get started with a workload workshop.
-Shawn Bliesner, Cloud Architect
In previous posts, we’ve introduced the overall concept of the Digital Enterprise, characteristics of today’s digital businesses, key challenges that many companies face when adopting a cloud strategy and the steps that these organizations can take to help speed their transition to the public cloud.
Our recent white paper, “The Digital Enterprise: Transforming Business in the Cloud”, defines digital businesses as companies with 100% digital IT and business processes that, for the most part and as much as possible, are hosted in the public cloud. Additional characteristics of a digital enterprise include:
- The ability to easily scale computing resources and advanced features in order to meet demand.
- Faster time-to-market when launching new services, websites and web applications in support of business initiatives and market changes.
- The ability to innovate, fail, and repeat at speeds that far exceed those of competitors that still maintain traditional workload management solutions.
- Management of systems is unified in the public cloud via a single management interface, which eliminates the complexities of multiple management systems and delivers higher cross-application reliability.
- Ease-of-interoperability due to open architecture that supports multiple software and hardware technologies.
- The focus of the IT department becomes innovation rather than required maintenance.
Companies like Coca-Cola and Yamaha have already begun their transformation to becoming Digital Enterprises by migrating critical applications and workloads to the public cloud. Core workloads and applications include short term, large scale batch computing and data analysis workloads, on-premises business applications (ERP systems, marketing, collaboration, sales, and accounting tools), application and development environments and applications which are built to be inherently “cloud” or “cloud native”.
As companies seek to streamline their core workloads and applications through digital migration, they quickly realize that central IT isn’t the only department that can benefit from the use of cloud technology. It’s no surprise that with today’s digital culture (and our need for instant gratification and expectations for wicked-fast responses, feedback and communication), marketing is a key sector that stands to benefit drastically from the capabilities of the cloud. Specifically, the cloud has drastically changed the ways in which enterprises can reach and engage their audience, scale globally to support thousands of websites and web apps, and store, analyze and distribute mission-critical data efficiently and securely.
I began my career as a high tech marketer more than a decade ago and have had the privilege of working for some of the biggest names in the industry. It all began at PeopleSoft as a Direct Response Campaign Manager, driving direct mail campaigns for the ERP powerhouse. Over the last fourteen years, my role as a marketer has transformed along with the methodologies that have emerged, changing the game as we know it.
It wasn’t much more than a decade ago that traditional “snail mail” was the norm for getting your business’ messaging into market and a standard campaign was a bit of a mathematical equation that resembled something like this:
The result? A sluggish response to market changes, plus data capture and analysis that was manual (at best) and provided very little insight into campaign performance and ROI when compared to today’s digital marketing capabilities. The good news, however, was that the competition followed the same methodologies and invested in the traditional, tried and true mediums of marketing and advertising.
I still remember the day that I launched my first, purely digital, email campaign. Within minutes we began seeing audience responses trickle in, and in the back of my mind I couldn’t help but think, “Whoa! Game changer!” It wasn’t long before newer methods of digital marketing emerged and traditional direct mail became a dinosaur on the verge of extinction.
Today, investments in digital marketing are a key component of a digital enterprise’s competitive advantage. Technology and marketing have never been so tightly coupled, and that convergence is having a profound impact on overall marketing strategies. In fact, a 2013 U.S. Digital Marketing Spending Survey by Gartner reports that two-thirds of marketers have a capital expenditure budget that they are using to acquire marketing software licenses and infrastructure.
Marketers are looking for an easier, lower cost way to get the capacity they need to develop digital marketing campaigns when and where they need it.
Delivering engaging experiences requires real time, high performing architectures that provide marketers the ability to measure and improve the performance of their websites or campaigns and tie them more closely to overall corporate goals. The insights garnered from the massive amounts of data collected can then be used to dynamically adjust creative execution or content for optimal performance.
The secret to success when shifting your digital marketing efforts to the public cloud is to enable digital operations that drive real-time action plans to better serve your customers. Additionally, it is important to recognize that digital marketing campaign management isn’t the only area where digital enterprises are reaping the benefits of the public cloud.
Over the next few weeks, we will explore how businesses like Adobe, DVF and Magento are benefiting from an increased flexibility and agility to quickly respond to changes in the marketplace and scale their marketing operations globally—at a much lower cost than traditional methods—by running the following digital marketing efforts in the public cloud:
- Websites & Web Applications
- Dev & Test Environments
- Content Delivery
In today’s world, consumer habits change fast and marketing decisions need to be made within seconds, not days. Shifting your marketing operations to the public cloud enables you to deliver marketing content and campaigns with the levels of availability, performance, and personalization that your customers expect while lowering your costs and driving preferred business outcomes.
Learn from Gartner what the Critical Capabilities for Public Cloud Infrastructure as a Service were for 2014. Gartner evaluates 15 public cloud IaaS service providers, listed in the 2014 Magic Quadrant, against eight critical capabilities and for four common use cases your enterprise manages today.
Gartner takes an in-depth look at the critical capabilities for:
- Application Development – for the needs of large teams of developers building new applications
- Batch Computing – including high-performance computing (HPC), data analytics and other one-time (but potentially recurring), short-term, large-scale, scale-out workloads
- Cloud Native Applications – for applications at any scale, which have been written with the strengths and weaknesses of public cloud IaaS in mind
- General Business Applications – for applications not designed with the cloud in mind, but that can run comfortably in virtualized environments
Gartner does not endorse any vendor, product or service depicted in its research publications, and does not advise technology users to select only those vendors with the highest ratings or other designation. Gartner research publications consist of the opinions of Gartner’s research organization and should not be construed as statements of fact. Gartner disclaims all warranties, expressed or implied, with respect to this research, including any warranties of merchantability or fitness for a particular purpose.