1-888-317-7920 info@2ndwatch.com

AWS re:Invent 2017 Recap and Initial Impressions

While AWS re:Invent 2017 is still fresh in our minds, here are some of the highlights of the most significant announcements.

Aurora Multi-Master/Multi-Region: This is a big deal! The concept of geographically distributed databases with multiple masters has been a long-desired solution. Why is this important?
Having additional masters allows for database writes, not just reads like the traditional read replicas that have been available. This feature enables a true multi-region, highly available solution that eliminates a single point of failure and achieves optimum performance. Previously, third party tools like Golden Gate and various log shipping approaches were required to accomplish proper disaster recovery and high availability. This will greatly simplify architectures for some that want to go active-active across regions and not just availability zones. Additionally, it will enable pilot light (and more advanced) DR scenarios for customers that are not going to be using active-active configurations.

Aurora Serverless: Aurora Serverless is an on-demand, auto-scaling configuration for the Aurora MySQL and PostgresSQL compatible database service, where the database will automatically start-up and scale up or down based on your application’s capacity needs. It will shut down when required, basically scaling down to zero when not being used. Traditionally, Aurora RDS required changing the underlying instance type to scale for database demand. This is a large benefit and cost saver for development, testing, and QA environments. Even more importantly, if your workload has large spikes in demand, then auto-scaling is a game changer in the same way that EC2 auto scaling enabled automated compute flexibility.

T2 Unlimited: T2 is one of the most popular instance types used by 2nd Watch and AWS customers, accounting for around 50% of all instances under 2nd Watch Managed Cloud Services. In the case of frequent, small and inconsistent workloads, T2 is the best price and performance option. However, one of the most common reasons that customers do not heavily leverage T2 is due to concerns related to a sustained spike in load that will deplete burstable credits and result in unrecoverable performance degradation. T2 unlimited solves this problem by essentially allowing unlimited surges over the former limits. We expect to see more customers will adopt T2 for those inconsistent workloads as a cost-effective solution. We will watch to see if this this shift is reflected in the instance type data for accounts being managed by 2nd Watch.

Spot Capacity: Spot instances are normally used as pools of compute that run standard AMIs and work on datasets located outside of EC2. This is because the instances are terminated when the spot price increases beyond your bid, and all data is lost. Now, when AWS reclaims the capacity, the instance can essentially hibernate, preserving the operating system and data, and startup again when the spot pricing is favorable. This removes another impediment in the use of spot capacity, and will be a large cost saver for environments that only need to be temporarily available.

M5 Instance Type: Given the large increase in performance of the newer processor generations, one can see large cost savings and performance improvements by migrating to a smaller sized offering of the latest instance type that meets your application’s needs. Newer instance types can also offer higher network bandwidth as well, so don’t put off the adoption of the latest products if possible.

Inter-region Peering: It’s always been possible to establish peering relationships between VPCs in the same region. Inter-region Peering uses AWS private links between VPCs in different availability zones and does not transit the open internet, eliminating VPNs, etc. This same feature is available inter-region. This makes multi-region designs cleaner and easier to implement, without having to build and configure VPN networking infrastructure to support it, which of course also needs monitoring, patching, and other maintenance. It was also announced that users of Direct Connect can now route traffic to almost every AWS region from a single Direct Connect circuit.

There were also some announcements that we found interesting but need to digest a little longer. Look for a follow up from us on these.

EKS: Elastic Container Services for Kubernetes – Amazon Elastic Container Service for Kubernetes (Amazon EKS) is a managed service that makes it easy for you to run Kubernetes on AWS without needing to install, operate, and maintain your own Kubernetes clusters. Even at last years’ AWS re:Invent we heard people wondering where the support for Kubernetes was, particularly since it has become the de facto industry standard over the past several years.

GuardDuty: AWS has now added a cloud-native tool to the security toolbox. This tool utilizes “machine learning” for anomaly detection. AWS GuardDuty monitors traffic flow and API logs for your accounts, letting you establish a baseline for “normal” behavior on your infrastructure, and then watches for security anomalies. These are reported with a severity rating, and remediation for certain types of events can be automated using existing AWS tools. We will be considering the best methods of implementation of this new tool.

Fargate: Run Amazon EKS and ECS without having to manage servers or clusters.

Finally, a shameless plug: If compliance is on your mind, watch this AWS re:Invent breakout session from our product and engineering experts.

AWS re:invent 2017: Continuous Compliance on AWS at Scale (SID313)

Peter Meister, Director of Product Management, 2nd Watch
Lars Cromley, Director of Engineering, 2nd Watch

In cloud migrations, the cloud’s elastic nature is often touted as a critical capability in delivering on key business initiatives. However, you must account for it in your security and compliance plans or face some real challenges. Always counting on a virtual host to be running, for example, causes issues when that host is rebooted or retired. Managing security and compliance in the cloud is continuous, requiring forethought and automation. Learn how a leading, next generation managed cloud provider uses automation and cloud expertise to manage security and compliance at scale in an ever-changing environment. Through code examples and live demos, we show tools and automation to provide continuous compliance of your cloud infrastructure.
Obviously, there was a lot more going on and it will take some time to go through it. We will keep you up to date with our thoughts.

–David Nettles, Solutions Architect, 2nd Watch
–Kevin Dillon, Director, Solutions Architecture, 2nd Watch


2nd Watch Meets Customer Demands and Prepares for Continued Growth and Acceleration with Amazon Aurora

The Product Development team at 2nd Watch is responsible for many technology environments that support our software and solutions—and ultimately, our customers. These environments need to be easily built, maintained, and kept in sync. In 2016, 2nd Watch performed an analysis on the amount of AWS billing data that we had collected and the number of payer accounts we had processed over the course of the previous year.  Our analysis showed that these measurements had more than tripled from 2015 and projections showed that we will continue to grow at the same, rapid pace with AWS usage and client onboarding increasing daily. Knowing that the storage of data is critical for many systems, our Product Development team underwent an evaluation of the database architecture used to house our company’s billing data—a single SQL Server instance running a Web edition of SQL Server with the maximum number of EBS volumes attached.

During the evaluation, areas such as performance, scaling, availability, maintenance and cost were considered and deemed most important for future success. The evaluation revealed that our current billing database architecture could not meet the criteria laid out to keep pace with growth.  Considerations were made to increase the storage capacity by one VM to the maximum family size or potentially upgrade to MS SQL Enterprise. In either scenario, the cost of the MS SQL instance doubled.  The only option for scaling without substantially increasing our cost was to scale vertically, however, to do so would result in diminishing performance gains. Maintenance of the database had become a full-time job that was increasingly difficult to manage.

Ultimately, we chose the cloud-native solution, Amazon Aurora, for its scalability, low-risk, easy-to-use technology.  Amazon Aurora is a MySQL relational database that provides speed and reliability while being delivered at a lower cost. It offers greater than 99.99% availability and can store up to 64TB of data. Aurora is self-healing and fully managed, which, along with the other key features, made Amazon Aurora an easy choice as we continue to meet the AWS billing usage demands of our customers and prepare for future growth.

The conversion from MS SQL to Amazon Aurora was successfully completed in early 2017 and, with the benefits and features that Amazon Aurora offers, many gains were made in multiple areas. Product Development can now reduce the complexity of database schemas because of the way Aurora stores data. For example, a database with one hundred tables and hundreds of stored procedures was reduced to one table with 10 stored procedures. Gains were made in performance as well. The billing system produces thousands of queries per minute and Amazon Aurora handles the load with the ability to scale to accommodate the increasing number of queries. Maintenance of the Amazon Aurora system is now virtually managed. Tasks such as database backups are automated without the complicated task of managing disks. Additionally, data is copied across six replicas in three availability zones which ensures availability and durability.

With Amazon Aurora, every environment is now easily built and setup using Terraform. All infrastructure is automatically setup—from the web tier to the database tier—with Amazon CloudWatch logs to alert the company when issues occur. Data can easily be imported using automated processes and even anonymized if there is sensitive data or the environment is used to demo to our customers. With the conversion of our database architecture from a single MS SQL Service instance to Amazon Aurora, our Product Development team can now focus on accelerating development instead of maintaining its data storage system.






Benchmarking Amazon Aurora

Amazon Web Services, Microsoft Azure and Google Cloud Platform are the clear leaders in the Cloud Service Provider (CSP) space. The competition between these players drives innovation and results in customers receiving more benefits and better business outcomes over older technologies. The database as a service space is a massive opportunity for customers to gain more in terms of performance, scalability, and high-availability without the overhead or long-term contracts from traditional solutions.

At 2nd Watch we see a lot of customers who want to utilize database as a service to get better performance without the overhead or having all the hassles of traditional database products.  In that, we watch all three Cloud Service Providers very closely, their services and run benchmark comparisons on their products. In our opinion, Amazon Aurora is probably the best-performing database as a service platform that we’ve seen to date, while still being very cost-effective for the performance benefits.

Amazon Aurora is AWS’s fully-managed MySQL-compatible relational database service. Aurora was built to provide commercial-grade performance and availability, while being as easy to use and as cost-effective as an open source engine. Naturally, a lot of people are interested in Aurora’s performance, and a number of people have tried to benchmark it. Benchmarking accurately is not always easy to do, and it’s why you see different results from various benchmarks.  We’ve helped many of our customers benchmark Aurora to understand how it would perform to meet their needs, and in many cases we have migrated, or are actively working on migrating, customers to Aurora.

Recently, Google benchmarked their own Cloud SQL data against Amazon Aurora. Since a number of our customers use Aurora, we were interested in digging in to take a deeper look at their findings.

The chart below shows the benchmark results published by Google.

Graph 1

Original Source: Google. See: https://cloudplatform.googleblog.com/2016/08/Cloud-SQL-Second-Generation-performance-and-feature-deep-dive.html

In their blog post about this comparison, the blog claims that 1) Cloud SQL performs better at low thread counts and 2) customers do not use a lot of threads, so the results at higher thread counts do not matter. Our experience is that the latter claim is not true, especially as we look at our enterprise customers.  So, we questioned our results from previous benchmarking. Therefore, we decided to benchmark things again to see if our results would fall in line with Google’s results. Below you can see an overlay of our data on top of their original graph. Since we do not have the original results from Google’s , an overlay is the best we can do for comparison.

To perform our own benchmarks, we used Terraform to stand up a VPC, our instances of Aurora, and the hosts from which we’d run the sysbench s, all within the same Availability Zone.  We used the same settings and commands as outlined in the Google blog post to against our results.  We shared our results with the Aurora team at AWS, and they confirmed that they saw very similar results when they ed Aurora with sysbench in a configuration similar to ours.

Full data from our s, R scripts, and automation to stand up the infrastructure can be found on github.

Here is a summary of our observations on this study:

1.The data for Aurora’s performance is incorrect

This data does not match the Aurora performance we see when we run the ourselves. Without any special tuning, we saw higher performance results for Aurora. Without the original data from the s Google conducted it is difficult to say why their results differ. All we can say is that, from our ing, Aurora does appear to be more performant and is clearly the database of choice, especially when dealing with higher thread counts.

2nd Watch overlay in purple:

Graph 2

Results from our s on Aurora:

Both studies agree that Aurora has a significant performance advantage above a certain thread count. Google’s shows the previous release of Aurora having an advantage at 16 threads or more (Cloud SQL starts higher but then drops below Aurora). Our shows the la version of Aurora having an advantage at 8 threads, with results between the two databases being fairly comparable at 4 or fewer threads.

2.The study makes a claim that customers don’t need many threads

Google is suggesting that you should only focus on the results with lower thread counts. Their blog post says that Cloud SQL “performed better than Aurora when active thread count is low, as is typical for many web applications.” The claim that low thread counts are typical for web applications (and thus what people should focus on) is inaccurate. There are a large number of AWS customers who run applications using Aurora, and it is very common for them to run with hundreds of threads (and many customers choose to run with thousands). As we mentioned previously, both our study and Google’s study show that Aurora has an advantage at higher thread counts.

There are a number of areas where Amazon improved on the performance of MySQL when developing Aurora. One of them was taking advantage of high thread counts to deliver better performance. With Aurora, customers with higher thread counts will typically see a large performance advantage over MySQL and Cloud SQL.

3.Aurora’s largest instance outperforms Cloud SQL’s largest instance by 3X

Google’s benchmark was done on an r3.4XL, which is only half the size of Aurora’s largest instance (the r3.8XL). We understand that this study used Cloud SQL’s largest machine and that they were trying to compare to a comparable Aurora instance size. But a customer is more likely to run a workload like the one in Google’s benchmark on Aurora’s largest machine, the r3.8XL, because the 8XL’s larger cache would improve performance. We ran the with Aurora’s r3.8XL instance, and the results were more than 3X higher than Cloud SQL’s performance. It was a struggle to fit Aurora’s r3.8XL performance on the same scale.

Aurora on r3.8xlarge (scale adjusted):

Graph 3

A few more things to consider

We’ve already pointed out Aurora’s performance advantage, which is evident in both studies. Aurora has a number of other advantages over Google Cloud SQL. One example is scalability. We showed earlier that Aurora’s largest instance exceeds the performance of Cloud SQL’s largest instance by 3X in this benchmark. Aurora can scale up to handle more storage (64 TB for Aurora vs 10 TB for Cloud SQL). Aurora supports up to 15 low-latency read replicas that have very low replica lag. Aurora replica lag is typically only a few milliseconds, whereas traditional binlog-based replication, which is used by MySQL and Cloud SQL, can result in replica lag on the order of seconds or even minutes. Aurora allows up to 15 replicas that can act as failover targets without impacting the performance of the primary instance, whereas Cloud SQL allows only one failover target. Another advantage is that Aurora has extremely fast crash recovery, as its log-structured storage system does not require the replay of redo logs after a database crash.

Benchmarking Aurora

If you’re interested in benchmarking Aurora, check out AWS’s Aurora benchmarking guide here.

If you’re thinking about benchmarking Aurora yourself, there are two common errors to watch out for. First, make sure you put the database and the client in the same Availability Zone or you will pay a latency and throughput penalty. This is what customers actually do when they use Aurora in production. Second, make sure that the client instance driving traffic to Aurora has enhanced networking enabled. The benchmarking guide mentioned above has instructions for how to do that.

Making good use of benchmarking data is tricky. You’re trying to build or migrate your application and you want to understand how your new database is likely to behave, now and in the future. There’s no magic benchmark that will exactly match your production workload, but you need to make a decision anyway. What should you do?

Here’s some general advice we try to follow for our own projects as well.

  1. Plan for success by ing at scale: The last thing you want to do is make design choices that hurt you when your application starts to really take off. That’s when you want to celebrate with your team and build the next great feature, not attempt an emergency database migration. Think about how your workload might change as your application grows (higher thread counts, more data, higher TPS, more tables…) and build that into your s.
  2. Choose benchmarking s that align to your application: Do you expect your data set to fit in memory or will your database be disk-bound? Will your traffic pattern be spiky? Write-heavy? You can find a large number of benchmarks for any major technology. Try to identify a set of s that are as close as possible to your real-world application. Don’t just accept the first benchmarking study you find. If you can re-run your actual production workload as a , even better.
  3. Know that your benchmarks are not the real world and plan for that: The best benchmarking study you run will be, at best, slightly wrong (not the same as the real world). Make a note of the assumptions you made in your and keep an eye on your database in production. Watch for signs that your production workload is moving into uned areas and adjust your s accordingly.
  4. Get help: The best way to get expert advice is to see an expert. Cloud technology is complex and not something you should seek guidance on from partners who are “generalists.” To Aurora at your company, visit us at www.2ndwatch.com.


-Chris Nolan, Director of Product & Lars Cromley, Sr. Product Manager


Introducing Amazon Aurora

When you think of AWS, the first thing that comes to mind is scalability, high availability, and performance. AWS is now bringing the same features to a relational database by offering Aurora through Amazon RDS. As announced at AWS re:Invent 2014, Amazon Aurora is a fully managed relational database engine that is feature compatible with MySQL 5.6. Though it is currently in preview, it promises to give you the performance and availability of high end commercially available databases with built-in scalability without the administrative overhead.

To create Aurora, AWS has taken a service-oriented approach, making individual layers such as storage, logging and caching scale out while keeping the SQL and transaction layers using the core MySQL engine.

So, let’s talk about storage. Aurora can automatically add storage volume in 10 GB chunks up to 64 TB. This eliminates the need to plan for the data growth upfront and manually intervene when storage needs to be added. This feature is described by AWS to happen seamlessly without any downtime or performance hit.

Aurora’s storage volume is automatically replicated across three AWS Availability Zones (AZ) with two copies of data in each AZ.  Aurora uses quorum writes and reads across these copies. It can handle the loss of two copies of data without affecting database write availability and up to 3 copies without affecting read availability. This SSD powered multi-tenant 10 GB storage chunks allows for concurrent access and reduces hot spots making it fault tolerant. It is also self-healing as it continuously scans data blocks for errors and repairs them automatically.

Unlike in a traditional database where it has to replay the redo log since last check point, which is single threaded in MySQL, Aurora’s underlying storage replays redo logs on-demand, in parallel, distributed and asynchronous mode. This allows you to recover from a crash almost instantaneously.

Aurora database can have up to 15 Aurora replicas. Since master and replica share the same underlying storage, you can failover with no data loss. Also, there’s very little load on the master since there’s no log replay, resulting in minimal replica lag of approximately 10 to 20 milliseconds. The cache lives outside of the database process which allows it to survive during a database restart. As a result, operations can be resumed much faster as there is no need to warm the cache. The backup is automatic, incremental and continuous. This enables point-in-time recovery up to the last five minutes.

A new added feature of Aurora is its ability to simulate failure of node, disk or networking components using SQL commands. This allows you to the high availability and scaling features of your application.

According to the Aurora FAQ, it is five times faster than a stock version of MySQL. Using Sysbench “Aurora delivers over 500,000 selects/sec and 100,000 updates/sec running the same benchmark on the same hardware.”

While Aurora may cost slightly more than the RDS for MySQL per instance based on the US-EAST-1, the features may justify it, and you only pay for the storage consumed at a rate of $0.10/GB per month and IOs at a rate of $0.20 per million requests.

It’s exciting to see the challenges of scaling the relational database being addressed by AWS. To learn more, you can sign up for a preview at http://aws.amazon.com/rds/aurora/preview/.

2nd Watch specializes in helping our customers with legacy investments in Oracle achieve successful migrations to Aurora.  Our Oracle-certified experts understand the specialized features, high availability and performance capabilities that proprietary components such as RAC, ASM and Data Guard provide and are skilled in delivering low-risk migration roadmaps.  We also offer schema and PL/SQL migration guidance, client tool validation, bulk data upload and ETL integration services. To learn more, contact us.

Ali Kugshia, Sr. Cloud Engineer