Business intelligence (BI) is an umbrella term for the variety of software applications used to analyze an organization’s raw data. As a discipline, BI is made up of several related activities, including data mining, online analytical processing (OLAP), querying, and reporting. Analytics is the discovery and communication of meaningful patterns in data. This blog will look at a few areas of BI, including data mining and reporting, and discuss using analytics to find the answers you need to make better business decisions.
Data mining is an analytic process designed to explore data. Companies of all sizes continuously collect data, oftentimes in very large amounts, in order to solve complex business problems. Data collection can range in purpose from finding out which types of soda your customers like to drink to tracking genome patterns. Processing that much data quickly takes serious computing power, which is where a system such as Amazon Elastic MapReduce (EMR) comes in. EMR can handle most use cases, from log analysis to bioinformatics, but it can only report on data that has actually been collected, so make sure the collected data is accurate and complete.
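EMR’s processing model is MapReduce. As a tiny, self-contained illustration of the kind of log-analysis job it distributes across a cluster, here is the map/reduce pattern in plain Python; the log lines and page names are made up for the example, and in practice EMR runs the map and reduce phases in parallel across many nodes over data in S3.

```python
from collections import Counter
from itertools import chain

# Toy log lines standing in for the raw data a company collects.
logs = [
    "GET /checkout 200",
    "GET /home 200",
    "GET /checkout 500",
    "GET /checkout 200",
]

# Map phase: emit a (page, 1) pair for each log line.
def mapper(line):
    _, page, status = line.split()
    yield (page, 1)

# Reduce phase: sum the counts for each page.
pairs = chain.from_iterable(mapper(line) for line in logs)
page_hits = Counter()
for page, n in pairs:
    page_hits[page] += n

print(page_hits["/checkout"])  # -> 3
```

The value of EMR is that this same map/reduce shape scales from a handful of lines to terabytes of logs without changing the logic, only the cluster size.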
Reporting accurate and complete data is essential for good BI. Tools like Splunk’s Hunk and Apache Hive work very well with AWS EMR for modeling, reporting, and analyzing data. Hive provides SQL-like querying and reporting for finding meaningful patterns in the data, while Hunk lets you interactively explore logs with real-time alerts. Using the correct tools is the difference between data no one can use and data that provides meaningful BI.
Why do we collect all this data? To find answers, of course! Finding answers in your data, from marketing insights to application debugging, is why we collect it in the first place. AWS EMR is great for processing all that data, provided the right tools are reporting on it. But beyond knowing what happened, we need to find out how it happened. That requires interactive queries on the data to drill down into root causes or customer trends. Tools like Impala and Tableau work great with AWS EMR for these needs.
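Impala speaks standard SQL, so a drill-down query looks much like the sketch below. This example uses Python’s built-in sqlite3 in place of a cluster, and the orders table and its values are hypothetical; the point is the shape of the query, filtering and grouping to surface a customer trend.

```python
import sqlite3

# Hypothetical order data; in practice Impala runs queries like this
# directly against data sitting on the EMR cluster.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (region TEXT, product TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [("west", "soda", 3.0), ("west", "soda", 4.0),
     ("east", "soda", 2.0), ("east", "water", 1.5)],
)

# Drill down: which region drives soda sales?
rows = conn.execute(
    """SELECT region, SUM(amount) AS total
       FROM orders
       WHERE product = 'soda'
       GROUP BY region
       ORDER BY total DESC"""
).fetchall()
print(rows)  # -> [('west', 7.0), ('east', 2.0)]
```

A dashboard tool like Tableau generates and visualizes exactly this kind of aggregate, letting an analyst click from “soda sales are up” down to which region and which stores are responsible.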
Business intelligence and analytics boil down to collecting accurate and complete data: having a system that can process that data, the ability to report on it in a meaningful way, and using it to find answers. By provisioning the storage, computation, and database services you need to collect big data in the cloud, we can help you manage big data, BI, and analytics while reducing costs, increasing your speed of innovation, and providing high availability and durability, so you can focus on making sense of your data and using it to make better business decisions. Learn more about our BI and Analytics Solutions here.
-Brent Anderson, Senior Cloud Engineer
This past Valentine’s Day, Amazon Web Services launched a business intelligence and data warehousing service, dubbed Redshift, which had been in a limited preview beta since last November. This is good news for customers plagued by internal data warehousing costs and complications, especially when trying to make sense of reams of Big Data results. Redshift has no problem handling Big Data for individual customers, since the service supports petabyte-sized data warehouses in the AWS cloud.
Redshift’s value comes at you from two angles. First, it’s a data warehouse headache- and wallet-saver. Use Redshift and you’re no longer plagued by the infrastructure required to process a Big Data repository: massive CPU cycles, an ever-widening sinkhole of storage needs, and a growing stack of new in-house management tools and skill sets. Redshift takes that off your plate with managed services; automatic task help, including configuration and provisioning; and, of course, a lower overall TCO.
But possibly even more valuable than that is its capability as a business intelligence foundation. At launch, AWS announced that Redshift has support from a satisfyingly large number of Big Data management vendors, including Actuate, Attunity, Birst, IBM, Informatica, Jaspersoft, MicroStrategy, Pentaho, Pervasive, Roambi, SAP, Tableau, and Talend. All these companies offer a wide variety of business intelligence tool kits that include Big Data management, broad and vertical analytic engines, and formal as well as DIY querying features. More Big Data management vendors will add support as the service rolls along, but for an out-of-the-gate launch, this is a great stable.
Pricing is a huge benefit when you consider the cost of running a Big Data warehouse yourself. Amazon summarized pricing on its site:
“For On-Demand, the effective price per TB per year is the hourly price for the instance times the number of hours in a year divided by the number of TB per instance. This works out to $3,723 per TB per year. For Reserved Instances, the effective price per TB per year is the upfront payment plus the hourly payment times the number of hours in the term divided by the number of years in the term and the number of TB per node. For 1 year Reserved Instances, this works out to $2,190 per TB per Year. For 3 year Reserved Instances, the effective price is $999 per TB per year.”
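Amazon’s on-demand formula is easy to check. Plugging in the single-XL-node figures mentioned in this post (roughly $0.85 per hour, 2 TB per node), the arithmetic below reproduces the quoted number; the reserved-instance figures also depend on upfront payments not listed here, so only the on-demand case is worked out.

```python
# On-demand effective price per TB per year, per Amazon's formula:
# hourly price * hours in a year / TB per node.
hourly_price = 0.85    # approx. on-demand rate for a 2 TB XL node
hours_per_year = 8760  # 24 * 365
tb_per_node = 2

effective_price = hourly_price * hours_per_year / tb_per_node
print(round(effective_price))  # -> 3723, matching the quoted $3,723 per TB per year
```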
Redshift also includes some free backup storage if you’ve got a single, active XL node cluster; anything over that gets charged at S3 rates. Boil it all down and pricing comes to about 85 cents an hour for a 2TB node, with cheaper rates available depending on what kind of instance you’re running. Viewed annually, that’s roughly $1,000 per terabyte of data per year. Sounds like a lot, but running the same kind of managed storage in-house can cost upwards of 10x that much. The reason for that nasty in-house price tag lies in Big Data’s complexity.
Big Data isn’t one honking database that just grew too big for its britches. It’s usually composed of several instances, often from different vendors, that have started growing very quickly, even exponentially, thanks to new and smarter data-gathering tools. Web analytics, web and brick-and-mortar transaction monitoring, mobile and social marketing data: all of these have new tools that can gather more data points and send them back to their repositories much faster, almost constantly. That means a Big Data installation is a mix of massive, always-growing databases against which new business intelligence tools attempt queries that hit all those instances simultaneously. That requires an all-new set of management and querying tools, as well as a newly educated staff with an understanding of Big Data and the expertise to turn an ocean of bytes into tangible intelligence.
Sure, you can do this in-house, but by using a cloud service like Redshift you can drop the heavy burden of infrastructure maintenance and concentrate on mining your Big Data for real insight. And that’s what it’s all about.
If you want to learn more about Redshift, AWS is hosting a free webinar on March 14 – you can register off the Redshift product page.
-Kris Bliesner, CEO