How Athena Reduces Cost of Data Analytics

April 2, 2021 / Deepak Dhall

Leveraging data poses several challenges, chief among them storing it in a scalable, cost-effective manner. Amazon S3 largely solves this by storing raw, unstructured data cheaply, but analyzing that data remains expensive because customers have to set up huge clusters to process it and to store the transformed output.

Traditionally, enterprises stored raw data in databases and transformed data in data warehouses, which required expensive licenses and massive clusters. As the volume and velocity of data grew, scaling and maintaining these systems became extremely challenging and expensive.

Cloud-based storage on AWS was an attractive option because storage and processing could be separated, yet cost and scalability challenges persisted: data transformation still required massive clusters to store, transport and process data. Queries ran slowly because data had to sit in a central location and be moved to other storage for scrubbing and transformation. Moreover, comprehensive data sets remained elusive because historical and real-time data could not be queried at the same time.

Enter AWS Athena, a serverless ad hoc query engine that supports federated queries and runs standard SQL across massive datasets in S3, eliminating the need to process all data with a warehousing solution. For organizations that rely on analytics to make real-time decisions, Athena became a game changer because ad hoc queries against data in S3 and other locations, in its as-is state, were quick and cost-effective.

Customers have reaped rich dividends from Athena, which is simple, easy to use and great value for money. Noventiq has extensive experience deploying Athena, and some of our customers have achieved cost savings of around 40% by leveraging AWS Athena’s pay-per-query pricing, high performance and reliability at scale. Convenience, speed to insight and ease of use further reduce cost.

Specifically, our customers have achieved the following benefits.

Cost savings: Athena costs USD 5 per TB of data scanned, a model that makes it cheaper than other ETL tools, especially since it can query compressed files. Customers tune and optimize their query logic to reap further cost benefits and experiment with new ways to split and run ETL jobs.

Scale: Customers process massive amounts of unstructured data without having to optimize or manage compute clusters, freeing time and effort for more productive work.

Resilience: Athena is highly performant and scales seamlessly, allowing businesses to focus on the job at hand without worrying about the performance of the query engine.

Ease of Use: There is virtually no learning curve for anyone who knows SQL, because Athena queries are written in standard SQL. This equips business users to run queries independently.

Data load/transport not required: Customers get a head start in querying with Athena’s schema-on-read approach, which reads data directly in S3, where queries are executed, eliminating the need to load or transfer data.
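To make the schema-on-read idea concrete, here is a minimal Python sketch that builds a CREATE EXTERNAL TABLE statement pointing at files already sitting in S3. The database, table, column names and bucket path are hypothetical, and Parquet is assumed as the file format purely for illustration. The optional submit_ddl helper uses the boto3 Athena client and needs AWS credentials; it is one possible way to run the statement, not a prescribed setup.

```python
def build_external_table_ddl(database, table, columns, s3_location):
    """Build DDL that describes files in S3 without moving or loading them.

    columns is a list of (name, sql_type) pairs. All names here are
    illustrative assumptions, not taken from a real deployment.
    """
    cols = ",\n".join(f"  `{name}` {sql_type}" for name, sql_type in columns)
    return (
        f"CREATE EXTERNAL TABLE IF NOT EXISTS {database}.{table} (\n"
        f"{cols}\n"
        f")\n"
        f"STORED AS PARQUET\n"  # assumption: the data is stored as Parquet
        f"LOCATION '{s3_location}'"
    )


def submit_ddl(ddl, database, output_location):
    """Optionally submit the statement to Athena (requires boto3 and credentials)."""
    import boto3  # imported here so the builder above works without boto3

    client = boto3.client("athena")
    response = client.start_query_execution(
        QueryString=ddl,
        QueryExecutionContext={"Database": database},
        ResultConfiguration={"OutputLocation": output_location},
    )
    return response["QueryExecutionId"]


# Hypothetical example: describe clickstream files where they already live.
ddl = build_external_table_ddl(
    "analytics",
    "clickstream",
    [("user_id", "string"), ("event_time", "timestamp")],
    "s3://example-bucket/clickstream/",
)
print(ddl)
```

The table definition is only metadata: the schema is applied when a query reads the files, so the data stays in place in S3.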

Use cases of Athena include analyzing CloudTrail logs for granular insights into AWS service usage, tracking security and compliance issues, and making real-time offers to online users by combining insights from historical data in S3 with logs from other customer touch points. Athena also offers a great way to validate new datasets with ad hoc queries that check whether the data is logical or needs fixing.

A comprehensive cost-benefit analysis of implementing data analytics must include the time, resources and cost of extracting that data. As data generation explodes, adaptive businesses must adopt new tools and methodologies that help them scale quickly, easily and cost-effectively.
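As a starting point for the query-cost part of such an analysis, the following Python sketch estimates what a scan costs at the USD 5 per TB rate quoted above, and how much compression could save. It assumes a decimal terabyte (10^12 bytes) and a hypothetical 4:1 compression ratio; actual billing may apply rounding and per-query minimums that are not modeled here.

```python
import math

PRICE_PER_TB_USD = 5.0    # rate quoted in this article
BYTES_PER_TB = 10**12     # assumption: decimal terabyte


def query_cost_usd(bytes_scanned):
    """Estimated cost of a query that scans the given number of bytes."""
    return bytes_scanned / BYTES_PER_TB * PRICE_PER_TB_USD


def compressed_query_cost_usd(raw_bytes, compression_ratio):
    """Estimated cost when the same data is stored compressed.

    compression_ratio is raw size divided by compressed size,
    e.g. 4.0 for a hypothetical 4:1 ratio.
    """
    compressed_bytes = math.ceil(raw_bytes / compression_ratio)
    return query_cost_usd(compressed_bytes)


raw = 10**12  # 1 TB of raw data
uncompressed_cost = query_cost_usd(raw)
compressed_cost = compressed_query_cost_usd(raw, 4.0)
print(f"Uncompressed scan: USD {uncompressed_cost:.2f}")
print(f"Compressed scan (4:1): USD {compressed_cost:.2f}")
```

Because pricing depends on bytes scanned rather than on cluster size or runtime, anything that shrinks the scan, such as compression, lowers the bill proportionally in this simplified model.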

If you need more information about leveraging data analytics at scale using AWS Athena, reach out to us.