How to Monitor a Serverless Environment

September 8, 2019 / Nishant Sharma

Serverless is changing the computing paradigm and redefining what faster and better mean. It abstracts the application layer from the infrastructure by removing the host from the equation and reducing operational responsibilities. Thanks to its many benefits, serverless has become popular in recent years. According to a New Stack survey in 2018, nearly 78% of respondents were using or planning to use serverless in the next 18 months.

Serverless is Different

Building serverless applications shifts operational responsibilities to the cloud provider, which takes over the plumbing: provisioning, scaling, and managing servers. This lets developers focus on writing code without worrying about infrastructure operations. Serverless increases productivity by bringing applications to life in a short span of time, and it delivers high performance through built-in scalability and lower infrastructure cost through elasticity.

Serverless offerings such as AWS Lambda and DynamoDB do not require any infrastructure set-up, capacity planning, server management, or network optimization, as these are managed by AWS.

How Monitoring Serverless is Different

In a traditional cloud deployment we monitor the performance of servers in terms of memory, capacity, network latency, etc. In serverless these metrics are largely irrelevant because that layer is abstracted away, and concerns like scalability and load balancing are handled by the provider. Monitoring serverless therefore requires a different set of metrics. These include:

Errors: invocation errors, throttled requests
Latency: execution time for a request
Traffic: number of requests that the resource is handling
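
As a rough illustration of the three metric families above, the sketch below reduces a batch of invocation records to traffic, error, and latency figures. The `Invocation` record shape and the `summarize` helper are hypothetical names invented for this example; they are not part of any AWS API.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Invocation:
    """One function invocation (hypothetical record shape, for illustration)."""
    duration_ms: float        # execution time of the request
    error: bool = False       # the invocation raised an error
    throttled: bool = False   # the request was rejected by a concurrency limit


def summarize(invocations, window_seconds):
    """Reduce raw invocation records to traffic, error, and latency figures."""
    total = len(invocations)
    if total == 0:
        return {"requests_per_second": 0.0, "error_rate": 0.0,
                "throttle_rate": 0.0, "avg_latency_ms": 0.0,
                "max_latency_ms": 0.0}
    # Throttled requests never run, so they carry no meaningful duration.
    durations = [i.duration_ms for i in invocations if not i.throttled]
    return {
        "requests_per_second": total / window_seconds,                    # traffic
        "error_rate": sum(i.error for i in invocations) / total,          # errors
        "throttle_rate": sum(i.throttled for i in invocations) / total,   # errors
        "avg_latency_ms": mean(durations) if durations else 0.0,          # latency
        "max_latency_ms": max(durations, default=0.0),                    # latency
    }


# Four made-up invocations observed over a two-second window.
sample = [
    Invocation(120.0),
    Invocation(80.0),
    Invocation(300.0, error=True),
    Invocation(0.0, throttled=True),
]
summary = summarize(sample, window_seconds=2)
print(summary)
```

In practice these figures would come from the platform's own metrics rather than hand-built records; the point is only that each family answers a different question about the same stream of requests.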

When combined, these metrics provide a comprehensive picture of operations to detect and prevent errors and performance degradation. Below we take AWS Lambda as an example to discuss how serverless can be monitored.

Monitoring AWS Lambda

The most important thing to monitor is the application code, which calls for a high level of observability in production because workflows can be spread across several services and APIs with millions of concurrent operations.
Lambda lets you upload code as a function and takes care of execution by providing the compute capacity and scaling it requires. An error may occur under the following conditions:

If there is some problem in the application code itself
If the function exceeds the allowed number of concurrent executions, as Lambda places limits on both memory and concurrency
If Lambda needs to access other AWS resources to complete the function and permission for that access is not configured in IAM
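
One way to keep these three causes apart in logs or alerts is to classify each failure by the error code that accompanies it. The sketch below is a minimal illustration: `classify_failure` and the category labels are hypothetical, and the sets of error codes are an illustrative subset rather than an exhaustive list.

```python
# Error codes are the strings AWS returns when an API call fails.
# These sets are an illustrative subset, not a complete catalogue.
THROTTLE_CODES = {"TooManyRequestsException", "ThrottlingException"}
PERMISSION_CODES = {"AccessDeniedException", "AccessDenied"}


def classify_failure(error_code=None):
    """Map a failure to one of the causes listed above.

    error_code is the AWS error code when the failure came from an AWS API
    call, or None when the exception was raised by the function's own code.
    """
    if error_code is None:
        return "application_code"
    if error_code in THROTTLE_CODES:
        return "concurrency_limit"
    if error_code in PERMISSION_CODES:
        return "missing_iam_permission"
    # Anything else is an AWS-side error outside the three causes above.
    return "other_aws_error"


print(classify_failure())
print(classify_failure("TooManyRequestsException"))
print(classify_failure("AccessDenied"))
```

With boto3, the code would typically be read from a caught `ClientError` as `err.response["Error"]["Code"]` before being passed to a helper like this one.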

Tools for Monitoring AWS Lambda

AWS Lambda can be monitored using Amazon CloudWatch and AWS X-Ray services.

Amazon CloudWatch monitors Lambda with metrics such as the number of invocations, the minimum, average, and maximum invocation duration, and the number of throttled requests. CloudWatch records these standard metrics at one-minute intervals, while custom metrics published through the API can be recorded at one-second intervals.
AWS X-Ray traces application performance on Lambda, following a request as it travels across various AWS services. X-Ray can capture the full path of a function, for example a Lambda function triggered by API Gateway that needs to read data from an S3 bucket, and display it visually to facilitate root cause analysis.
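
To make the CloudWatch intervals concrete, here is a minimal sketch that builds the arguments for two boto3 CloudWatch calls: a `get_metric_statistics` query for a function's throttle count at one-minute granularity, and a `put_metric_data` entry for a custom metric stored at one-second resolution. The function name `order-handler`, the metric name `OrdersProcessed`, and the namespace `MyApp/Orders` are assumptions made up for this example. The `publish` helper needs boto3 and AWS credentials, so it is defined but not called here.

```python
from datetime import datetime, timedelta, timezone


def throttle_query(function_name, minutes=60):
    """Build get_metric_statistics arguments for a function's throttle count."""
    end = datetime.now(timezone.utc)
    return {
        "Namespace": "AWS/Lambda",
        "MetricName": "Throttles",
        "Dimensions": [{"Name": "FunctionName", "Value": function_name}],
        "StartTime": end - timedelta(minutes=minutes),
        "EndTime": end,
        "Period": 60,             # one-minute granularity, in seconds
        "Statistics": ["Sum"],
    }


def high_resolution_datum(metric_name, value, function_name):
    """Build one put_metric_data entry stored at one-second resolution."""
    return {
        "MetricName": metric_name,
        "Dimensions": [{"Name": "FunctionName", "Value": function_name}],
        "Timestamp": datetime.now(timezone.utc),
        "Value": value,
        "Unit": "Count",
        "StorageResolution": 1,   # 1 = high resolution, 60 = standard
    }


def publish(datum, namespace="MyApp/Orders"):
    """Send the datum to CloudWatch (requires boto3 and AWS credentials)."""
    import boto3  # imported lazily so the builders above work without it

    boto3.client("cloudwatch").put_metric_data(
        Namespace=namespace, MetricData=[datum]
    )


# Hypothetical function and metric names, for illustration only.
query = throttle_query("order-handler", minutes=5)
datum = high_resolution_datum("OrdersProcessed", 3, "order-handler")
print(query["MetricName"], datum["StorageResolution"])
```

Keeping the argument builders separate from the network call makes the request shape easy to inspect and test without touching an AWS account.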

In addition, third-party tools such as Datadog can be integrated with Lambda to bring performance metrics onto a single dashboard and to perform additional functions, such as defining custom metrics that give specific insight into operations relevant to the business workflow, verifying permissions, etc.

While serverless applications hold great promise, there must be mechanisms for monitoring and observability in order to realize the benefits on a consistent basis. It is crucial to remember that the benefits of the serverless computing paradigm must be accompanied by new methods of monitoring.