How to use CloudWatch to generate alerts from logs?

At the latest AWS NYC Summit, Amazon announced "CloudWatch Logs", a log storage and monitoring feature that lets AWS customers monitor and troubleshoot systems and applications using system, application, and custom log files. CloudWatch Logs currently lacks some essential log management capabilities, such as search and sophisticated visualizations. Nonetheless, it is a major leap in functionality for CloudWatch.
CloudWatch Logs enables AWS customers to easily move logs off individual EC2 instances into a central repository and browse them via the web UI. But its most appealing feature is arguably the ability to monitor the logs for specific phrases, values, or patterns and generate alarms from them. CloudWatch Logs supports a variety of use cases:
Generate an alarm when a keyword shows up in the logs
You can generate alarms in near real-time whenever specific keywords are found in the logs. This type of alarm is useful when you need to know the moment something happens. For example, an alarm can be generated when an application throws an exception or a critical transaction fails.
Generate an alarm when something happens multiple times within a specified time frame
This type of alarm is useful for detecting conditions where a single log entry may be normal but multiple entries within a short time may indicate a problem. A typical example is failed login attempts: many consecutive failed login attempts for a user within a short time may indicate a brute-force attack to gain access to the system.
Extract metrics from logs
You can use filters to extract values from space-delimited logs and create alarms based on these values. For example, if an application logs metrics like the number of connections or response times, you can extract this data from the logs and generate an alarm when a value exceeds a threshold. CloudWatch already supported generating alarms from custom metrics; however, the ability to parse logs means you can extract metrics from any application, including third-party ones.
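To make the metric-extraction idea concrete, here is a minimal local sketch in Python. It is not how CloudWatch implements filters. It only mimics the idea of splitting a space-delimited line into fields and comparing one field against a threshold. The log layout (timestamp, request path, response time in milliseconds) and the names parse_line and over_threshold are assumptions made for illustration.

```python
from typing import List, NamedTuple


class Request(NamedTuple):
    timestamp: str    # e.g. "Aug 11 13:10:00" (three space-separated tokens)
    path: str         # hypothetical request path field
    response_ms: int  # hypothetical response time field, in milliseconds


def parse_line(line: str) -> Request:
    """Split a space-delimited line into fields (assumed layout, see above)."""
    parts = line.split()
    if len(parts) != 5:
        raise ValueError(f"unexpected field count: {line!r}")
    month, day, clock, path, response_ms = parts
    return Request(f"{month} {day} {clock}", path, int(response_ms))


def over_threshold(lines: List[str], threshold_ms: int) -> List[Request]:
    """Return the parsed entries whose response time exceeds the threshold."""
    return [r for r in map(parse_line, lines) if r.response_ms > threshold_ms]


# Made-up sample lines in the assumed layout.
sample = [
    "Aug 11 13:10:00 /login 120",
    "Aug 11 13:10:01 /search 2450",
    "Aug 11 13:10:02 /login 95",
]
slow = over_threshold(sample, threshold_ms=1000)
print([r.path for r in slow])
```

In CloudWatch itself, the equivalent work is done by a metric filter plus an alarm on the resulting metric. The sketch is only meant to show what "extract a value and compare it to a threshold" means for a single log line.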
Let's go through a configuration that generates an alert every time the "Exception" keyword shows up in a particular log file.
Prerequisites
- An AWS account
- In this example, we will use the OpsGenie CloudWatch Integration API as an SNS HTTP/S endpoint to forward CloudWatch alarms to OpsGenie. If you don’t already have one, create a free OpsGenie account and follow the instructions to add the CloudWatch integration. Alternatively, you can forward alarms to a service like http://requestb.in to see the content of the alarms.
- An EC2 instance in US East (the CloudWatch Logs feature is only available in the US East region at the moment)
Install the CloudWatch Logs Agent
Create an IAM user with the policy below and save the credentials. For detailed instructions, refer to this documentation.
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [ "logs:*" ],
      "Resource": [ "arn:aws:logs:us-east-1:*:*" ]
    }
  ]
}
Connect to your EC2 instance and run the following commands to download, install, and configure the CloudWatch Logs agent:
wget https://s3.amazonaws.com/aws-cloudwatch/downloads/awslogs-agent-setup-v1.0.py
sudo python ./awslogs-agent-setup-v1.0.py --region us-east-1
- The AWS Logs agent setup script needs some information about the system. Before running it, you need to know which log file to monitor and that file's timestamp format.
I want to monitor /var/log/myapp/request.log, so I entered that path and its timestamp format when prompted. To monitor more files, enter "Y".
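Before answering the timestamp prompt, it can help to check that your format string really matches your log lines. The sketch below assumes the agent accepts strptime-style directives and that the log lines look like the "Aug 11 13:10:00 ..." samples used in this post. The function name parse_timestamp and the placeholder year are illustrative, not part of the agent.

```python
from datetime import datetime

# Assumed strptime-style format for timestamps such as "Aug 11 13:10:00".
DATETIME_FORMAT = "%b %d %H:%M:%S"


def parse_timestamp(line: str, fmt: str = DATETIME_FORMAT, year: int = 2000) -> datetime:
    """Parse the leading timestamp of a log line; raises ValueError on a mismatch.

    The sample timestamps carry no year, so one is supplied explicitly
    (2000 is an arbitrary placeholder, chosen because it is a leap year).
    """
    stamp = " ".join(line.split()[:3])  # the timestamp spans the first three tokens
    return datetime.strptime(f"{year} {stamp}", "%Y " + fmt)


parsed = parse_timestamp("Aug 11 13:10:00 hello world")
print(parsed.isoformat())
```

If the format does not match the line, parse_timestamp raises ValueError, which is a cheap way to catch a wrong format before the agent starts shipping logs.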
- Open the Amazon CloudWatch console at https://console.aws.amazon.com/cloudwatch/. You must select the US East (N. Virginia) region.
In the navigation pane, click Logs. After a while, the log groups are listed by name. (If the file is empty, its log group isn't listed. Run echo "Aug 11 13:10:00 hello world" >> /var/log/myapp/request.log to add a log line.)
Click /var/log/myapp/request.log. Log streams are listed under the EC2 instance names I defined during setup.
Click the stream name and see the log data. Now we've successfully sent logs to CloudWatch.
- To define metrics for the log group, return to "Log Groups", select "/var/log/myapp/request.log", and click "Define Metric Filter".
Create a metric filter for the "Exception" keyword. You can click "Test Pattern" to test the filter pattern against existing data in the logs. (Patterns are case sensitive, and non-alphanumeric characters must be enclosed in double quotes.)
Give the metric filter a name and click the "Create Filter" button.
You should now see the newly created filter. Click on the "Create Alarm" button.
Use the SNS topic you defined for the OpsGenie integration as the notification target for the action (the "Send notification to" field).
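The console steps above can also be scripted. The following is a rough sketch using boto3, not the method used in this walkthrough. The filter name, metric name, namespace, alarm name, and the one-minute period are hypothetical choices, and the SNS topic ARN must be the one you created for the OpsGenie integration.

```python
from typing import Any, Dict

LOG_GROUP = "/var/log/myapp/request.log"  # log group name used in this walkthrough
NAMESPACE = "MyApp"                       # hypothetical metric namespace
METRIC = "ExceptionCount"                 # hypothetical metric name


def metric_filter_params() -> Dict[str, Any]:
    """Arguments for put_metric_filter: emit 1 for each line containing "Exception"."""
    return {
        "logGroupName": LOG_GROUP,
        "filterName": "exception-filter",  # hypothetical filter name
        "filterPattern": "Exception",      # patterns are case sensitive
        "metricTransformations": [
            {"metricName": METRIC, "metricNamespace": NAMESPACE, "metricValue": "1"}
        ],
    }


def alarm_params(topic_arn: str) -> Dict[str, Any]:
    """Arguments for put_metric_alarm: alarm when the metric sums to 1 or more."""
    return {
        "AlarmName": "myapp-exception",    # hypothetical alarm name
        "Namespace": NAMESPACE,
        "MetricName": METRIC,
        "Statistic": "Sum",
        "Period": 60,                      # seconds; an assumed evaluation window
        "EvaluationPeriods": 1,
        "Threshold": 1.0,
        "ComparisonOperator": "GreaterThanOrEqualToThreshold",
        "AlarmActions": [topic_arn],       # SNS topic that forwards to OpsGenie
    }


def create_filter_and_alarm(topic_arn: str, region: str = "us-east-1") -> None:
    """Create the metric filter and the alarm; needs boto3 and AWS credentials."""
    import boto3  # imported lazily so the builders above work without boto3

    logs = boto3.client("logs", region_name=region)
    logs.put_metric_filter(**metric_filter_params())
    cloudwatch = boto3.client("cloudwatch", region_name=region)
    cloudwatch.put_metric_alarm(**alarm_params(topic_arn))
```

Calling create_filter_and_alarm with your topic ARN should leave you in roughly the same state as the console steps. Keeping the parameter builders separate from the API calls makes it easy to inspect what will be sent before touching your account.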
CloudWatch is now configured to generate an alarm and forward it to OpsGenie via HTTPS whenever a log entry contains "Exception". To test it, add a log line containing "Exception" to the log file:
echo "Aug 11 13:15:00 Some Exception: something is really wrong" >> /var/log/myapp/request.log # replace the timestamp with the current date and time (run the date command to check it)
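If you would rather not type the timestamp by hand, a small script can stamp the test line with the current time. This is a minimal sketch that assumes the "Aug 11 13:15:00" timestamp layout used in this post. The helper names format_entry and append_entry are made up for illustration.

```python
from datetime import datetime

LOG_FILE = "/var/log/myapp/request.log"  # log file monitored in this walkthrough


def format_entry(message: str, when: datetime) -> str:
    """Prefix the message with a timestamp in the assumed "%b %d %H:%M:%S" layout."""
    return f"{when.strftime('%b %d %H:%M:%S')} {message}"


def append_entry(message: str, path: str = LOG_FILE) -> None:
    """Append a line stamped with the current time; needs write access to the file."""
    with open(path, "a") as handle:
        handle.write(format_entry(message, datetime.now()) + "\n")


# Fixed datetime used only to show the resulting line format.
example = format_entry(
    "Some Exception: something is really wrong",
    datetime(2000, 8, 11, 13, 15, 0),
)
print(example)
```

Running append_entry("Some Exception: something is really wrong") on the instance then writes a test line with the current time.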
After a while, you should see an alert in OpsGenie.
Clicking the alert shows its details.
CloudWatch provides a decent solution for storing logs in a central repository and monitoring them in near real-time. When you forward the alarms to OpsGenie, you can leverage OpsGenie's on-call schedules and escalation policies to determine the right people to notify, and notify them via iPhone and Android push notifications, text messages (SMS), and phone calls until the alert is acknowledged or closed. Using the integrated CloudWatch and OpsGenie solution, you can increase the reliability of your system and improve the customer experience.
Please feel free to contact us with any questions and share your use cases.