I have a Java application that runs in AWS Elastic Container Service (ECS). The application polls a queue periodically. Sometimes there is no response from the queue and the application hangs forever. I have wrapped the methods in try-catch blocks that log exceptions, but even so nothing appears in CloudWatch after the hang: no exceptions, no errors. Is there a way to detect this situation (no logs in CloudWatch), similar to filtering on an error log pattern, so that I can restart the service? Any trick or solution would be appreciated.
    public void handleProcess() {
        try {
            while (true) {
                Response response = QueueUitils.pollQueue(); // poll the queue
                QueueUitils.processMessage(response);
                TimeUnit.SECONDS.sleep(WAIT_TIME); // WAIT_TIME = 20
            }
        } catch (Exception e) {
            LOGGER.error("Data Queue operation failed" + e.getMessage());
            throw e;
        }
    }
Answer
You can do this with CloudWatch Alarms. I’ve set up a test Lambda function for this which runs every minute and logs to CloudWatch.
- Go to CloudWatch and click Alarms in the left-hand menu
- Click the orange Create Alarm button
- Click Select Metric
- Then choose Logs, then Log Group Metrics, and choose the IncomingLogEvents metric for the relevant log group (the log group to which your application is logging). In my case it’s /aws/lambda/test-log-silence
- Click Select Metric
- Now you can specify how you want to measure the metric. I’ve chosen the average log entries over 5 minutes, so after 5 minutes if there are no log entries, that value would be zero.
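- Scroll down and set the condition to “Lower Than or Equal To” zero. This will trigger the alarm when there are no log entries for 5 minutes (or whatever period you decide to set). If the log group receives no events at all, the metric may have no data points for that period rather than a zero, so it is worth also checking the alarm’s missing-data setting and treating missing data as breaching.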
- Now click next, and you can specify an SNS topic to push the notification to. You can set up an SNS topic to notify you via email, SMS, AWS Lambda, and others.
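One caveat with the steps above: the alarm can only tell a hang from normal operation if the application logs regularly while it is healthy, and the loop in the question only logs when an exception is thrown. The following is a minimal sketch of one way to address that, assuming the poll can be wrapped in a `Callable`. It writes a heartbeat line after every successful poll and puts an upper bound on how long a single poll may block. The names `PollWatchdog` and `pollOnce` and the timeout value are hypothetical, and `java.util.logging` stands in for whatever `LOGGER` the application actually uses.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.logging.Logger;

// Hypothetical helper, not part of the original application.
final class PollWatchdog {
    private static final Logger LOGGER = Logger.getLogger(PollWatchdog.class.getName());

    /**
     * Runs a single poll with a time limit.
     * Returns true if the poll completed in time, false if it timed out or threw.
     */
    static boolean pollOnce(Callable<String> poll, long timeoutMillis) throws InterruptedException {
        // One short-lived daemon thread per call keeps the sketch simple;
        // a real service would probably reuse a single executor.
        ExecutorService executor = Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r, "queue-poller");
            t.setDaemon(true); // a stuck poll must not keep the JVM alive
            return t;
        });
        Future<String> future = executor.submit(poll);
        try {
            String response = future.get(timeoutMillis, TimeUnit.MILLISECONDS);
            // Heartbeat: this regular log line is what keeps the alarm quiet.
            LOGGER.info("Heartbeat: poll completed, response=" + response);
            return true;
        } catch (TimeoutException e) {
            future.cancel(true); // interrupt the hung poll
            LOGGER.severe("Queue poll timed out after " + timeoutMillis + " ms");
            return false;
        } catch (ExecutionException e) {
            LOGGER.severe("Queue poll failed: " + e.getCause());
            return false;
        } finally {
            executor.shutdownNow();
        }
    }
}
```

In the question's loop this could be called as `PollWatchdog.pollOnce(() -> String.valueOf(QueueUitils.pollQueue()), 60_000)`, with the timeout chosen to be comfortably longer than a normal poll. Note that cancelling only interrupts the polling thread; a client call that ignores interrupts would stay blocked in the background, but the timeout is still logged and the silence alarm remains a second line of defence.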