Capture ECS task events into Amazon CloudWatch using Amazon EventBridge
About
Amazon Elastic Container Service watches over your application 24/7, making autonomous decisions about how to keep your application up and running on your infrastructure. For example, if it sees that your application has crashed, then it will restart it. If an EC2 instance goes offline, then Elastic Container Service can relaunch your application on a different EC2 instance that is still online.
By default, ECS only retains information on a task while it is running, and for a brief period of time after the task has stopped. What if you want to capture task history for longer, in order to review older tasks that crashed in the past?
With this pattern you can use Amazon EventBridge to capture ECS task data into long-term storage in Amazon CloudWatch, then query that data back out later using the CloudWatch Logs Insights query language.
CloudWatch Container Insights
Amazon ECS CloudWatch Container Insights is an optional feature that you can enable to store and retain task telemetry data for as long as you want. The task telemetry data includes resource usage statistics, at one minute resolution, covering CPU, memory, networking, and storage.
TIP
There is no charge for using Amazon ECS itself; however, the Container Insights feature does come with an additional cost based on the amount of data stored in CloudWatch, plus an additional cost for querying that data using CloudWatch Logs Insights. A task with one container generates about 1 MB of telemetry data per day. If there is more than one container per task, or you have frequent task turnover, you may generate even more telemetry data. Queries also cost more based on the amount of telemetry data they process. See Amazon CloudWatch pricing for more info.
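As a rough illustration of how this adds up, here is a minimal Python sketch that estimates monthly telemetry volume based on the ~1 MB per container per day figure above. The function name and defaults are illustrative assumptions, and you should check the Amazon CloudWatch pricing page for actual costs:

```python
# Rough estimate of Container Insights telemetry volume, based on the
# ~1 MB/day per container figure mentioned above. This is an estimate
# only; check the Amazon CloudWatch pricing page for actual costs.
MB_PER_CONTAINER_PER_DAY = 1.0

def monthly_telemetry_mb(tasks: int, containers_per_task: int = 1, days: int = 30) -> float:
    """Approximate MB of telemetry ingested per month."""
    return tasks * containers_per_task * MB_PER_CONTAINER_PER_DAY * days

# A 50-task service with 2 containers per task:
print(monthly_telemetry_mb(50, 2))  # 3000.0 MB, i.e. about 3 GB/month
```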
In order to activate Container Insights for a cluster, you can use the command line:
aws ecs update-cluster-settings \
  --cluster cluster_name_or_arn \
  --settings name=containerInsights,value=enabled \
  --region us-east-1
Or you can enable Container Insights when creating an ECS cluster with CloudFormation:
MyCluster:
  Type: AWS::ECS::Cluster
  Properties:
    ClusterName: production
    ClusterSettings:
      - Name: containerInsights
        Value: enabled
From this point on you will start to see new metrics and new logs stored in CloudWatch. You can find the raw task details over time stored in CloudWatch Logs, under the namespace /aws/ecs/containerinsights/<cluster-name>. By default this log group only retains data for one day. However, you can extend the retention period to store this data for longer, by finding the log group in the CloudWatch Logs console and editing its settings.
Sample Container Insights Telemetry
Here is a sample container telemetry event, similar to what you will see in CloudWatch after enabling Container Insights:
{
  "Version": "0",
  "Type": "Container",
  "ContainerName": "stress-ng",
  "TaskId": "fd84326dd7a44ad48c74d2487f773e1e",
  "TaskDefinitionFamily": "stress-ng",
  "TaskDefinitionRevision": "2",
  "ServiceName": "stress-ng",
  "ClusterName": "benchmark-cluster-ECSCluster-TOl9tY939Z2a",
  "Image": "209640446841.dkr.ecr.us-east-2.amazonaws.com/stress-ng:latest",
  "ContainerKnownStatus": "RUNNING",
  "Timestamp": 1654023960000,
  "CpuUtilized": 24.915774739583338,
  "CpuReserved": 256,
  "MemoryUtilized": 270,
  "MemoryReserved": 512,
  "StorageReadBytes": 0,
  "StorageWriteBytes": 0,
  "NetworkRxBytes": 0,
  "NetworkRxDropped": 0,
  "NetworkRxErrors": 0,
  "NetworkRxPackets": 4532,
  "NetworkTxBytes": 0,
  "NetworkTxDropped": 0,
  "NetworkTxErrors": 0,
  "NetworkTxPackets": 1899
}
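To see how these fields relate to each other, here is a short Python sketch that derives utilization percentages from the reserved and utilized values in the sample event above. The percentage calculation is a straightforward interpretation of the fields, not an official formula:

```python
# Subset of the sample Container Insights telemetry event shown above
event = {
    "CpuUtilized": 24.915774739583338,   # CPU units consumed
    "CpuReserved": 256,                  # CPU units reserved for the container
    "MemoryUtilized": 270,               # MiB in use
    "MemoryReserved": 512,               # MiB reserved
}

# Express utilization as a percentage of the reservation
cpu_pct = 100 * event["CpuUtilized"] / event["CpuReserved"]
mem_pct = 100 * event["MemoryUtilized"] / event["MemoryReserved"]

print(f"CPU: {cpu_pct:.1f}% of reservation")     # CPU: 9.7% of reservation
print(f"Memory: {mem_pct:.1f}% of reservation")  # Memory: 52.7% of reservation
```

Low CPU utilization with high memory utilization, as in this sample, is a common hint that a task's reservations could be rebalanced.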
Container Insights telemetry can be queried using CloudWatch Logs Insights. For example, this sample query fetches the telemetry for a specific task:
fields @timestamp, @message
| filter Type="Container" and TaskId="33a03820a2ce4ced85af7e0d4f51daf7"
| sort @timestamp desc
| limit 20
You can find more sample queries and query syntax rules in the CloudWatch Logs Insights docs.
Capture ECS Task History
In addition to the raw telemetry, Amazon ECS produces events which can be captured in a CloudWatch log group using Amazon EventBridge. These events happen when a service is updated, a task changes state, or a container instance changes state. Here is how you can capture these events using Amazon EventBridge.
The following CloudFormation template sets up an EventBridge rule that captures events from an ECS service into CloudWatch Logs:
AWSTemplateFormatVersion: '2010-09-09'
Description: >-
  This template deploys an Amazon EventBridge rule that captures
  Elastic Container Service task history for persistence in Amazon CloudWatch.

Parameters:
  ServiceName:
    Type: String
    Description: The name of the ECS service that you would like to capture events from
  ServiceArn:
    Type: String
    Description: The full ARN of the service that you would like to capture events from

Resources:

  # A CloudWatch log group for persisting the Amazon ECS events
  ServiceEventLog:
    Type: AWS::Logs::LogGroup
    Properties:
      LogGroupName: !Sub /benchmark/${ServiceName}-events

  # Create the EventBridge rule that captures deployment events into the CloudWatch log group
  CaptureServiceDeploymentEvents:
    Type: AWS::Events::Rule
    Properties:
      Description: !Sub 'Capture service deployment events from the ECS service ${ServiceName}'
      # Which events to capture
      EventPattern:
        source:
          - aws.ecs
        detail-type:
          - "ECS Deployment State Change"
          - "ECS Service Action"
        resources:
          - !Ref ServiceArn
      # Where to send the events
      Targets:
        - Arn: !GetAtt ServiceEventLog.Arn
          Id: 'CloudWatchLogGroup'

  # Create a log group resource policy that allows EventBridge to put logs into
  # the log group
  LogGroupForEventsPolicy:
    Type: AWS::Logs::ResourcePolicy
    Properties:
      PolicyName: EventBridgeToCWLogsPolicy
      PolicyDocument: !Sub
        - >
          {
            "Version": "2012-10-17",
            "Statement": [
              {
                "Sid": "EventBridgetoCWLogsPolicy",
                "Effect": "Allow",
                "Principal": {
                  "Service": [
                    "delivery.logs.amazonaws.com",
                    "events.amazonaws.com"
                  ]
                },
                "Action": [
                  "logs:CreateLogStream",
                  "logs:PutLogEvents"
                ],
                "Resource": [
                  "${LogArn}"
                ]
              }
            ]
          }
        - { LogArn: !GetAtt ServiceEventLog.Arn }
The template requires two input parameters:

- ServiceName - The name of the ECS service you would like to start capturing events from. Example: sample-webapp
- ServiceArn - The full ARN (Amazon Resource Name) of the service. Example: arn:aws:ecs:us-west-2:123456789012:service/sample-webapp
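Service ARNs follow a predictable shape, so a quick sanity check before deploying can catch copy/paste mistakes. The regex below is an illustrative sketch based on the example ARN above, not an official format specification:

```python
import re

# Matches service ARNs like the example above. Illustrative only; the
# trailing ".+" also accepts the newer service/<cluster>/<name> format.
SERVICE_ARN = re.compile(
    r"^arn:aws:ecs:(?P<region>[a-z0-9-]+):(?P<account>\d{12}):service/(?P<name>.+)$"
)

def parse_service_arn(arn: str) -> dict:
    """Split an ECS service ARN into region, account, and service name."""
    m = SERVICE_ARN.match(arn)
    if not m:
        raise ValueError(f"not an ECS service ARN: {arn}")
    return m.groupdict()

print(parse_service_arn("arn:aws:ecs:us-west-2:123456789012:service/sample-webapp"))
# {'region': 'us-west-2', 'account': '123456789012', 'name': 'sample-webapp'}
```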
You can deploy this template using the CloudFormation console, or the AWS CLI using a command like:
aws cloudformation deploy \
  --template-file eventbridge-ecs-task-events.yml \
  --stack-name eventbridge-ecs-task-events \
  --capabilities CAPABILITY_IAM \
  --parameter-overrides \
     ServiceName=sample-webapp \
     ServiceArn=arn:aws:ecs:us-west-2:123456789012:service/sample-webapp
Sample ECS Task Event
Once deployed, Amazon EventBridge will start capturing ECS events into Amazon CloudWatch. Each event is a full point-in-time snapshot of the ECS task's state. The following JSON is an example of what the event will look like:
{
  "version": "0",
  "id": "b38a1269-debf-7ada-9576-f69ce2752526",
  "detail-type": "ECS Task State Change",
  "source": "aws.ecs",
  "account": "209640446841",
  "time": "2022-05-31T20:12:43Z",
  "region": "us-east-2",
  "resources": [
    "arn:aws:ecs:us-east-2:209640446841:task/benchmark-cluster-ECSCluster-TOl9tY939Z2a/0c45c999f51741509482c5829cebb82e"
  ],
  "detail": {
    "attachments": [
      {
        "id": "4b01ba81-00ee-471d-99dc-9d215bff56e5",
        "type": "eni",
        "status": "DELETED",
        "details": [
          {
            "name": "subnetId",
            "value": "subnet-04f3a518011557633"
          },
          {
            "name": "networkInterfaceId",
            "value": "eni-0685196fd7cf97f27"
          },
          {
            "name": "macAddress",
            "value": "06:a8:e2:77:53:2c"
          },
          {
            "name": "privateDnsName",
            "value": "ip-10-0-121-242.us-east-2.compute.internal"
          },
          {
            "name": "privateIPv4Address",
            "value": "10.0.121.242"
          }
        ]
      }
    ],
    "attributes": [
      {
        "name": "ecs.cpu-architecture",
        "value": "x86_64"
      }
    ],
    "availabilityZone": "us-east-2b",
    "capacityProviderName": "FARGATE",
    "clusterArn": "arn:aws:ecs:us-east-2:209640446841:cluster/benchmark-cluster-ECSCluster-TOl9tY939Z2a",
    "connectivity": "CONNECTED",
    "connectivityAt": "2022-05-31T18:08:12.052Z",
    "containers": [
      {
        "containerArn": "arn:aws:ecs:us-east-2:209640446841:container/benchmark-cluster-ECSCluster-TOl9tY939Z2a/0c45c999f51741509482c5829cebb82e/1471ad51-9c53-4d56-82d9-04b26f82369e",
        "exitCode": 0,
        "lastStatus": "STOPPED",
        "name": "stress-ng",
        "image": "209640446841.dkr.ecr.us-east-2.amazonaws.com/stress-ng:latest",
        "imageDigest": "sha256:75c15a49ea93c3ac12c73a283cb72eb7e602d9b09fe584440bdf7d888e055288",
        "runtimeId": "0c45c999f51741509482c5829cebb82e-2413177855",
        "taskArn": "arn:aws:ecs:us-east-2:209640446841:task/benchmark-cluster-ECSCluster-TOl9tY939Z2a/0c45c999f51741509482c5829cebb82e",
        "networkInterfaces": [
          {
            "attachmentId": "4b01ba81-00ee-471d-99dc-9d215bff56e5",
            "privateIpv4Address": "10.0.121.242"
          }
        ],
        "cpu": "256",
        "memory": "512"
      }
    ],
    "cpu": "256",
    "createdAt": "2022-05-31T18:08:08.011Z",
    "desiredStatus": "STOPPED",
    "enableExecuteCommand": false,
    "ephemeralStorage": {
      "sizeInGiB": 20
    },
    "executionStoppedAt": "2022-05-31T20:12:20.683Z",
    "group": "service:stress-ng",
    "launchType": "FARGATE",
    "lastStatus": "STOPPED",
    "memory": "512",
    "overrides": {
      "containerOverrides": [
        {
          "name": "stress-ng"
        }
      ]
    },
    "platformVersion": "1.4.0",
    "pullStartedAt": "2022-05-31T18:08:22.205Z",
    "pullStoppedAt": "2022-05-31T18:08:23.109Z",
    "startedAt": "2022-05-31T18:08:23.817Z",
    "startedBy": "ecs-svc/3941167241989127803",
    "stoppingAt": "2022-05-31T20:12:06.844Z",
    "stoppedAt": "2022-05-31T20:12:43.412Z",
    "stoppedReason": "Scaling activity initiated by (deployment ecs-svc/3941167241989127803)",
    "stopCode": "ServiceSchedulerInitiated",
    "taskArn": "arn:aws:ecs:us-east-2:209640446841:task/benchmark-cluster-ECSCluster-TOl9tY939Z2a/0c45c999f51741509482c5829cebb82e",
    "taskDefinitionArn": "arn:aws:ecs:us-east-2:209640446841:task-definition/stress-ng:2",
    "updatedAt": "2022-05-31T20:12:43.412Z",
    "version": 6
  }
}
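Each event carries the full set of lifecycle timestamps, so once captured you can derive metrics such as startup latency and total runtime. Here is a small Python sketch using the timestamps from the sample event above:

```python
from datetime import datetime

def parse_ts(ts: str) -> datetime:
    """Parse the ISO-8601 timestamps used in ECS task state change events."""
    return datetime.strptime(ts, "%Y-%m-%dT%H:%M:%S.%fZ")

# Timestamps taken from the "detail" section of the sample event above
detail = {
    "createdAt": "2022-05-31T18:08:08.011Z",
    "startedAt": "2022-05-31T18:08:23.817Z",
    "stoppedAt": "2022-05-31T20:12:43.412Z",
}

startup = parse_ts(detail["startedAt"]) - parse_ts(detail["createdAt"])
runtime = parse_ts(detail["stoppedAt"]) - parse_ts(detail["startedAt"])

print(f"Startup took {startup.total_seconds():.1f}s")  # Startup took 15.8s
print(f"Task ran for {runtime.total_seconds():.0f}s")  # Task ran for 7460s
```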
Similar to telemetry, these task events can be queried using Amazon CloudWatch Logs Insights. The following sample query will fetch task state change history for a single task:
fields @timestamp, detail.attachments.0.status as ENI, detail.lastStatus as status, detail.desiredStatus as desiredStatus, detail.stopCode as stopCode, detail.stoppedReason as stoppedReason
| filter detail.taskArn = "<your task ARN>"
| sort @timestamp desc
| limit 20
Example output:
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| @timestamp | ENI | status | desiredStatus | stopCode | stoppedReason |
|-------------------------|------------|----------------|---------------|---------------------------|------------------------------------------------------------------------|
| 2022-06-01 19:03:41.000 | DELETED | STOPPED | STOPPED | ServiceSchedulerInitiated | Scaling activity initiated by (deployment ecs-svc/8045142110272152487) |
| 2022-06-01 19:03:08.000 | ATTACHED | DEPROVISIONING | STOPPED | ServiceSchedulerInitiated | Scaling activity initiated by (deployment ecs-svc/8045142110272152487) |
| 2022-06-01 19:02:45.000 | ATTACHED | RUNNING | STOPPED | ServiceSchedulerInitiated | Scaling activity initiated by (deployment ecs-svc/8045142110272152487) |
| 2022-06-01 18:56:56.000 | ATTACHED | RUNNING | RUNNING | | |
| 2022-06-01 18:56:51.000 | ATTACHED | PENDING | RUNNING | | |
| 2022-06-01 18:56:29.000 | PRECREATED | PROVISIONING | RUNNING | | |
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
With this abbreviated table you can see the history of state changes that an AWS Fargate task goes through as it starts up and then shuts down normally.
For a task that unexpectedly stopped at some point in the past, this history of events can be very useful for understanding exactly what happened to the task and why.
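If you export these events from CloudWatch Logs, the filter-and-sort logic of the query above can be reproduced locally. Here is a Python sketch using hypothetical, heavily trimmed events (only the fields the query touches):

```python
# Hypothetical exported task events, trimmed to the fields the query uses
events = [
    {"timestamp": 3, "detail": {"taskArn": "task-a", "lastStatus": "STOPPED"}},
    {"timestamp": 1, "detail": {"taskArn": "task-a", "lastStatus": "PROVISIONING"}},
    {"timestamp": 2, "detail": {"taskArn": "task-b", "lastStatus": "RUNNING"}},
]

# Equivalent of: filter detail.taskArn = "task-a" | sort @timestamp desc
history = sorted(
    (e for e in events if e["detail"]["taskArn"] == "task-a"),
    key=lambda e: e["timestamp"],
    reverse=True,
)

for e in history:
    print(e["timestamp"], e["detail"]["lastStatus"])
# 3 STOPPED
# 1 PROVISIONING
```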
If you are interested in service level events or container instance level events, you can find samples of what those events look like in the Amazon ECS events documentation.
See Also
- Create a custom CloudWatch dashboard for your ECS service
- Effective use: Amazon ECS lifecycle events with Amazon CloudWatch logs insights - More sample queries for CloudWatch Log Insights