Amazon ECS Capacity Provider for EC2 instances

Nathan Peck
Senior Developer Advocate at AWS

Terminology and Background

Amazon Elastic Container Service (ECS) is a container orchestrator that deploys containerized applications to both Amazon EC2 capacity and serverless AWS Fargate capacity.

Amazon ECS capacity providers are a built-in feature that helps you launch EC2 capacity on the fly. When application containers need to run, the capacity provider provisions as many EC2 hosts as necessary. When all containers are done running, the cluster can "scale to zero" by shutting down all EC2 hosts.
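The scaling behavior described above can be sketched in a few lines of JavaScript. ECS cluster auto scaling publishes a CapacityProviderReservation metric, documented as the number of instances needed (M) divided by the number of instances currently running (N), times 100, and it scales the Auto Scaling group to keep that metric near the capacity provider's target capacity. The function names below are illustrative and not part of any AWS SDK. The values used when no instances are running follow AWS's published description of the feature and should be treated as an assumption. The real service works through a target tracking policy and step-size limits rather than a single calculation.

```javascript
// Illustrative model of ECS cluster auto scaling. These helpers are
// hypothetical; they are not part of any AWS SDK.

// CapacityProviderReservation = M / N * 100, where M is the number of
// instances needed and N is the number of instances currently running.
function capacityProviderReservation(instancesNeeded, instancesRunning) {
  if (instancesRunning === 0) {
    // The ratio is undefined with zero instances. Assumed special cases:
    // 100 when nothing is needed, 200 when at least one instance is needed.
    return instancesNeeded > 0 ? 200 : 100;
  }
  return (instancesNeeded / instancesRunning) * 100;
}

// Instance count at which the reservation metric settles on the target,
// capped at the Auto Scaling group's maximum size.
function desiredInstanceCount(instancesNeeded, targetCapacity, maxSize) {
  const ideal = Math.ceil((instancesNeeded * 100) / targetCapacity);
  return Math.min(ideal, maxSize);
}

// Two instances are running, but pending tasks need five.
console.log(capacityProviderReservation(5, 2)); // 250
console.log(desiredInstanceCount(5, 100, 100)); // 5

// All containers have stopped, so the cluster can scale to zero.
console.log(capacityProviderReservation(0, 5)); // 0
console.log(desiredInstanceCount(0, 100, 100)); // 0
```

With a target capacity of 100, the group settles at exactly the number of instances the tasks require. A lower target, such as 90, would keep some spare capacity running at all times.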

This pattern shows a production-ready ECS on EC2 capacity provider configuration. It comes with a variety of helpful out-of-the-box configurations and failsafes to keep your ECS on EC2 cluster resilient.

Architecture Diagrams

The following diagrams show what this pattern will deploy:

[Diagram: a VPC containing an Auto Scaling group that spans three Availability Zones. Each EC2 instance runs one health daemon and several service containers, with some empty space remaining.]

By following the instructions here, you will deploy:

  1. A group of EC2 instances launched by an EC2 Auto Scaling Group, spread across availability zones
  2. A lightweight (<10 MB memory) daemon task on each EC2 instance, used for health verification
  3. Multiple application containers on each EC2 instance. This allows you to save on infrastructure costs and achieve better utilization of your EC2 instances, by running more copies of your application per host.

The runtime aspects of the architecture are orchestrated by Amazon Elastic Container Service in the following manner:

[Diagram: Amazon Elastic Container Service (Amazon ECS) orchestrating the Auto Scaling group. The capacity provider manages the Auto Scaling group size, one copy of the health daemon is placed per EC2 instance, and application containers are scheduled onto available EC2 capacity.]

  • Amazon ECS manages the size of the Auto Scaling Group, and automatically scales it to match the number of application containers you want to run.
  • A DAEMON type service automatically launches one copy of the health verification task onto each instance when it joins the ECS cluster.
  • A REPLICA type service runs the application containers. Its desired count drives how many EC2 instances the capacity provider scales up to, and its containers are distributed across those instances.
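As a rough illustration of how the size of the REPLICA service translates into EC2 instances, the sketch below estimates how many application tasks fit on one host and how many hosts a given desired count needs. All names and numbers are hypothetical, and the model only considers CPU and memory reservations. Real placement also depends on network interface limits, ports, placement strategies and constraints, and the memory an instance actually registers with ECS, which is lower than its nominal size.

```javascript
// Hypothetical sizing helper, not an AWS API. CPU is in ECS CPU units
// (1024 units = 1 vCPU) and memory is in MiB.
function tasksPerInstance(instance, task, daemon) {
  // The DAEMON service reserves its share on every instance first.
  const cpuLeft = instance.cpu - daemon.cpu;
  const memoryLeft = instance.memory - daemon.memory;
  return Math.max(0, Math.min(
    Math.floor(cpuLeft / task.cpu),
    Math.floor(memoryLeft / task.memory)
  ));
}

function instancesNeeded(desiredCount, perInstance) {
  if (desiredCount === 0) return 0;
  if (perInstance === 0) {
    throw new Error("Task does not fit on this instance type");
  }
  return Math.ceil(desiredCount / perInstance);
}

// Assumed figures: a 4 vCPU host registering about 7600 MiB of memory,
// a 10 MiB health daemon, and an application task that reserves
// 256 CPU units and 512 MiB of memory.
const host = { cpu: 4096, memory: 7600 };
const healthDaemon = { cpu: 0, memory: 10 };
const appTask = { cpu: 256, memory: 512 };

const density = tasksPerInstance(host, appTask, healthDaemon);
console.log(density);                      // 14
console.log(instancesNeeded(30, density)); // 3
```

In this example memory is the limiting resource, so a desired count of 30 tasks needs three hosts. Lowering the task's memory reservation would raise the density and reduce the number of instances the capacity provider launches.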

This architecture also comes with operational enhancements designed to make it easier and safer to manage the EC2 instances that are used as container capacity:

[Diagram: AWS CloudFormation deploys the ECS Optimized Amazon Machine Image via a stack update and initiates a rolling update to the Auto Scaling group. Each EC2 instance sends a health signal to the AWS::AutoScaling::AutoScalingGroup resource whenever it sees the health daemon successfully start up.]

  • The CloudFormation template uses a dynamic SSM parameter to determine which ECS Optimized AMI to deploy. This parameter ensures that each time you deploy the stack, it checks whether there is an available update that needs to be applied to the EC2 instances.
  • The Auto Scaling Group is configured to wait on CloudFormation signals when applying a rolling AMI update to the EC2 instances.
  • Each EC2 instance runs a CloudFormation initialization script that verifies that the host is actually able to connect to the ECS control plane and launch a health daemon task. Only after the EC2 instance has successfully registered with ECS and launched the health check task does it send the CloudFormation signal that tells the Auto Scaling Group the host is healthy.
  • In the event that a configuration or AMI update does not function, this configuration will automatically roll back the stack to the previous EC2 configuration. This gives you a safe way to continuously roll out updates to the ECS AMI on a regular basis.
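The rollback rule in the last bullet can be modeled as a simple threshold check. CloudFormation's rolling update policy counts an instance as successful only if it sends a success signal within the pause time, and the update as a whole succeeds only if the share of successful instances reaches MinSuccessfulInstancesPercent. The function name, field names, and result labels below are hypothetical and only mirror that documented behavior; they are not a CloudFormation API.

```javascript
// Hypothetical model of a rolling update decision; not a CloudFormation API.
// Each entry describes one replaced instance: the signal it sent (or null)
// and how many seconds after launch the signal arrived.
function rollingUpdateOutcome(instances, pauseTimeSeconds, minSuccessfulPercent) {
  const successful = instances.filter((i) =>
    i.signal === "SUCCESS" &&
    i.secondsToSignal !== null &&
    i.secondsToSignal <= pauseTimeSeconds
  ).length;
  const percent = instances.length === 0
    ? 100
    : (successful / instances.length) * 100;
  return percent >= minSuccessfulPercent ? "update succeeds" : "roll back";
}

// A PT2M pause time (120 seconds) and a 100% success requirement,
// matching the values used in this pattern's template.
const healthyBatch = [
  { signal: "SUCCESS", secondsToSignal: 45 },
  { signal: "SUCCESS", secondsToSignal: 70 },
];
const brokenBatch = [
  { signal: "SUCCESS", secondsToSignal: 45 },
  { signal: null, secondsToSignal: null }, // never registered with ECS
];

console.log(rollingUpdateOutcome(healthyBatch, 120, 100)); // update succeeds
console.log(rollingUpdateOutcome(brokenBatch, 120, 100));  // roll back
```

Because the success threshold is 100%, a single instance that fails to register with ECS and start the health daemon is enough to trigger a rollback.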

Dependencies

This pattern uses the AWS SAM CLI to deploy CloudFormation stacks to your account. Follow the appropriate steps for installing the SAM CLI before continuing.

Cluster with EC2 Capacity Provider

Download the following cluster-capacity-provider.yml file, which deploys an ECS cluster that has a capacity provider linked to an EC2 Auto Scaling Group. The Auto Scaling Group starts out scaled to zero, with no EC2 instances.

File: cluster-capacity-provider.yml, Language: yml
AWSTemplateFormatVersion: '2010-09-09'
Description: EC2 ECS cluster that starts out empty, with no EC2 instances yet.
             An ECS capacity provider automatically launches more EC2 instances
             as required on the fly when you request ECS to launch services or
             standalone tasks.
Parameters:
  InstanceType:
    Type: String
    Default: c5.xlarge
    Description: Class of EC2 instance used to host containers. Choose t2 for testing, m5 for general purpose, c5 for CPU intensive services, and r5 for memory intensive services
    AllowedValues: ["a1.2xlarge", "a1.4xlarge", "a1.large", "a1.medium", "a1.metal", "a1.xlarge", "c1.medium", "c1.xlarge", "c3.2xlarge", "c3.4xlarge", "c3.8xlarge", "c3.large", "c3.xlarge", "c4.2xlarge", "c4.4xlarge", "c4.8xlarge", "c4.large", "c4.xlarge", "c5.12xlarge", "c5.18xlarge", "c5.24xlarge", "c5.2xlarge", "c5.4xlarge", "c5.9xlarge", "c5.large", "c5.metal", "c5.xlarge", "c5a.12xlarge", "c5a.16xlarge", "c5a.24xlarge", "c5a.2xlarge", "c5a.4xlarge", "c5a.8xlarge", "c5a.large", "c5a.xlarge", "c5ad.12xlarge", "c5ad.16xlarge", "c5ad.24xlarge", "c5ad.2xlarge", "c5ad.4xlarge", "c5ad.8xlarge", "c5ad.large", "c5ad.xlarge", "c5d.12xlarge", "c5d.18xlarge", "c5d.24xlarge", "c5d.2xlarge", "c5d.4xlarge", "c5d.9xlarge", "c5d.large", "c5d.metal", "c5d.xlarge", "c5n.18xlarge", "c5n.2xlarge", "c5n.4xlarge", "c5n.9xlarge", "c5n.large", "c5n.metal", "c5n.xlarge", "c6a.12xlarge", "c6a.16xlarge", "c6a.24xlarge", "c6a.2xlarge", "c6a.32xlarge", "c6a.48xlarge", "c6a.4xlarge", "c6a.8xlarge", "c6a.large", "c6a.metal", "c6a.xlarge", "c6g.12xlarge", "c6g.16xlarge", "c6g.2xlarge", "c6g.4xlarge", "c6g.8xlarge", "c6g.large", "c6g.medium", "c6g.metal", "c6g.xlarge", "c6gd.12xlarge", "c6gd.16xlarge", "c6gd.2xlarge", "c6gd.4xlarge", "c6gd.8xlarge", "c6gd.large", "c6gd.medium", "c6gd.metal", "c6gd.xlarge", "c6gn.12xlarge", "c6gn.16xlarge", "c6gn.2xlarge", "c6gn.4xlarge", "c6gn.8xlarge", "c6gn.large", "c6gn.medium", "c6gn.xlarge", "c6i.12xlarge", "c6i.16xlarge", "c6i.24xlarge", "c6i.2xlarge", "c6i.32xlarge", "c6i.4xlarge", "c6i.8xlarge", "c6i.large", "c6i.metal", "c6i.xlarge", "c6id.12xlarge", "c6id.16xlarge", "c6id.24xlarge", "c6id.2xlarge", "c6id.32xlarge", "c6id.4xlarge", "c6id.8xlarge", "c6id.large", "c6id.metal", "c6id.xlarge", "c6in.12xlarge", "c6in.16xlarge", "c6in.24xlarge", "c6in.2xlarge", "c6in.32xlarge", "c6in.4xlarge", "c6in.8xlarge", "c6in.large", "c6in.metal", "c6in.xlarge", "c7g.12xlarge", "c7g.16xlarge", "c7g.2xlarge", "c7g.4xlarge", "c7g.8xlarge", "c7g.large", "c7g.medium", 
"c7g.metal", "c7g.xlarge", "c7gd.12xlarge", "c7gd.16xlarge", "c7gd.2xlarge", "c7gd.4xlarge", "c7gd.8xlarge", "c7gd.large", "c7gd.medium", "c7gd.xlarge", "c7gn.12xlarge", "c7gn.16xlarge", "c7gn.2xlarge", "c7gn.4xlarge", "c7gn.8xlarge", "c7gn.large", "c7gn.medium", "c7gn.xlarge", "cc2.8xlarge", "cr1.8xlarge", "d2.2xlarge", "d2.4xlarge", "d2.8xlarge", "d2.xlarge", "d3.2xlarge", "d3.4xlarge", "d3.8xlarge", "d3.xlarge", "d3en.12xlarge", "d3en.2xlarge", "d3en.4xlarge", "d3en.6xlarge", "d3en.8xlarge", "d3en.xlarge", "dl1.24xlarge", "f1.16xlarge", "f1.2xlarge", "f1.4xlarge", "g2.2xlarge", "g2.8xlarge", "g3.16xlarge", "g3.4xlarge", "g3.8xlarge", "g3s.xlarge", "g4ad.16xlarge", "g4ad.2xlarge", "g4ad.4xlarge", "g4ad.8xlarge", "g4ad.xlarge", "g4dn.12xlarge", "g4dn.16xlarge", "g4dn.2xlarge", "g4dn.4xlarge", "g4dn.8xlarge", "g4dn.metal", "g4dn.xlarge", "g5.12xlarge", "g5.16xlarge", "g5.24xlarge", "g5.2xlarge", "g5.48xlarge", "g5.4xlarge", "g5.8xlarge", "g5.xlarge", "g5g.16xlarge", "g5g.2xlarge", "g5g.4xlarge", "g5g.8xlarge", "g5g.metal", "g5g.xlarge", "h1.16xlarge", "h1.2xlarge", "h1.4xlarge", "h1.8xlarge", "hpc7g.16xlarge", "hpc7g.4xlarge", "hpc7g.8xlarge", "hs1.8xlarge", "i2.2xlarge", "i2.4xlarge", "i2.8xlarge", "i2.large", "i2.xlarge", "i3.16xlarge", "i3.2xlarge", "i3.4xlarge", "i3.8xlarge", "i3.large", "i3.metal", "i3.xlarge", "i3en.12xlarge", "i3en.24xlarge", "i3en.2xlarge", "i3en.3xlarge", "i3en.6xlarge", "i3en.large", "i3en.metal", "i3en.xlarge", "i4g.16xlarge", "i4g.2xlarge", "i4g.4xlarge", "i4g.8xlarge", "i4g.large", "i4g.xlarge", "i4i.16xlarge", "i4i.2xlarge", "i4i.32xlarge", "i4i.4xlarge", "i4i.8xlarge", "i4i.large", "i4i.metal", "i4i.xlarge", "im4gn.16xlarge", "im4gn.2xlarge", "im4gn.4xlarge", "im4gn.8xlarge", "im4gn.large", "im4gn.xlarge", "inf1.24xlarge", "inf1.2xlarge", "inf1.6xlarge", "inf1.xlarge", "inf2.24xlarge", "inf2.48xlarge", "inf2.8xlarge", "inf2.xlarge", "is4gen.2xlarge", "is4gen.4xlarge", "is4gen.8xlarge", "is4gen.large", "is4gen.medium", 
"is4gen.xlarge", "m1.large", "m1.medium", "m1.small", "m1.xlarge", "m2.2xlarge", "m2.4xlarge", "m2.xlarge", "m3.2xlarge", "m3.large", "m3.medium", "m3.xlarge", "m4.10xlarge", "m4.16xlarge", "m4.2xlarge", "m4.4xlarge", "m4.large", "m4.xlarge", "m5.12xlarge", "m5.16xlarge", "m5.24xlarge", "m5.2xlarge", "m5.4xlarge", "m5.8xlarge", "m5.large", "m5.metal", "m5.xlarge", "m5a.12xlarge", "m5a.16xlarge", "m5a.24xlarge", "m5a.2xlarge", "m5a.4xlarge", "m5a.8xlarge", "m5a.large", "m5a.xlarge", "m5ad.12xlarge", "m5ad.16xlarge", "m5ad.24xlarge", "m5ad.2xlarge", "m5ad.4xlarge", "m5ad.8xlarge", "m5ad.large", "m5ad.xlarge", "m5d.12xlarge", "m5d.16xlarge", "m5d.24xlarge", "m5d.2xlarge", "m5d.4xlarge", "m5d.8xlarge", "m5d.large", "m5d.metal", "m5d.xlarge", "m5dn.12xlarge", "m5dn.16xlarge", "m5dn.24xlarge", "m5dn.2xlarge", "m5dn.4xlarge", "m5dn.8xlarge", "m5dn.large", "m5dn.metal", "m5dn.xlarge", "m5n.12xlarge", "m5n.16xlarge", "m5n.24xlarge", "m5n.2xlarge", "m5n.4xlarge", "m5n.8xlarge", "m5n.large", "m5n.metal", "m5n.xlarge", "m5zn.12xlarge", "m5zn.2xlarge", "m5zn.3xlarge", "m5zn.6xlarge", "m5zn.large", "m5zn.metal", "m5zn.xlarge", "m6a.12xlarge", "m6a.16xlarge", "m6a.24xlarge", "m6a.2xlarge", "m6a.32xlarge", "m6a.48xlarge", "m6a.4xlarge", "m6a.8xlarge", "m6a.large", "m6a.metal", "m6a.xlarge", "m6g.12xlarge", "m6g.16xlarge", "m6g.2xlarge", "m6g.4xlarge", "m6g.8xlarge", "m6g.large", "m6g.medium", "m6g.metal", "m6g.xlarge", "m6gd.12xlarge", "m6gd.16xlarge", "m6gd.2xlarge", "m6gd.4xlarge", "m6gd.8xlarge", "m6gd.large", "m6gd.medium", "m6gd.metal", "m6gd.xlarge", "m6i.12xlarge", "m6i.16xlarge", "m6i.24xlarge", "m6i.2xlarge", "m6i.32xlarge", "m6i.4xlarge", "m6i.8xlarge", "m6i.large", "m6i.metal", "m6i.xlarge", "m6id.12xlarge", "m6id.16xlarge", "m6id.24xlarge", "m6id.2xlarge", "m6id.32xlarge", "m6id.4xlarge", "m6id.8xlarge", "m6id.large", "m6id.metal", "m6id.xlarge", "m6idn.12xlarge", "m6idn.16xlarge", "m6idn.24xlarge", "m6idn.2xlarge", "m6idn.32xlarge", "m6idn.4xlarge", "m6idn.8xlarge", 
"m6idn.large", "m6idn.metal", "m6idn.xlarge", "m6in.12xlarge", "m6in.16xlarge", "m6in.24xlarge", "m6in.2xlarge", "m6in.32xlarge", "m6in.4xlarge", "m6in.8xlarge", "m6in.large", "m6in.metal", "m6in.xlarge", "m7a.12xlarge", "m7a.16xlarge", "m7a.24xlarge", "m7a.2xlarge", "m7a.32xlarge", "m7a.48xlarge", "m7a.4xlarge", "m7a.8xlarge", "m7a.large", "m7a.medium", "m7a.metal-48xl", "m7a.xlarge", "m7g.12xlarge", "m7g.16xlarge", "m7g.2xlarge", "m7g.4xlarge", "m7g.8xlarge", "m7g.large", "m7g.medium", "m7g.metal", "m7g.xlarge", "m7gd.12xlarge", "m7gd.16xlarge", "m7gd.2xlarge", "m7gd.4xlarge", "m7gd.8xlarge", "m7gd.large", "m7gd.medium", "m7gd.xlarge", "m7i-flex.2xlarge", "m7i-flex.4xlarge", "m7i-flex.8xlarge", "m7i-flex.large", "m7i-flex.xlarge", "m7i.12xlarge", "m7i.16xlarge", "m7i.24xlarge", "m7i.2xlarge", "m7i.48xlarge", "m7i.4xlarge", "m7i.8xlarge", "m7i.large", "m7i.xlarge", "mac1.metal", "mac2.metal", "p2.16xlarge", "p2.8xlarge", "p2.xlarge", "p3.16xlarge", "p3.2xlarge", "p3.8xlarge", "p3dn.24xlarge", "p4d.24xlarge", "p4de.24xlarge", "p5.48xlarge", "r3.2xlarge", "r3.4xlarge", "r3.8xlarge", "r3.large", "r3.xlarge", "r4.16xlarge", "r4.2xlarge", "r4.4xlarge", "r4.8xlarge", "r4.large", "r4.xlarge", "r5.12xlarge", "r5.16xlarge", "r5.24xlarge", "r5.2xlarge", "r5.4xlarge", "r5.8xlarge", "r5.large", "r5.metal", "r5.xlarge", "r5a.12xlarge", "r5a.16xlarge", "r5a.24xlarge", "r5a.2xlarge", "r5a.4xlarge", "r5a.8xlarge", "r5a.large", "r5a.xlarge", "r5ad.12xlarge", "r5ad.16xlarge", "r5ad.24xlarge", "r5ad.2xlarge", "r5ad.4xlarge", "r5ad.8xlarge", "r5ad.large", "r5ad.xlarge", "r5b.12xlarge", "r5b.16xlarge", "r5b.24xlarge", "r5b.2xlarge", "r5b.4xlarge", "r5b.8xlarge", "r5b.large", "r5b.metal", "r5b.xlarge", "r5d.12xlarge", "r5d.16xlarge", "r5d.24xlarge", "r5d.2xlarge", "r5d.4xlarge", "r5d.8xlarge", "r5d.large", "r5d.metal", "r5d.xlarge", "r5dn.12xlarge", "r5dn.16xlarge", "r5dn.24xlarge", "r5dn.2xlarge", "r5dn.4xlarge", "r5dn.8xlarge", "r5dn.large", "r5dn.metal", "r5dn.xlarge", 
"r5n.12xlarge", "r5n.16xlarge", "r5n.24xlarge", "r5n.2xlarge", "r5n.4xlarge", "r5n.8xlarge", "r5n.large", "r5n.metal", "r5n.xlarge", "r6a.12xlarge", "r6a.16xlarge", "r6a.24xlarge", "r6a.2xlarge", "r6a.32xlarge", "r6a.48xlarge", "r6a.4xlarge", "r6a.8xlarge", "r6a.large", "r6a.metal", "r6a.xlarge", "r6g.12xlarge", "r6g.16xlarge", "r6g.2xlarge", "r6g.4xlarge", "r6g.8xlarge", "r6g.large", "r6g.medium", "r6g.metal", "r6g.xlarge", "r6gd.12xlarge", "r6gd.16xlarge", "r6gd.2xlarge", "r6gd.4xlarge", "r6gd.8xlarge", "r6gd.large", "r6gd.medium", "r6gd.metal", "r6gd.xlarge", "r6i.12xlarge", "r6i.16xlarge", "r6i.24xlarge", "r6i.2xlarge", "r6i.32xlarge", "r6i.4xlarge", "r6i.8xlarge", "r6i.large", "r6i.metal", "r6i.xlarge", "r6id.12xlarge", "r6id.16xlarge", "r6id.24xlarge", "r6id.2xlarge", "r6id.32xlarge", "r6id.4xlarge", "r6id.8xlarge", "r6id.large", "r6id.metal", "r6id.xlarge", "r6idn.12xlarge", "r6idn.16xlarge", "r6idn.24xlarge", "r6idn.2xlarge", "r6idn.32xlarge", "r6idn.4xlarge", "r6idn.8xlarge", "r6idn.large", "r6idn.metal", "r6idn.xlarge", "r6in.12xlarge", "r6in.16xlarge", "r6in.24xlarge", "r6in.2xlarge", "r6in.32xlarge", "r6in.4xlarge", "r6in.8xlarge", "r6in.large", "r6in.metal", "r6in.xlarge", "r7g.12xlarge", "r7g.16xlarge", "r7g.2xlarge", "r7g.4xlarge", "r7g.8xlarge", "r7g.large", "r7g.medium", "r7g.metal", "r7g.xlarge", "r7gd.12xlarge", "r7gd.16xlarge", "r7gd.2xlarge", "r7gd.4xlarge", "r7gd.8xlarge", "r7gd.large", "r7gd.medium", "r7gd.xlarge", "r7iz.12xlarge", "r7iz.16xlarge", "r7iz.2xlarge", "r7iz.32xlarge", "r7iz.4xlarge", "r7iz.8xlarge", "r7iz.large", "r7iz.xlarge", "t1.micro", "t2.2xlarge", "t2.large", "t2.medium", "t2.micro", "t2.nano", "t2.small", "t2.xlarge", "t3.2xlarge", "t3.large", "t3.medium", "t3.micro", "t3.nano", "t3.small", "t3.xlarge", "t3a.2xlarge", "t3a.large", "t3a.medium", "t3a.micro", "t3a.nano", "t3a.small", "t3a.xlarge", "t4g.2xlarge", "t4g.large", "t4g.medium", "t4g.micro", "t4g.nano", "t4g.small", "t4g.xlarge", "trn1.2xlarge", "trn1.32xlarge", 
"trn1n.32xlarge", "u-12tb1.112xlarge", "u-18tb1.112xlarge", "u-24tb1.112xlarge", "u-3tb1.56xlarge", "u-6tb1.112xlarge", "u-6tb1.56xlarge", "u-9tb1.112xlarge", "vt1.24xlarge", "vt1.3xlarge", "vt1.6xlarge", "x1.16xlarge", "x1.32xlarge", "x1e.16xlarge", "x1e.2xlarge", "x1e.32xlarge", "x1e.4xlarge", "x1e.8xlarge", "x1e.xlarge", "x2gd.12xlarge", "x2gd.16xlarge", "x2gd.2xlarge", "x2gd.4xlarge", "x2gd.8xlarge", "x2gd.large", "x2gd.medium", "x2gd.metal", "x2gd.xlarge", "x2idn.16xlarge", "x2idn.24xlarge", "x2idn.32xlarge", "x2idn.metal", "x2iedn.16xlarge", "x2iedn.24xlarge", "x2iedn.2xlarge", "x2iedn.32xlarge", "x2iedn.4xlarge", "x2iedn.8xlarge", "x2iedn.metal", "x2iedn.xlarge", "x2iezn.12xlarge", "x2iezn.2xlarge", "x2iezn.4xlarge", "x2iezn.6xlarge", "x2iezn.8xlarge", "x2iezn.metal", "z1d.12xlarge", "z1d.2xlarge", "z1d.3xlarge", "z1d.6xlarge", "z1d.large", "z1d.metal", "z1d.xlarge"]
    ConstraintDescription: Please choose a valid instance type.
  MaxSize:
    Type: Number
    Default: '100'
    Description: Maximum number of EC2 instances that can be launched in your ECS cluster.
  ECSAMI:
    Type: AWS::SSM::Parameter::Value<AWS::EC2::Image::Id>
    Default: /aws/service/ecs/optimized-ami/amazon-linux-2/recommended/image_id
    Description: The Amazon Machine Image ID used for the cluster, leave it as the default value to get the latest AMI
  VpcId:
    Type: AWS::EC2::VPC::Id
    Description: VPC ID where the ECS cluster is launched
  SubnetIds:
    Type: List<AWS::EC2::Subnet::Id>
    Description: List of subnet IDs where the EC2 instances will be launched

Resources:
  # Cluster that keeps track of container deployments
  ECSCluster:
    Type: AWS::ECS::Cluster
    Properties:
      ClusterSettings:
        - Name: containerInsights
          Value: enabled

  # Custom resource that force destroys the ASG. This cleans up EC2 instances that had
  # managed termination protection enabled, but which are not yet released.
  # This is necessary because ECS does not immediately release an EC2 instance from termination
  # protection as soon as the instance is no longer running tasks. There is a cooldown delay.
  # In the case of tearing down the CloudFormation stack, CloudFormation will delete the
  # AWS::ECS::Service and immediately move on to tearing down the AWS::ECS::Cluster, disconnecting
  # the AWS::AutoScaling::AutoScalingGroup from ECS management too fast, before ECS has a chance
  # to asynchronously turn off managed instance protection on the EC2 instances.
  # This will leave some EC2 instances stranded in a state where they are protected from scale-in forever.
  # This then blocks the AWS::AutoScaling::AutoScalingGroup from cleaning itself up.
  # The custom resource function force destroys the autoscaling group when tearing down the stack,
  # avoiding the issue of protected EC2 instances that can never be cleaned up.
  CustomAsgDestroyerFunction:
    Type: AWS::Lambda::Function
    Properties:
      Code:
        ZipFile: |
          const { AutoScalingClient, DeleteAutoScalingGroupCommand } = require("@aws-sdk/client-auto-scaling");
          const response = require('cfn-response');

          exports.handler = async function(event, context) {
            console.log(event);

            if (event.RequestType !== "Delete") {
              await response.send(event, context, response.SUCCESS);
              return;
            }

            const autoscaling = new AutoScalingClient({ region: event.ResourceProperties.Region });

            const input = {
              AutoScalingGroupName: event.ResourceProperties.AutoScalingGroupName,
              ForceDelete: true
            };
            const command = new DeleteAutoScalingGroupCommand(input);
            const deleteResponse = await autoscaling.send(command);
            console.log(deleteResponse);

            await response.send(event, context, response.SUCCESS);
          };
      Handler: index.handler
      Runtime: nodejs20.x
      Timeout: 30
      Role: !GetAtt CustomAsgDestroyerRole.Arn

  # The role used by the ASG destroyer
  CustomAsgDestroyerRole:
    Type: AWS::IAM::Role
    Properties:
      AssumeRolePolicyDocument:
        Version: 2012-10-17
        Statement:
          - Effect: Allow
            Principal:
              Service:
                - lambda.amazonaws.com
            Action:
              - sts:AssumeRole
      ManagedPolicyArns:
        # https://docs.aws.amazon.com/aws-managed-policy/latest/reference/AWSLambdaBasicExecutionRole.html
        - arn:aws:iam::aws:policy/service-role/AWSLambdaBasicExecutionRole
      Policies:
        - PolicyName: allow-to-delete-autoscaling-group
          PolicyDocument:
            Version: 2012-10-17
            Statement:
              - Effect: Allow
                Action: autoscaling:DeleteAutoScalingGroup
                Resource: !Sub arn:aws:autoscaling:${AWS::Region}:${AWS::AccountId}:autoScalingGroup:*:autoScalingGroupName/${ECSAutoScalingGroup}

  CustomAsgDestroyer:
    Type: Custom::AsgDestroyer
    DependsOn:
      - EC2Role
    Properties:
      ServiceToken: !GetAtt CustomAsgDestroyerFunction.Arn
      Region: !Ref "AWS::Region"
      AutoScalingGroupName: !Ref ECSAutoScalingGroup

  # Turn on ENI trunking for the EC2 instances. This setting is not on by default,
  # but it is highly important for increasing the density of AWS VPC networking mode
  # tasks per instance. Additionally, it is not controllable by default in CloudFormation
  # because it has some complexity of needing to be turned on by a bearer of the role
  # of the EC2 instances themselves. With this custom function we can assume the EC2 role
  # then use that role to call the ecs:PutAccountSetting API in order to enable
  # ENI trunking
  CustomEniTrunkingFunction:
    Type: AWS::Lambda::Function
    Properties:
      Code:
        ZipFile: |
          const { ECSClient, PutAccountSettingCommand } = require("@aws-sdk/client-ecs");
          const { STSClient, AssumeRoleCommand } = require("@aws-sdk/client-sts");

          const response = require('cfn-response');

          exports.handler = async function(event, context) {
            console.log(event);

            if (event.RequestType === "Delete") {
              await response.send(event, context, response.SUCCESS);
              return;
            }

            const sts = new STSClient({ region: event.ResourceProperties.Region });

            const assumeRoleResponse = await sts.send(new AssumeRoleCommand({
              RoleArn: event.ResourceProperties.EC2Role,
              RoleSessionName: "eni-trunking-enable-session",
              DurationSeconds: 900
            }));

            // Instantiate an ECS client using the credentials of the EC2 role
            const ecs = new ECSClient({
              region: event.ResourceProperties.Region,
              credentials: {
                accessKeyId: assumeRoleResponse.Credentials.AccessKeyId,
                secretAccessKey: assumeRoleResponse.Credentials.SecretAccessKey,
                sessionToken: assumeRoleResponse.Credentials.SessionToken
              }
            });

            const putAccountResponse = await ecs.send(new PutAccountSettingCommand({
              name: 'awsvpcTrunking',
              value: 'enabled'
            }));
            console.log(putAccountResponse);

            await response.send(event, context, response.SUCCESS);
          };
      Handler: index.handler
      Runtime: nodejs20.x
      Timeout: 30
      Role: !GetAtt CustomEniTrunkingRole.Arn

  # The role used by the ENI trunking custom resource
  CustomEniTrunkingRole:
    Type: AWS::IAM::Role
    Properties:
      AssumeRolePolicyDocument:
        Version: 2012-10-17
        Statement:
          - Effect: Allow
            Principal:
              Service:
                - lambda.amazonaws.com
            Action:
              - sts:AssumeRole
      ManagedPolicyArns:
        # https://docs.aws.amazon.com/aws-managed-policy/latest/reference/AWSLambdaBasicExecutionRole.html
        - arn:aws:iam::aws:policy/service-role/AWSLambdaBasicExecutionRole

  # This allows the custom CloudFormation resource in Lambda
  # to assume the role that is used by the EC2 instances. The Lambda function must
  # assume this role because the ecs:PutAccountSetting must be called either
  # by the role that the setting is for, or by the root account, and we aren't
  # using the root account for CloudFormation.
  AllowEniTrunkingRoleToAssumeEc2Role:
    Type: AWS::IAM::Policy
    Properties:
      Roles:
        - !Ref CustomEniTrunkingRole
      PolicyName: allow-to-assume-ec2-role
      PolicyDocument:
        Version: 2012-10-17
        Statement:
          - Effect: Allow
            Action: sts:AssumeRole
            Resource: !GetAtt EC2Role.Arn

  # This is the actual custom resource, which triggers the invocation
  # of the Lambda function that enables ENI trunking during the stack deploy
  CustomEniTrunking:
    Type: Custom::CustomEniTrunking
    Properties:
      ServiceToken: !GetAtt CustomEniTrunkingFunction.Arn
      Region: !Ref "AWS::Region"
      EC2Role: !GetAtt EC2Role.Arn

  # Autoscaling group. This launches the actual EC2 instances that will register
  # themselves as members of the cluster, and run the docker containers.
  ECSAutoScalingGroup:
    Type: AWS::AutoScaling::AutoScalingGroup
    DependsOn:
      # This is to ensure that the ASG gets deleted first before these
      # resources, when it comes to stack teardown.
      - ECSCluster
      - EC2Role
    UpdatePolicy:
      # This configures the ASG to wait on resource signals from the cfn-init
      # script that runs on the instance itself. Depending on the expected
      # total size of your ASG you may need to tune the parameters below
      AutoScalingRollingUpdate:
        MaxBatchSize: 5
        MinInstancesInService: 1 # Note that ECS draining hook will maintain instances that are still hosting tasks
        PauseTime: PT2M
        WaitOnResourceSignals: true
        MinSuccessfulInstancesPercent: 100
    Properties:
      VPCZoneIdentifier: !Ref SubnetIds
      LaunchTemplate:
        LaunchTemplateId: !Ref ContainerInstances
        Version: !GetAtt ContainerInstances.LatestVersionNumber
      MinSize: 0
      MaxSize: !Ref MaxSize
      # We are relying on ECS draining to safely drain tasks from hosts that need
      # to be replaced.
      NewInstancesProtectedFromScaleIn: false

  # The config for each instance that is added to the cluster
  ContainerInstances:
    Type: AWS::EC2::LaunchTemplate
    Metadata:
      AWS::CloudFormation::Init:
        configSets:
          full_install: [install_deps, verify_instance_health, signal_cfn]
        # Install dependencies
        install_deps:
          commands:
            InstallDependencies:
              command: |
                yum install -y awscli jq
        # Check the ECS API to see if this instance is available as capacity
        # inside of the ECS cluster, and wait for it to run the healthiness daemon
        verify_instance_health:
          commands:
            ECSHealthCheck:
              command: |
                echo "Introspecting ECS agent status"
                find_container_instance_arn() {
                  CONTAINER_INSTANCE_ARN=$(curl --connect-timeout 1 --max-time 1 -s http://localhost:51678/v1/metadata | jq -r '.ContainerInstanceArn')
                }
                find_container_instance_arn
                while [ "$CONTAINER_INSTANCE_ARN" == "" ]; do sleep 2; find_container_instance_arn; done
                echo "Container Instance ARN: $CONTAINER_INSTANCE_ARN"

                echo "Waiting for at least one running task"
                count_instance_tasks() {
                  NUMBER_OF_TASKS=$(curl -s http://localhost:51678/v1/tasks | jq '.Tasks | length')
                }
                count_instance_tasks
                while [ $NUMBER_OF_TASKS -lt 1 ]; do sleep 2; count_instance_tasks; done

                echo "Instance $CONTAINER_INSTANCE_ARN is now hosting $NUMBER_OF_TASKS task(s)"
        # This signals back to CloudFormation once the instance has become healthy in ECS
        # and has started hosting at least one task
        signal_cfn:
          commands:
            SignalCloudFormation:
              command: !Sub |
                /opt/aws/bin/cfn-signal -e $? --stack ${AWS::StackId} --resource ECSAutoScalingGroup --region ${AWS::Region}
    Properties:
      LaunchTemplateData:
        ImageId: !Ref ECSAMI
        InstanceType: !Ref InstanceType
        IamInstanceProfile:
          Name: !Ref EC2InstanceProfile
        SecurityGroupIds:
          - !Ref ContainerHostSecurityGroup
        UserData:
          # This injected configuration file is how the EC2 instance
          # knows which ECS cluster on your AWS account it should be joining
          # It also initiates a CloudFormation init, so that the instance can
          # signal back to CloudFormation when it is ready and healthy in the ECS cluster
          Fn::Base64: !Sub |
            #!/bin/bash -xe
            echo ECS_CLUSTER=${ECSCluster} >> /etc/ecs/ecs.config
            yum install -y aws-cfn-bootstrap
            /opt/aws/bin/cfn-init -v --stack ${AWS::StackId} --resource ContainerInstances --configsets full_install --region ${AWS::Region} &
        BlockDeviceMappings:
          - DeviceName: "/dev/xvda"
            Ebs:
              VolumeSize: 50
              VolumeType: gp3
        # Disable IMDSv1, and require IMDSv2
        MetadataOptions:
          HttpEndpoint: enabled
          HttpTokens: required
  EC2InstanceProfile:
    Type: AWS::IAM::InstanceProfile
    Properties:
      Path: /
      Roles:
        - !Ref EC2Role

  # Create an ECS capacity provider to attach the ASG to the ECS cluster
  # so that it autoscales as we launch more containers
  CapacityProvider:
    Type: AWS::ECS::CapacityProvider
    Properties:
      AutoScalingGroupProvider:
        AutoScalingGroupArn: !Ref ECSAutoScalingGroup
        ManagedScaling:
          InstanceWarmupPeriod: 60
          MinimumScalingStepSize: 1
          MaximumScalingStepSize: 100
          Status: ENABLED
          # Percentage of cluster reservation to try to maintain
          TargetCapacity: 100
        ManagedTerminationProtection: DISABLED
        ManagedDraining: ENABLED

  # Create a cluster capacity provider association so that the cluster
  # will use the capacity provider
  CapacityProviderAssociation:
    Type: AWS::ECS::ClusterCapacityProviderAssociations
    DependsOn:
      - CustomEniTrunking
      - CustomAsgDestroyer
    Properties:
      CapacityProviders:
        - !Ref CapacityProvider
      Cluster: !Ref ECSCluster
      DefaultCapacityProviderStrategy:
        - Base: 0
          CapacityProvider: !Ref CapacityProvider
          Weight: 1

  # A security group for the EC2 hosts that will run the containers.
  # This can be used to limit incoming traffic to or outgoing traffic
  # from the container's host EC2 instance.
  ContainerHostSecurityGroup:
    Type: AWS::EC2::SecurityGroup
    Properties:
      GroupDescription: Access to the EC2 hosts that run containers
      VpcId: !Ref VpcId

  # Role for the EC2 hosts. This allows the ECS agent on the EC2 hosts
  # to communicate with the ECS control plane, as well as download the docker
  # images from ECR to run on your host.
  EC2Role:
    Type: AWS::IAM::Role
    Properties:
      AssumeRolePolicyDocument:
        Statement:
          # Allow the EC2 instances to assume this role
          - Effect: Allow
            Principal:
              Service: [ec2.amazonaws.com]
            Action: ['sts:AssumeRole']
          # Allow the ENI trunking function to assume this role in order to enable
          # ENI trunking while operating under the identity of this role
          - Effect: Allow
            Principal:
              AWS: !GetAtt CustomEniTrunkingRole.Arn
            Action: ['sts:AssumeRole']
      Path: /

      # See reference: https://docs.aws.amazon.com/AmazonECS/latest/developerguide/security-iam-awsmanpol.html#security-iam-awsmanpol-AmazonEC2ContainerServiceforEC2Role
      ManagedPolicyArns:
        - arn:aws:iam::aws:policy/service-role/AmazonEC2ContainerServiceforEC2Role

      Policies:
        # The ENI trunking function will assume this role and then use
        # the ecs:PutAccountSetting to set ENI trunking on for this role
        - PolicyName: allow-to-modify-ecs-settings
          PolicyDocument:
            Version: 2012-10-17
            Statement:
              - Effect: Allow
                Action: ecs:PutAccountSetting
                Resource: '*'

  # This is a role which is used by the ECS container agent (or the Fargate
  # agent, on Fargate) to download images, and upload logs.
  ECSTaskExecutionRole:
    Type: AWS::IAM::Role
    Properties:
      AssumeRolePolicyDocument:
        Statement:
          - Effect: Allow
            Principal:
              Service: [ecs-tasks.amazonaws.com]
            Action: ['sts:AssumeRole']
            Condition:
              ArnLike:
                aws:SourceArn: !Sub arn:aws:ecs:${AWS::Region}:${AWS::AccountId}:*
              StringEquals:
                aws:SourceAccount: !Ref AWS::AccountId
      Path: /

      # This role enables basic features of ECS. See reference:
      # https://docs.aws.amazon.com/AmazonECS/latest/developerguide/security-iam-awsmanpol.html#security-iam-awsmanpol-AmazonECSTaskExecutionRolePolicy
      ManagedPolicyArns:
        - arn:aws:iam::aws:policy/service-role/AmazonECSTaskExecutionRolePolicy

  # This launches a very basic container which is only used to verify that an EC2
  # host is capable of launching tasks. The existence of this task is used as an
  # EC2 host sanity check. If the EC2 host is incapable of launching this task it will
  # fail to signal CloudFormation, and CloudFormation will roll back.
  HealthinessDaemonDefinition:
    Type: AWS::ECS::TaskDefinition
    Properties:
      Family: 'healthiness-daemon'
      Memory: 10
      RequiresCompatibilities:
        - EC2
      ExecutionRoleArn: !GetAtt ECSTaskExecutionRole.Arn
      ContainerDefinitions:
        - Name: 'healthcheck-pause'
          Image: public.ecr.aws/docker/library/busybox:latest
          EntryPoint:
            - /bin/sh
            - -c
          Command:
            - while :; do sleep 2073600; done

  # This launches one copy of the healthiness daemon onto each host
  # in the cluster.
  HealthinessDaemon:
    Type: AWS::ECS::Service
    Properties:
      ServiceName: 'healthiness-daemon'
      Cluster: !Ref ECSCluster
      LaunchType: EC2
      SchedulingStrategy: DAEMON
      TaskDefinition: !Ref HealthinessDaemonDefinition

Outputs:
  ClusterName:
    Description: The ECS cluster into which to launch resources
    Value: !Ref ECSCluster
  ECSTaskExecutionRole:
    Description: The role used to start up a task
    Value: !Ref ECSTaskExecutionRole
  CapacityProvider:
    Description: The cluster capacity provider that the service should use
                 to request capacity when it wants to start up a task
    Value: !Ref CapacityProvider

This stack accepts the following parameters that can be used to adjust its behavior:

  • InstanceType - An EC2 instance type. By default the stack deploys c5.large
  • MaxSize - An upper limit on number of EC2 instances to scale up to. Default 100
  • ECSAMI - The Amazon Machine Image to use for each EC2 instance. Don't change this unless you really know what you are doing.
  • VpcId - The VPC to launch EC2 instances in. Can be the default account VPC.
  • SubnetIds - A comma separated list of subnets from that VPC.

A few things to look out for in this template:

  • CustomAsgDestroyerFunction - This is a custom CloudFormation resource that helps clean up the Auto Scaling Group faster when tearing down the stack.
  • CustomEniTrunkingFunction - This custom CloudFormation resource enables ENI trunking. See the "ENI trunking for Amazon ECS" pattern for more details.
  • AWS::AutoScaling::AutoScalingGroup -> UpdatePolicy - This configuration enables the Auto Scaling Group to automatically roll out updates whenever the ECS AMI is updated. The WaitOnResourceSignals setting is used to validate the EC2 instance health during rolling updates.
  • AWS::CloudFormation::Init - This block of configuration defines commands that run on each EC2 instance after it launches. The commands use the ECS agent introspection endpoint to validate that the instance is able to connect to ECS and launch a task.
  • HealthinessDaemon - This is an ECS DAEMON type service that launches a lightweight container on each host that just sleeps forever. The existence of this container is used as an indication that the host has been able to successfully join the ECS cluster and launch an ECS task.
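
The exact validation commands live in the template's AWS::CloudFormation::Init block. As a rough sketch only, a check against the ECS agent introspection endpoint could look something like the following. It assumes the agent's default introspection address of http://localhost:51678, and the function names are hypothetical rather than taken from the template.

```shell
#!/bin/sh
# Sketch of an introspection-based host check. Function names are
# hypothetical; the template's real commands may differ.

# Default ECS agent introspection address on the container instance.
INTROSPECTION_URL="${INTROSPECTION_URL:-http://localhost:51678}"

# Reads an introspection /v1/tasks JSON document on stdin and succeeds
# if at least one entry reports a known status of RUNNING.
has_running_task() {
  grep -Eq '"KnownStatus"[[:space:]]*:[[:space:]]*"RUNNING"'
}

# Polls the agent until a running task shows up, or gives up after
# the given number of attempts (default 30, five seconds apart).
wait_for_running_task() {
  attempts="${1:-30}"
  while [ "$attempts" -gt 0 ]; do
    # If curl fails, grep sees empty input and the check fails too.
    if curl -sf "$INTROSPECTION_URL/v1/tasks" | has_running_task; then
      return 0
    fi
    attempts=$((attempts - 1))
    sleep 5
  done
  return 1
}
```

Because the healthiness daemon is placed on every host, a host that never reports a running task within the polling window is a reasonable candidate for being treated as unhealthy.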

Service with a Capacity Provider Strategy

Download the following service-capacity-provider.yml file. This CloudFormation template deploys an ECS service into the cluster, with a capacity provider strategy set up. The service will signal the capacity provider to request capacity, and the capacity provider will scale up the EC2 Auto Scaling Group automatically.

File: service-capacity-provider.yml
Language: yml
AWSTemplateFormatVersion: '2010-09-09'
Description: An example service that deploys onto EC2 capacity with
             a capacity provider strategy that autoscales the underlying
             EC2 Capacity as needed by the service

Parameters:
  VpcId:
    Type: String
    Description: The VPC that the service is running inside of
  SubnetIds:
    Type: List<AWS::EC2::Subnet::Id>
    Description: List of subnet IDs the AWS VPC tasks are inside of
  ClusterName:
    Type: String
    Description: The name of the ECS cluster into which to launch capacity.
  ECSTaskExecutionRole:
    Type: String
    Description: The role used to start up an ECS task
  CapacityProvider:
    Type: String
    Description: The cluster capacity provider that the service should use
                 to request capacity when it wants to start up a task
  ServiceName:
    Type: String
    Default: example-service
    Description: A name for the service
  ImageUrl:
    Type: String
    Default: public.ecr.aws/docker/library/busybox:latest
    Description: The url of a docker image that contains the application process that
                 will handle the traffic for this service
  ContainerCpu:
    Type: Number
    Default: 256
    Description: How much CPU to give the container. 1024 is 1 CPU
  ContainerMemory:
    Type: Number
    Default: 512
    Description: How much memory in megabytes to give the container
  Command:
    Type: String
    Default: sleep 3600
    Description: The command to run inside of the container
  DesiredCount:
    Type: Number
    Default: 0
    Description: How many copies of the service task to run

Resources:

  # The task definition. This is a simple metadata description of what
  # container to run, and what resource requirements it has.
  TaskDefinition:
    Type: AWS::ECS::TaskDefinition
    Properties:
      Family: !Ref ServiceName
      Cpu: !Ref ContainerCpu
      Memory: !Ref ContainerMemory
      NetworkMode: awsvpc
      RequiresCompatibilities:
        - EC2
      ExecutionRoleArn: !Ref ECSTaskExecutionRole
      ContainerDefinitions:
        - Name: !Ref ServiceName
          Cpu: !Ref ContainerCpu
          Memory: !Ref ContainerMemory
          Image: !Ref ImageUrl
          Command: !Split [' ', !Ref 'Command']
          LogConfiguration:
            LogDriver: 'awslogs'
            Options:
              mode: non-blocking
              max-buffer-size: 25m
              awslogs-group: !Ref LogGroup
              awslogs-region: !Ref AWS::Region
              awslogs-stream-prefix: !Ref ServiceName

  # The service. The service is a resource which allows you to run multiple
  # copies of a type of task, and gather up their logs and metrics, as well
  # as monitor the number of running tasks and replace any that have crashed
  Service:
    Type: AWS::ECS::Service
    Properties:
      ServiceName: !Ref ServiceName
      Cluster: !Ref ClusterName
      PlacementStrategies:
        - Field: attribute:ecs.availability-zone
          Type: spread
        - Field: cpu
          Type: binpack
      CapacityProviderStrategy:
        - Base: 0
          CapacityProvider: !Ref CapacityProvider
          Weight: 1
      DeploymentConfiguration:
        MaximumPercent: 200
        MinimumHealthyPercent: 75
      DesiredCount: !Ref DesiredCount
      NetworkConfiguration:
        AwsvpcConfiguration:
          SecurityGroups:
            - !Ref ServiceSecurityGroup
          Subnets: !Ref SubnetIds
      TaskDefinition: !Ref TaskDefinition

  # Because we are launching tasks in AWS VPC networking mode
  # the tasks themselves also have an extra security group that is unique
  # to them. This is a unique security group just for this service,
  # to control which things it can talk to, and who can talk to it
  ServiceSecurityGroup:
    Type: AWS::EC2::SecurityGroup
    Properties:
      GroupDescription: !Sub Access to service ${ServiceName}
      VpcId: !Ref VpcId

  # This log group stores the stdout logs from this service's containers
  LogGroup:
    Type: AWS::Logs::LogGroup

Most parameters in this stack will be supplied by a parent stack that passes in resources from the capacity provider stack. However, you may be interested in overriding the following parameters:

  • ServiceName - A human name for the service.
  • ImageUrl - URL of a container image to run. By default this stack deploys public.ecr.aws/docker/library/busybox:latest
  • ContainerCpu - CPU units, where 1024 CPU units is 1 vCPU. Default: 256 (1/4th vCPU)
  • ContainerMemory - Megabytes of memory to give the container. Default: 512
  • Command - Command to run in the container. Default: sleep 3600
  • DesiredCount - Number of copies of the container to run. Default: 0 (So you can test scaling up from zero)
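
To get a feel for how ContainerCpu and ContainerMemory affect packing, here is a small back-of-the-envelope sketch. The helper name is hypothetical, and the host figures are assumptions: 2048 CPU units for a 2 vCPU instance, and roughly 3700 MiB of memory left for tasks after the operating system and ECS agent take their share. The real figure registered by your instances will vary.

```shell
#!/bin/sh
# Estimates how many copies of a task fit on one host, limited by
# whichever of CPU or memory runs out first. Hypothetical helper.
# Args: host CPU units, host memory (MiB), task CPU units, task memory (MiB)
tasks_per_host() {
  by_cpu=$(( $1 / $3 ))
  by_mem=$(( $2 / $4 ))
  if [ "$by_cpu" -lt "$by_mem" ]; then
    echo "$by_cpu"
  else
    echo "$by_mem"
  fi
}

# Assumed example host (see the lead-in) running the default task size
# of 256 CPU units and 512 MiB of memory.
tasks_per_host 2048 3700 256 512
```

With these assumed numbers memory is the limiting resource, so raising ContainerMemory reduces density faster than raising ContainerCpu. This estimate ignores other limits, such as the number of network interfaces available to awsvpc tasks on the host.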

Parent Stack

Download the following parent.yml file. This stack deploys both of the previous stacks as nested stacks, for ease of grouping and passing parameters from one stack to the next.

File: parent.yml
Language: yml
AWSTemplateFormatVersion: "2010-09-09"
Transform: AWS::Serverless-2016-10-31
Description: Parent stack that deploys the ECS cluster and capacity provider
             then launches a service inside of the cluster

Parameters:
  VpcId:
    Type: AWS::EC2::VPC::Id
    Description: VPC ID where the ECS cluster is launched
  SubnetIds:
    Type: List<AWS::EC2::Subnet::Id>
    Description: List of subnet IDs where the EC2 instances will be launched

Resources:

  # This stack contains cluster wide resources that will be shared
  # by all services that get launched in the stack
  BaseStack:
    Type: AWS::Serverless::Application
    Properties:
      Location: cluster-capacity-provider.yml
      Parameters:
        VpcId: !Ref VpcId
        SubnetIds: !Join [',', !Ref SubnetIds]

  # This service will be launched into the cluster by passing
  # details from the base stack into the service stack
  Service:
    Type: AWS::Serverless::Application
    Properties:
      Location: service-capacity-provider.yml
      Parameters:
        VpcId: !Ref VpcId
        SubnetIds: !Join [',', !Ref SubnetIds]
        ClusterName: !GetAtt BaseStack.Outputs.ClusterName
        ECSTaskExecutionRole: !GetAtt BaseStack.Outputs.ECSTaskExecutionRole
        CapacityProvider: !GetAtt BaseStack.Outputs.CapacityProvider

This parent stack requires the following parameters:

  • VpcId - The ID of a VPC on your AWS account. This can be the default VPC.
  • SubnetIds - A comma separated list of subnet IDs within that VPC.

Deploying the stacks with SAM

You should now have three files:

  • cluster-capacity-provider.yml - Defines an ECS cluster with production ready operational enhancements
  • service-capacity-provider.yml - Defines an ECS service that deploys into the cluster
  • parent.yml - Parent file that deploys both of the previous files

Use SAM CLI to deploy everything with a command like this:

Language: sh
# Get the VPC ID of the default VPC on the AWS account
DEFAULT_VPC_ID=$(aws ec2 describe-vpcs --filters Name=is-default,Values=true --query 'Vpcs[0].VpcId' --output text)

# Grab the list of subnet IDs from the default VPC, and glue it together into a comma separated list
DEFAULT_VPC_SUBNET_IDS=$(aws ec2 describe-subnets --filters Name=vpc-id,Values=$DEFAULT_VPC_ID --query "Subnets[*].[SubnetId]" --output text | paste -sd, -)

# Now deploy the ECS cluster to the default VPC and its subnets
sam deploy \
  --template-file parent.yml \
  --stack-name capacity-provider-environment \
  --resolve-s3 \
  --capabilities CAPABILITY_IAM \
  --parameter-overrides VpcId=$DEFAULT_VPC_ID SubnetIds=$DEFAULT_VPC_SUBNET_IDS

INFO

This sample command deploys the stack to the AWS account's pre-existing default VPC. You may wish to deploy the workload to a custom VPC, such as the "Large sized VPC for an Amazon ECS cluster".

WARNING

Depending on what you choose to call your stack in the stack-name parameter, you may get an error in CloudFormation that looks like this:

CreateCapacityProvider error: The specified capacity provider name is invalid. Up to 255 characters are allowed, including letters (upper and lowercase), numbers, underscores, and hyphens. The name cannot be prefixed with "aws", "ecs", or "fargate". Specify a valid name and try again.

If this happens, ensure that your parent CloudFormation stack's name does not start with "aws", "ecs", or "fargate". The capacity provider in the stack gets an autogenerated name that is derived from the stack name, so if the stack name starts with a prohibited word it will cause the capacity provider's name to also start with that prohibited word.
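
If you script your deployments, a small pre-flight check can catch this before CloudFormation does. The following is a minimal sketch with a hypothetical helper name. It compares case-insensitively as a conservative assumption, since the error message does not say how strictly the prefix is matched.

```shell
#!/bin/sh
# Succeeds if the stack name avoids the prefixes listed in the error
# message. Case-insensitive matching is an assumption made to stay on
# the safe side, not confirmed service behavior.
is_safe_stack_name() {
  lowered=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
  case "$lowered" in
    aws*|ecs*|fargate*) return 1 ;;
    *) return 0 ;;
  esac
}

STACK_NAME="capacity-provider-environment"
if is_safe_stack_name "$STACK_NAME"; then
  echo "ok: $STACK_NAME"
else
  echo "rename the stack: $STACK_NAME" >&2
fi
```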

Test scaling up from zero

Initially the ECS cluster will be empty, with no EC2 instances. Additionally the deployed service has a DesiredCount of zero, so there are initially no containers being launched either.

Use the Amazon ECS web console to update the service and set the desired count to a higher number of tasks. You will observe the ECS cluster launch the requested tasks into an initial status of PROVISIONING. At this point each task is just a virtual placeholder. The capacity provider notices the tasks waiting for capacity and responds by scaling up the Auto Scaling Group to provide some EC2 capacity in the cluster. Finally, ECS places tasks onto this brand new capacity as it comes online.
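
If you prefer the command line over the console, the same test can be sketched with the AWS CLI. The function names below are hypothetical. The cluster name is generated by the nested stack, so you would need to look it up first (for example from the ClusterName output), and example-service is the default ServiceName from the service template.

```shell
#!/bin/sh
# Sets the desired count of an ECS service and prints the new value.
# Args: cluster name, service name, desired count
set_desired_count() {
  aws ecs update-service \
    --cluster "$1" \
    --service "$2" \
    --desired-count "$3" \
    --query 'service.desiredCount' \
    --output text
}

# Prints the last known status of each task in the service, for
# example PROVISIONING, PENDING or RUNNING.
# Args: cluster name, service name
show_task_statuses() {
  task_arns=$(aws ecs list-tasks --cluster "$1" --service-name "$2" \
    --query 'taskArns[]' --output text)
  case "$task_arns" in
    ''|None) echo "no tasks yet"; return 0 ;;
  esac
  # Word splitting of $task_arns is intentional: one argument per ARN.
  aws ecs describe-tasks --cluster "$1" --tasks $task_arns \
    --query 'tasks[].lastStatus' --output text
}

# Example usage, with CLUSTER_NAME as a placeholder you fill in:
#   set_desired_count "$CLUSTER_NAME" example-service 3
#   show_task_statuses "$CLUSTER_NAME" example-service
```

Running show_task_statuses a few times after scaling up should show tasks moving out of PROVISIONING as new EC2 instances register with the cluster.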

Test rolling out an EC2 instance update

Whenever there is a new ECS Optimized AMI available, the Auto Scaling Group will roll out the update as part of the next CloudFormation stack update. However, you can simulate an update by modifying the AWS::EC2::LaunchTemplate. Locate the UserData script that runs on each EC2 instance, and add a comment to it. For example:

Language: yaml
UserData:
  Fn::Base64: !Sub |
    #!/bin/bash -xe
    # added a test comment here so there is a change for CloudFormation to detect
    echo ECS_CLUSTER=${ECSCluster} >> /etc/ecs/ecs.config
    yum install -y aws-cfn-bootstrap
    /opt/aws/bin/cfn-init -v --stack ${AWS::StackId} --resource ContainerInstances --configsets full_install --region ${AWS::Region} &

Now, the next time you deploy, it will initiate a rolling update of the Auto Scaling Group to replace all the EC2 instances with new instances. You will see that the container workloads on old hosts are gracefully drained and replaced onto new EC2 hosts prior to the older EC2 hosts shutting down.

Test scaling back down to zero

Last but not least, update the service in the ECS console to adjust its desired count back down to zero. Once all instances are empty, you will see ECS begin to shut down EC2 instances until the cluster has been scaled back down to zero.
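
Scale-in is not instantaneous, so it can help to poll rather than refresh the console. The following is a minimal sketch using the AWS CLI. The function names, attempt count, and delay are all hypothetical choices, and the cluster name is a placeholder you would supply yourself.

```shell
#!/bin/sh
# Prints the number of EC2 instances currently registered to the cluster.
# Args: cluster name
registered_instances() {
  aws ecs describe-clusters --clusters "$1" \
    --query 'clusters[0].registeredContainerInstancesCount' \
    --output text
}

# Polls until the cluster has no registered instances, or gives up.
# Args: cluster name, attempts (default 60), delay in seconds (default 30)
wait_for_zero_instances() {
  cluster="$1"
  attempts="${2:-60}"
  delay="${3:-30}"
  while [ "$attempts" -gt 0 ]; do
    count=$(registered_instances "$cluster")
    echo "registered instances: $count"
    if [ "$count" = "0" ]; then
      return 0
    fi
    attempts=$((attempts - 1))
    sleep "$delay"
  done
  return 1
}

# Example usage, with CLUSTER_NAME as a placeholder you fill in:
#   wait_for_zero_instances "$CLUSTER_NAME"
```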

Tear it Down

You can use the following command to tear down the test stack and all of its created resources:

Language: sh
sam delete --stack-name capacity-provider-environment --no-prompts

See Also

  • If your workload is interruptible, you may prefer to save money on your infrastructure costs by using an EC2 Spot Capacity provider instead.