Nathan Peck
Senior Developer Advocate at AWS
Sep 22, 2023 28 min read

Amazon ECS Core Concepts

This presentation covers core concepts of Amazon Elastic Container Service (ECS) orchestration. It is designed to introduce containers on AWS, and the key topics you need to master to be successful with a container deployment. You will learn the following topics:

Introduction to Containers - What is a container and why do you want to use containers for your application?
Decoupling Compute - How to think of containers: decouple your application from its underlying compute
How ECS Works - The core pieces that make up ECS: task definitions, tasks, services. The actions and API calls you use to interact with ECS.
Integrations with AWS - All the ways ECS helps connect your application to the rest of the AWS toolkit
Summary - The TL;DR if you just want some fast tips

The content below is extracted from the presentation deck, which you can download at the bottom of this article.

Introduction to Containers

Containers have exploded in popularity. It’s rare for a technology to experience rapid adoption across so many different types of workloads. Let’s review what a container is, in order to understand why container based deployments have gotten so popular.

First we need to look at the application that is contained inside of a container. Your own application, just like most modern applications, is not a single thing. It is made up of pieces.

For example, in an interpreted language like JavaScript, you need a runtime engine that is responsible for running your code. You probably utilize open source packages so you don’t have to code everything from scratch. And you may have other dependencies like binary packages that your application utilizes.

All these pieces need to work well together for the application to function properly.

A container provides a way to grab all of the pieces that your application depends on and turn them into a single deployable artifact. This artifact is really just a fancy ZIP file. What makes the container special is the tooling that generates the container and then turns it back into a running application.

The lifecycle of a container goes through three stages:

Build - Gather up your application and all of its dependencies. The gathered pieces are turned into one immutable artifact called the container image.
Push - Upload the container image artifact into a registry.
Run - Download the container image artifact from its registry onto some compute, extract it, and run the application in its own isolated environment.
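
As a rough illustration of the three stages, the sketch below assembles the command lines you might run with the Docker CLI. The image URI and the `container_lifecycle_commands` helper are hypothetical, and other OCI compliant tools may use different flags.

```python
def container_lifecycle_commands(image_uri, build_context=".", tool="docker"):
    """Return example CLI commands for the build, push, and run stages."""
    return {
        # Build: turn the Dockerfile in build_context into a tagged image.
        "build": [tool, "build", "-t", image_uri, build_context],
        # Push: upload the tagged image to the registry named in the URI.
        "push": [tool, "push", image_uri],
        # Run: pull the image if it is not present locally, then start it.
        "run": [tool, "run", "--rm", image_uri],
    }


# Hypothetical image URI, used only for illustration.
commands = container_lifecycle_commands("registry.example.com/my-app:v1")
for stage, argv in commands.items():
    print(stage, "->", " ".join(argv))
```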

Building a container image is done using a Dockerfile. Think of the Dockerfile as a recipe for how to build your container image. In the example above you can see a simple recipe for a Node.js image. It uses a prebuilt Node.js container image as a starting point. There are two build stages.

The first stage is a development stage, based on a full Node.js development environment image. This stage downloads and installs dependencies using Node Package Manager.

The second stage is the shipped stage, based on a slim Node.js image that is optimized for production deployment. It collects the installed packages from the build stage, and the local code files.

The end product of this Dockerfile is a packaged up Node.js application that carries along with it a specific version of Node.js, specific versions of packages it depends on, and a specific snapshot of the application code.

The Dockerfile is a very reliable way to build your software. You can build a container without worrying about conflicts with the state of the host machine building the container. This also helps solve the problem of “it worked on my machine”, where one developer produces code that other developers are not able to get running.

There is one standard container image format, called Open Container Initiative (OCI) Image Format. An OCI compliant container image can be built from a Dockerfile using different open source tools:

Docker is the original tool which popularized container images in their current form.
Podman and buildah are open source projects built by Red Hat engineers.
Finch is an open source tool sponsored by AWS.

You can use any OCI compliant image builder to create your container image. Different tools have different specialties, so try them all out if you have time and see which one works best for you.

No matter which tool you use to build your container, the next step is to upload it. This is done by pushing the container image to an image registry. On AWS there are two options:

Amazon Elastic Container Registry Public - Good for hosting open source projects and other applications that you intend to distribute publicly to anyone on the internet.
Amazon Elastic Container Registry Private - Good for hosting your own private business code or forked versions of code that you want to keep private.

Now we get to the fun part: running your container image. You can pull your container image down out of the container registry onto any compute that has Docker or another OCI compliant container image runtime.

The runtime will download and unpack your container image, and launch the application that it contains. The launched application runs in its own isolated sandbox. Each container is separated from the underlying host operating system, and separated from any other containers that may be running on the same host.

When you have multiple containers to run, the value of containers becomes even clearer. With containers you have one reliable way to bring any application, written in any runtime language, to your compute and run it there.

Distributing your application using a container means your users no longer have to chase down a long manifest list of required dependencies. All they need is a container runtime and the URI of your container image.

A good example of this is an application written in Python. A lot of us struggled during the transition from Python 2 to Python 3 because some operating systems had the wrong version of Python installed out of the box. And I’ve personally encountered scenarios where I can’t get some Python script to work right because I have the wrong version of a package that it depends on. Having a container image with all the right things inside makes it way easier to run applications that you aren’t deeply familiar with.

Containers even help to solve cross platform issues. For example, if you have a modern Mac you have an ARM based processor, while you may wish to build your application to run on an Intel based processor. Container tooling helps you build and run images independent of the architecture of the machine you are working on.

You can even build multiple versions of your container for different architectures, and consumers of your container can launch your application using a single container image URI which automatically resolves to whichever architecture is appropriate for their computer.
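
To make the multi-architecture idea concrete, here is a minimal sketch of how a runtime might pick the right image from an OCI image index. The index is simplified (real entries also carry a media type and size), and the digests and the `resolve_manifest` helper are made up for illustration.

```python
def resolve_manifest(image_index, architecture, os_name="linux"):
    """Return the digest of the manifest that matches the requested platform."""
    for manifest in image_index["manifests"]:
        platform = manifest["platform"]
        if platform["architecture"] == architecture and platform["os"] == os_name:
            return manifest["digest"]
    raise LookupError(f"no image for {os_name}/{architecture}")


# Simplified image index with placeholder digests (not real hashes).
image_index = {
    "schemaVersion": 2,
    "mediaType": "application/vnd.oci.image.index.v1+json",
    "manifests": [
        {"digest": "sha256:placeholder-amd64",
         "platform": {"architecture": "amd64", "os": "linux"}},
        {"digest": "sha256:placeholder-arm64",
         "platform": {"architecture": "arm64", "os": "linux"}},
    ],
}

# The same image URI resolves to a different manifest on each machine.
print(resolve_manifest(image_index, "arm64"))
print(resolve_manifest(image_index, "amd64"))
```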

For more info on the benefits of using containers see “Why use containers for your application?”

Decoupling Compute

In addition to helping you build your application and distribute it, the container helps to solve management of applications on your compute.

You can run multiple containers on a single piece of compute. Containers keep each running application separated from each other. You can limit how much CPU and memory that each application container uses. Each running container has its own filesystem and is unable to touch the host computer’s underlying filesystem, or another container’s filesystem unless explicitly allowed. Each running container has its own virtual networking stack, which means you can run multiple application containers that bind to port 80 without port collision conflicts.

Containers are designed to decouple your applications from the underlying compute infrastructure that powers the container. This is particularly helpful in a cloud based environment, such as when running your application on AWS. You can now package up your application into a container, download that container onto any piece of AWS compute, and run the container there without having to install any other dependencies on the EC2 instance. And if you launch a large EC2 instance that has more resources than you need, you can run multiple containers on that instance, in order to get better compute utilization and save money.

The goal of containers is to focus on the application itself, and treat the underlying compute as generic capacity. This is where Amazon Elastic Container Service (ECS) excels. With ECS you provide the container and some high level settings for how to run the container. You also provide compute capacity, such as EC2 instances.

Amazon ECS supports the following types of capacity:

Amazon EC2 instances
AWS Fargate
Your own on-premises servers, via ECS Anywhere

ECS figures out how to run your application containers across the capacity that you provide it, in a way that makes sense. It will fit applications onto your available compute in such a way that the applications won’t be competing for CPU or memory. Because your application is decoupled from the underlying infrastructure you can add, remove, or upgrade EC2 instances whenever you wish and ECS will autonomously adjust to the new compute infrastructure.
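
The idea of fitting containers onto capacity without oversubscribing CPU or memory can be sketched with a simple first-fit loop. This is only an illustration of the concept, not the placement logic ECS actually uses; the `Instance` class, the instance names, and the sizes are assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class Instance:
    name: str
    cpu_free: int     # CPU units (ECS counts 1024 units per vCPU)
    memory_free: int  # MiB
    tasks: list = field(default_factory=list)


def place(task_name, cpu, memory, instances):
    """Reserve room on the first instance that has enough free CPU and memory."""
    for instance in instances:
        if instance.cpu_free >= cpu and instance.memory_free >= memory:
            instance.cpu_free -= cpu
            instance.memory_free -= memory
            instance.tasks.append(task_name)
            return instance.name
    return None  # No capacity left: the task cannot be placed.


# Two hypothetical instances with 2 vCPU and 4 GiB each.
instances = [Instance("i-a", 2048, 4096), Instance("i-b", 2048, 4096)]
# Each task reserves 1 vCPU and 2 GiB, so only four of the five fit.
placements = [place(f"web-{n}", 1024, 2048, instances) for n in range(5)]
print(placements)
```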

AWS Fargate is the easiest way to run containers using ECS. AWS Fargate provides serverless compute on-demand for your containers. You no longer have to provide EC2 capacity to ECS. Instead ECS just launches containers directly in AWS Fargate.

AWS Fargate works by giving each of your containers a micro-VM that is sized to fit your application’s needs. You pay for exactly the amount of vCPU and memory that you asked for, and stop paying when your container stops.

How ECS Works

Now that you have been introduced to the basic concepts of containers and decoupling containers from infrastructure, we need to talk about the ECS concepts that make it all work.

Everything in ECS starts with a cluster. The cluster in ECS is a grouping mechanism that keeps like resources together, and isolates resources you don’t want to be able to interact with each other. You should create one cluster for each shared environment you wish to deploy.

For example you might create a “development” cluster that is going to run proposed new versions of your application code that are still being worked on, while a “production” cluster runs the version of your code that you are confident works. This way the buggy code that needs to be properly tested can be kept isolated from the code that your users actually interact with.

If you like developing live in the cloud you might choose to give each engineer their own personal cluster to deploy to when testing their code.

In order for ECS to run your application container it needs to know that the application container exists, and it needs some information about the compute and the settings that the container requires. This is the job of the ECS task definition.

An ECS task definition defines the URI of the container image that will be run, as well as details about how much CPU and memory it needs. There are many settings that you can specify, but these are the required ones that every task definition needs to have.

In the example above the URI public.ecr.aws/nginx/nginx:mainline is defining which container to run. This is a public NGINX container hosted in Amazon ECR Public.
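
A minimal task definition along these lines might look like the sketch below, written as the kind of dictionary you could pass to the RegisterTaskDefinition API (for example through an AWS SDK). The family name, container name, and CPU and memory sizes are assumptions chosen for illustration.

```python
import json

task_definition = {
    "family": "nginx-web",                    # hypothetical family name
    "requiresCompatibilities": ["FARGATE"],
    "networkMode": "awsvpc",
    "cpu": "256",                             # 0.25 vCPU for the whole task
    "memory": "512",                          # 512 MiB for the whole task
    "containerDefinitions": [
        {
            "name": "nginx",                  # hypothetical container name
            "image": "public.ecr.aws/nginx/nginx:mainline",
            "essential": True,
            "portMappings": [{"containerPort": 80, "protocol": "tcp"}],
        }
    ],
}

print(json.dumps(task_definition, indent=2))
```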

When you create a new task definition it comes with version history. Each time you update the task definition a new task definition revision is captured. Each time you build a new version of your container image you should create a new task definition revision that references that version of the container image.

This ensures there is always a link between a particular version of your code and the task definition settings that that version of code needed. This makes deployments reliable and allows for reliable rollbacks when necessary. If you roll out a new version of your code with new settings and then have to reverse the roll out you can roll back to the old code and the old settings at the same time.
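
The revision idea can be modeled in a few lines. The sketch below is a toy in-memory stand-in for the revision history that ECS keeps for you; the `TaskDefinitionFamily` class and the image URIs are hypothetical.

```python
class TaskDefinitionFamily:
    """Toy model of a task definition family with numbered revisions."""

    def __init__(self, family):
        self.family = family
        self.revisions = []

    def register(self, image, cpu, memory):
        # Every registration captures code version and settings together.
        self.revisions.append({"image": image, "cpu": cpu, "memory": memory})
        return f"{self.family}:{len(self.revisions)}"

    def describe(self, identifier):
        _, revision = identifier.rsplit(":", 1)
        return self.revisions[int(revision) - 1]


family = TaskDefinitionFamily("web")
v1 = family.register("registry.example.com/web:1.0", cpu=256, memory=512)
v2 = family.register("registry.example.com/web:1.1", cpu=512, memory=1024)

# Rolling back means pointing the service at the older revision again,
# which restores the old image and the old settings in one step.
print(v2, "->", v1, family.describe(v1))
```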

The task definition describes how much CPU and memory the container needs, but where does that CPU and memory come from? The ECS capacity provider is responsible for ensuring there is compute capacity for the containers to run on.

For example, if you wish to use Amazon EC2 as compute capacity, you can create a capacity provider that is linked to an EC2 Auto Scaling Group. When the capacity provider sees that a container needs capacity to run, it will adjust the size of the Auto Scaling Group to request that more EC2 capacity be launched. The Auto Scaling Group will then launch more copies of EC2 instances based on a special Amazon Machine Image (AMI), called the ECS Optimized AMI.

You can try out capacity providers in the pattern: “Amazon ECS Capacity Providers for EC2 instances”

The ECS Optimized AMI comes with all the tooling necessary for an EC2 instance to self register itself as capacity for an ECS cluster to use. This includes:

A container runtime (Docker) - This is what actually downloads and creates running containers on the EC2 instance
The ECS Agent - This is a container itself, which connects to the ECS service and tells ECS about the EC2 instance and how many resources the EC2 instance has.

For example in the diagram above, the EC2 instance has 4 vCPU and 16 GB of memory available, so the ECS agent connects to ECS and tells ECS that it has the ability to launch 4 vCPU and 16 GB worth of containers.

An ECS cluster can have many EC2 instances registered to it. In fact you can register up to 5000 EC2 instances into an ECS cluster. The capacity of the cluster to run containers is determined by adding up all the available resources from all the instances that are registered to the cluster.

It is important to register enough capacity to fit all the containers you want to run, otherwise some containers will not be able to launch. But you also don’t want to launch too many EC2 instances, otherwise you are wasting money paying for extra capacity you don’t need.
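
The arithmetic behind this trade-off is simple addition. The sketch below totals the registered capacity of a hypothetical cluster and compares it with what a set of tasks would reserve. It ignores fragmentation (a task must fit on a single instance), so treat it as a rough estimate only; the instance and task sizes are assumptions.

```python
CPU_UNITS_PER_VCPU = 1024
MIB_PER_GIB = 1024

# Three hypothetical instances, each with 4 vCPU and 16 GiB of memory.
instances = [(4, 16)] * 3
cluster_cpu = sum(vcpu * CPU_UNITS_PER_VCPU for vcpu, _ in instances)
cluster_memory = sum(gib * MIB_PER_GIB for _, gib in instances)

# Twenty tasks that each reserve 0.5 vCPU (512 units) and 1 GiB.
task_count, task_cpu, task_memory = 20, 512, 1024
needed_cpu = task_count * task_cpu
needed_memory = task_count * task_memory

# Positive headroom is capacity you pay for but do not use;
# negative headroom means some tasks will not be able to launch.
cpu_headroom = cluster_cpu - needed_cpu
memory_headroom = cluster_memory - needed_memory
print(f"CPU headroom: {cpu_headroom} units, memory headroom: {memory_headroom} MiB")
```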

If you don’t want to think about launching the perfect amount of EC2 capacity then there is a simpler way to handle this problem.

For most workloads you should consider starting with serverless first.

Amazon ECS comes with AWS Fargate built-in. AWS Fargate provides serverless capacity to run your containers on-demand, with no EC2 required. You pay for the amount of vCPU and memory that each container requires. This means no more wasted EC2 capacity that goes unused.

AWS Fargate reduces your operational overhead. If you run an EC2 instance for a long time you will notice that there are updates that need to be applied. You need to do security updates to operating system packages, as well as updates to Docker and the ECS Agent itself. In addition, you are also still responsible for updates to the code inside of your container image.

Underlying compute updates are handled for you automatically with AWS Fargate. You no longer have to think about the host operating system updates, or updates to ECS itself. You no longer have to think about EC2 capacity management. All you have to think about is updating your own application container.

There is more to learn about the trade-offs between using Amazon EC2 and AWS Fargate. In general AWS Fargate is much easier to get started with, without requiring as much knowledge or maintenance. Amazon EC2 can be a bit cheaper for extremely large scale deployments.

The most basic way to run a container is to use the ECS RunTask API. This API asks ECS to launch a task definition in a cluster. ECS looks at the cluster and finds capacity to run the container. It reserves this capacity for the container, then instructs the ECS agent on that compute to turn the task definition into a running task.

The ECS agent launches a task by using a container runtime to download the container image from its URI, unpack it, and then launch it. The ECS agent then communicates this status update back to the central ECS control plane. ECS keeps a central knowledge of what is running and how many resources are in use at any time.

In this diagram you see an EC2 instance, but this same flow applies to tasks running on AWS Fargate capacity.
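
As a sketch of what a RunTask call carries, the dictionary below shows request parameters for launching one task on AWS Fargate. The cluster name, task definition identifier, subnet ID, and security group ID are placeholders, not values from a real account.

```python
import json

run_task_request = {
    "cluster": "development",            # hypothetical cluster name
    "taskDefinition": "nginx-web:1",     # family:revision to launch
    "count": 1,
    "launchType": "FARGATE",
    "networkConfiguration": {
        "awsvpcConfiguration": {
            "subnets": ["subnet-EXAMPLE"],        # placeholder subnet ID
            "securityGroups": ["sg-EXAMPLE"],     # placeholder security group ID
            "assignPublicIp": "ENABLED",
        }
    },
}

# With an AWS SDK such as boto3 installed and credentials configured,
# this could be sent with: boto3.client("ecs").run_task(**run_task_request)
print(json.dumps(run_task_request, indent=2))
```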

Running a standalone task is fine, but sometimes you have code for a service such as a web server or API server. This code needs to stay up and running at all times to serve clients. If the code crashes it needs to be restarted as soon as possible. Additionally, you may want to run multiple copies of the containerized code so that you can serve more clients by distributing clients across instances of your code. Or you may wish to run additional copies of your container as redundancy to make sure that there is always at least one instance of the code that can respond to clients.

You can use the ECS CreateService API to launch a replica set of long running tasks. You set a “desired count” for how many tasks you would like to run. ECS places as many tasks on your cluster as needed to fill the service’s “desired count”. The launched tasks can be spread across many available compute instances registered to the cluster. Optionally you can use advanced placement strategies to densely binpack tasks onto the most minimal set of compute instances.
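
A CreateService request with a desired count and placement strategies might look roughly like this. The names are placeholders, and the strategy shown (spread across availability zones, then binpack on memory) is just one possible combination for EC2 capacity.

```python
create_service_request = {
    "cluster": "production",           # hypothetical cluster name
    "serviceName": "web",              # hypothetical service name
    "taskDefinition": "nginx-web:1",
    "desiredCount": 3,                 # ECS tries to keep three tasks running
    "launchType": "EC2",
    "placementStrategy": [
        # First spread tasks across availability zones for redundancy...
        {"type": "spread", "field": "attribute:ecs.availability-zone"},
        # ...then pack them densely by memory within each zone.
        {"type": "binpack", "field": "memory"},
    ],
}

strategy_types = [s["type"] for s in create_service_request["placementStrategy"]]
print(create_service_request["desiredCount"], strategy_types)
```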

One important thing about an ECS service is that it is self healing. If one of your containers crashes and the number of running tasks dips below the desired count that you had configured in the service, ECS will see this and launch a replacement task.

Additionally, if ECS sees an entire EC2 instance go out of service (perhaps because of a hardware failure, or a scale-in) then ECS will replace any missing tasks that used to be on that EC2 instance, by relaunching them onto different EC2 instances that are still available.

This means that an ECS service is resilient to changes, whether those changes are deliberate or accidents. If the state of an ECS service diverges from your desired state, it will continuously attempt to self heal back to the desired state that you configured. Using ECS is like having an operator who watches over your infrastructure 24/7, trying to keep your service online.
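
The self healing behavior is a reconciliation loop: compare what is running with what is desired, then close the gap. The sketch below is a toy version of that idea, not ECS internals; the task IDs and the `launch_task` callback are hypothetical.

```python
import itertools


def reconcile(tasks, desired_count, launch_task):
    """Drop stopped tasks and launch replacements until the desired count is met."""
    running = [task for task in tasks if task["status"] == "RUNNING"]
    missing = max(desired_count - len(running), 0)
    for _ in range(missing):
        running.append(launch_task())
    return running


task_ids = itertools.count(start=4)


def launch_task():
    # Stand-in for asking the orchestrator to start a replacement task.
    return {"id": f"task-{next(task_ids)}", "status": "RUNNING"}


tasks = [
    {"id": "task-1", "status": "RUNNING"},
    {"id": "task-2", "status": "STOPPED"},   # crashed container
    {"id": "task-3", "status": "RUNNING"},
]
healed = reconcile(tasks, desired_count=3, launch_task=launch_task)
print([task["id"] for task in healed])
```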

The other nice thing about an ECS service is that it allows you to safely roll out updates to your code. For example in this diagram you can see that there is a blue task, and a new green task. There are currently three copies of the blue task running, but we want to run the new green task instead.

When you make an ECS UpdateService API call, the control plane starts to come up with a strategy to roll out your update without causing any downtime. Here is how that works.

ECS launches copies of the new green task in parallel with the copies of the blue task. By default ECS will avoid stopping your old tasks, until there is a new task already running and able to handle any workload that the old task was working on. As you can see in this diagram ECS starts the rolling update by launching two additional tasks into the available capacity, so that the cluster as a whole is running three blue tasks and two green tasks.

The next step is for ECS to stop some of the unnecessary old tasks that are still running. ECS does this carefully. ECS is aware of whether your task is registered into a load balancer. By default ECS will drain internet traffic from the old tasks, by configuring the load balancer to stop sending new web traffic to the tasks, and waiting for any open connections to close. Additionally, ECS sends a SIGTERM signal to the old tasks and waits up to 30 seconds to give the old containers a chance to stop gracefully.

Once the old tasks have been carefully drained and stopped, the compute capacity that they were using is now available for more new tasks.

ECS uses the newly available capacity to launch a green task, and then stops the blue task. ECS has transitioned the live service from the blue version to the green version, in a careful step by step process that ensured there was no loss of service availability.
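
The steps described above can be simulated with a short loop. This is a simplified model, assuming that new tasks become healthy immediately and that the cluster has room for five tasks in total; the `rolling_deployment` function is hypothetical, although the minimum healthy percent and maximum percent settings it imitates are real ECS deployment settings.

```python
import math


def rolling_deployment(desired, capacity_slots,
                       minimum_healthy_percent=100, maximum_percent=200):
    """Return the (blue, green) task counts after each deployment step."""
    min_healthy = math.ceil(desired * minimum_healthy_percent / 100)
    max_running = math.floor(desired * maximum_percent / 100)
    blue, green = desired, 0
    history = [(blue, green)]
    while blue > 0 or green < desired:
        running = blue + green
        # Launch new tasks, limited by the deployment ceiling and free capacity.
        can_start = min(desired - green, max_running - running, capacity_slots - running)
        if can_start > 0:
            green += can_start
            history.append((blue, green))
        # Stop old tasks, but never drop below the minimum healthy count.
        can_stop = min(blue, blue + green - min_healthy)
        if can_stop > 0:
            blue -= can_stop
            history.append((blue, green))
        if can_start <= 0 and can_stop <= 0:
            raise RuntimeError("deployment cannot make progress")
    return history


history = rolling_deployment(desired=3, capacity_slots=5)
for blue, green in history:
    print(f"blue={blue} green={green}")
```

With these assumptions the simulation follows the same sequence as the walkthrough: two green tasks start, two blue tasks stop, the last green task starts, and the last blue task stops.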

But what if the code you are rolling out is bad? Your application may crash or start timing out because of bugs in the new version of the container. ECS has built-in healthcheck mechanisms you can use so that ECS will be aware of whether your container is functioning properly.

You can configure a circuit breaker so that if your container is malfunctioning during a deployment, ECS will stop the deployment and initiate a rollback. ECS will not stop the old healthy tasks if the new tasks are failing healthchecks. Just like during a rollout, ECS will configure the load balancer to send traffic to healthy tasks, and carefully drain traffic from stopping tasks.

In the diagram you can see that ECS initiates a rolling deploy from the blue task to the green task. However, one of the green tasks is unhealthy and fails healthchecks. As a result ECS avoids stopping blue tasks, and instead rolls traffic back entirely onto the blue tasks. It drains all traffic from the green tasks and shuts them down.
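
As a sketch, the deployment settings for an UpdateService call with the circuit breaker and automatic rollback turned on could look like this. The cluster, service, and task definition names are placeholders.

```python
update_service_request = {
    "cluster": "production",           # hypothetical cluster name
    "service": "web",                  # hypothetical service name
    "taskDefinition": "nginx-web:2",   # the new (green) revision
    "deploymentConfiguration": {
        # Keep the full desired count healthy during the rollout...
        "minimumHealthyPercent": 100,
        # ...and allow up to twice the desired count while both versions run.
        "maximumPercent": 200,
        # Stop a failing deployment and roll back to the last good one.
        "deploymentCircuitBreaker": {"enable": True, "rollback": True},
    },
}

breaker = update_service_request["deploymentConfiguration"]["deploymentCircuitBreaker"]
print("circuit breaker enabled:", breaker["enable"], "rollback:", breaker["rollback"])
```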

ECS service deployments are designed to be safe, and a great choice for updating important internet facing code.

Integrations with AWS

Containers offer a lot of utility, but modern compute also has a lot of needs. Elastic Container Service is designed to serve your application needs by setting up integrations between your container and the rest of the AWS toolkit.

In this section we will go through each of the needs of a containerized application, and how ECS helps with each need.

Whether your application code is running nicely, or crashing instantly, you are going to want to see some logs. In the old days you’d just write these logs to a local filesystem, but then you had two problems. First, if you wanted to see those logs you’d have to SSH into a specific VM where the application was running. And second, those logs were constantly filling up the local disk, until you terminated the VM, and then the logs were gone for good.

Elastic Container Service provides two ways to handle container logs:

Docker awslogs logging driver - This logging driver built-in to Docker grabs your container’s stdout and stderr output and automatically pipes to an Amazon CloudWatch log group.
AWS FireLens - This is a Fluent Bit based sidecar that can be configured as your logging driver. It also supports enhancing your logs with extra metadata, filtering logs, and splitting logs to different destinations. You can check out an example pattern for deploying AWS FireLens using AWS CDK.
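
For the first option, the logging driver is configured per container in the task definition. The fragment below is a sketch; the log group name, region, and stream prefix are assumptions, and the `log_stream_name` helper simply mirrors the prefix/container/task-ID naming that the awslogs driver uses on ECS.

```python
container_definition = {
    "name": "web",
    "image": "public.ecr.aws/nginx/nginx:mainline",
    "logConfiguration": {
        "logDriver": "awslogs",
        "options": {
            "awslogs-group": "/ecs/web",        # hypothetical log group
            "awslogs-region": "us-east-1",      # hypothetical region
            "awslogs-stream-prefix": "web",     # hypothetical prefix
        },
    },
}


def log_stream_name(prefix, container_name, task_id):
    """Build the log stream name for one container of one task."""
    return f"{prefix}/{container_name}/{task_id}"


options = container_definition["logConfiguration"]["options"]
stream = log_stream_name(options["awslogs-stream-prefix"],
                         container_definition["name"],
                         "0123456789abcdef")  # made-up task ID
print(stream)
```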

Amazon CloudWatch functions as a central storage for all your logs from all your containers. You can view all your logs, from all of your compute, via the AWS console. And you can use Amazon CloudWatch Logs Insights to query your logs. Queries let you target specific scenarios like exception messages in a specific timeframe, from a task with a specific task ID.
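
A query for that last scenario might be assembled like the sketch below. It assumes the task ID appears in the log stream name, as it does with the awslogs driver, and that the timeframe is supplied separately when the query is run; the `exception_query` helper and the task ID are hypothetical.

```python
def exception_query(task_id, limit=20):
    """Build a Logs Insights query for exception messages from one task."""
    return "\n".join([
        "fields @timestamp, @logStream, @message",
        f"| filter @logStream like /{task_id}/",
        "| filter @message like /Exception/",
        "| sort @timestamp desc",
        f"| limit {limit}",
    ])


query = exception_query("0123456789abcdef")  # made-up task ID
print(query)
```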

Most services have varying compute workload throughout the day. When workload is high you need to split the work across multiple copies of your container. When workload is low you need to scale down the number of containers to save money.

Amazon ECS captures metrics from your container deployment to measure how your application is performing. ECS also integrates with AWS Application Auto Scaling so that you can implement scaling policies based on your metrics. Your service’s desired count of containers will increase and decrease automatically as you have more or less workload.
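
As a sketch, a target tracking policy for an ECS service could be described with the parameters below. The policy, cluster, and service names are placeholders. The `estimate_desired_count` function is a deliberately simplified approximation of the proportional idea behind target tracking, not the exact algorithm the service uses.

```python
import math

scaling_policy = {
    "PolicyName": "web-cpu-target",               # hypothetical policy name
    "ServiceNamespace": "ecs",
    "ResourceId": "service/production/web",       # service/<cluster>/<service>
    "ScalableDimension": "ecs:service:DesiredCount",
    "PolicyType": "TargetTrackingScaling",
    "TargetTrackingScalingPolicyConfiguration": {
        "TargetValue": 60.0,                      # aim for 60% average CPU
        "PredefinedMetricSpecification": {
            "PredefinedMetricType": "ECSServiceAverageCPUUtilization"
        },
    },
}


def estimate_desired_count(current_count, metric_value, target_value,
                           min_count=1, max_count=10):
    """Scale the task count in proportion to how far the metric is from target."""
    proposed = math.ceil(current_count * metric_value / target_value)
    return max(min_count, min(max_count, proposed))


target = scaling_policy["TargetTrackingScalingPolicyConfiguration"]["TargetValue"]
print(estimate_desired_count(4, 90.0, target))   # busy period: scale out
print(estimate_desired_count(6, 20.0, target))   # quiet period: scale in
```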

For more info check out patterns for:

In order to get workload to your containers, your public facing web containers need some form of traffic ingress. ECS integrates with Elastic Load Balancing.

Here’s how it works. An Elastic Load Balancer is made up of a listener (which accepts traffic from the public internet), and a target group (which is the list of backends that can accept that traffic). You can configure rules so that traffic arriving at the listener gets proxied to targets from a target group.

When ECS launches a container it automatically configures your target group to add the IP address and port number of the downstream container. ECS keeps your target group’s list in sync as containers start and stop. Even if your container suddenly crashes, or you terminate an EC2 instance, ECS will react and update your load balancer’s target group list.

Additionally, when stopping tasks ECS safely drains traffic from tasks that are in a load balancer. ECS does this by configuring the load balancer to stop sending new traffic to the task. ECS waits for the task to stop serving any in flight traffic before it stops the task.
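
Keeping the target group in sync boils down to comparing two sets. The sketch below illustrates the idea with made-up IP addresses; it is a conceptual model rather than how ECS is implemented.

```python
def sync_target_group(registered, running):
    """Return the targets to add and the targets to drain and remove."""
    to_register = running - registered      # new tasks that need traffic
    to_deregister = registered - running    # stopped tasks that must be drained
    return to_register, to_deregister


# Targets are (IP address, port) pairs; the addresses are made up.
registered = {("10.0.1.15", 80), ("10.0.2.27", 80)}
running = {("10.0.2.27", 80), ("10.0.3.41", 80)}

to_register, to_deregister = sync_target_group(registered, running)
print("register:", sorted(to_register))
print("deregister:", sorted(to_deregister))
```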

Check out the pattern for a simple website in AWS Fargate, fronted by an Application Load Balancer ingress.

Without a firewall or private networking your containers would be potentially open to attack by malicious actors. Fortunately ECS natively integrates with Amazon VPC.

ECS gives each task an Elastic Network Interface (ENI), with its own IP address and its own security group. The security group allows you to define custom ingress rules for each container in your cluster. This isolation works even when containers are running on the same underlying EC2 host. The EC2 instance gets multiple ENIs attached to it, and multiple private IP addresses from your VPC. Each private IP address goes to a different container. Additionally any outgoing traffic originating from a container enters the VPC via its own ENI.

ECS is designed so that you can make granular security group rules. For example in this diagram you can see that traffic has been allowed or denied for each potential network connection.

The web container does not accept traffic directly from the public internet. It only accepts traffic from the load balancer. The load balancer is the only part of the infrastructure that is allowed to accept direct traffic from the public internet. The load balancer can filter out many types of bad traffic, and only allow good internet traffic through to the web service.

Additionally, in this diagram the running containers are not allowed to talk to each other unless explicitly allowed to do so by a security group ingress rule. You can see that the web container has been allowed to initiate connections to the API container, because the website needs to fetch some info from the API in order to build the web page. However the API container is not allowed to initiate connections to the web container, because there is no legitimate reason why the backend API would need to do that.

You can and should make your traffic ingress rules as granular as possible, for added security.
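
The ingress rules described above can be modeled as a small allow list, where anything not listed is denied. The group names are hypothetical labels for the load balancer, web, and API security groups.

```python
# For each destination, the set of sources allowed to open connections to it.
ingress_rules = {
    "load-balancer": {"internet"},
    "web": {"load-balancer"},
    "api": {"web"},
}


def is_allowed(source, destination):
    """A connection is allowed only if an ingress rule explicitly permits it."""
    return source in ingress_rules.get(destination, set())


for source, destination in [("internet", "load-balancer"), ("internet", "web"),
                            ("web", "api"), ("api", "web")]:
    verdict = "allowed" if is_allowed(source, destination) else "denied"
    print(f"{source} -> {destination}: {verdict}")
```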

This leads into the next topic: finding other containers in the cluster. Container to container communication requires that containers be able to locate each other by IP address. This is complex because containers tend to be ephemeral and they get a new IP address each time they restart. As ECS services scale up and down the container count and container IP addresses will be continuously changing.

ECS integrates with service discovery to help containers find each other in the cluster. AWS Cloud Map is designed to keep track of a list of IP addresses for instances of a service. Your applications can make use of Cloud Map via DNS record, or API call. ECS keeps AWS Cloud Map synced with a list of the container IP addresses, just like it keeps the load balancer target group synced.

This makes it easier for you to have one container make network requests to another container. For example the web service can send a request to the API service by using a DNS name like api.production.internal

DNS based service discovery is very low latency as traffic travels directly from one container to the other container. However it is a bit complex as you need to handle request retries and other service to service communication best practices.
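
To show the kind of boilerplate this implies, here is a minimal sketch of client side discovery with retries. The lookup function is injectable so the example runs without a real DNS record; the IP addresses, port, and helper names are assumptions.

```python
import random
import socket


def resolve(hostname, port, lookup=socket.getaddrinfo):
    """Return the distinct IP addresses currently registered for a hostname."""
    results = lookup(hostname, port, proto=socket.IPPROTO_TCP)
    return sorted({entry[4][0] for entry in results})


def call_with_retries(hostname, port, send, attempts=3, lookup=socket.getaddrinfo):
    """Pick a task at random and retry on another lookup if the connection fails."""
    last_error = None
    for _ in range(attempts):
        address = random.choice(resolve(hostname, port, lookup))
        try:
            return send(address, port)
        except ConnectionError as error:
            last_error = error  # The task may have just stopped; try again.
    raise last_error


def fake_lookup(hostname, port, proto=0):
    # Stand-in for DNS: pretend two API tasks are registered.
    return [(socket.AF_INET, socket.SOCK_STREAM, proto, "", (ip, port))
            for ip in ("10.0.1.15", "10.0.2.27")]


calls = []


def flaky_send(address, port):
    # Stand-in for a network request that fails on the first attempt.
    calls.append(address)
    if len(calls) < 2:
        raise ConnectionError("task stopped")
    return f"200 OK from {address}:{port}"


response = call_with_retries("api.production.internal", 8080, flaky_send,
                             lookup=fake_lookup)
print(response)
```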

If you are deploying a microservice architecture there is a lot of service to service communication, so you may prefer to use a service mesh.

ECS Service Connect is an Envoy Proxy powered service mesh built-in to ECS. When you enable Service Connect for your service, each task launched by that service gets a tiny Envoy Proxy sidecar attached to it. Your service’s outbound service to service traffic gets routed through this container.

The Envoy Proxy sidecar gets configuration updates from the ECS control plane, so it is aware of the IP addresses of other tasks in the cluster. As containers start and stop ECS pushes updates to the Envoy Proxy container so it is proactively updated with the current state of the downstream tasks that are available. The Envoy Proxy container also handles retries and other important service to service communication responsibilities that you wish to offload out of your application code.

ECS Service Connect is great to use when you have lots of microservices in your cluster, you don’t want to implement your own service to service communication boilerplate, and you don’t want to pay for an internal load balancer.

Check out the ECS Service Connect pattern for an example of deploying a service mesh with ECS.

If you are running on AWS, you probably want to use other AWS services. For example, your application may wish to store objects in S3, or take advantage of a serverless NoSQL table powered by Amazon DynamoDB.

In order to make API calls to other AWS services your application needs an IAM role. ECS handles IAM role vending automatically. You configure the IAM role that you wish your application to use as part of your task definition. When ECS launches your task it will configure a metadata endpoint that automatically serves short lived, auto rotating AWS credentials to the AWS SDK inside of your application.

This is great for security because you can give each of your containers its own unique IAM role. Even if the containers are running side by side on the same underlying EC2 host, each container will have its own IAM role with its own access level to the rest of AWS. This allows you to build a secure, minimal access setup. In the diagram above you can see that one container has access to a resource in S3 and the other container has access to a resource in DynamoDB. But the containers are not able to access each other’s AWS resources.
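
In practice the AWS SDK finds the credentials endpoint for you, but a sketch of the moving parts may help. The role ARN and the credentials path below are placeholders; the environment variable name and the link-local address are the ones the ECS credentials mechanism uses.

```python
import os

# Task definition fragment: the role the application code will assume.
task_definition_fragment = {
    "family": "orders-api",                                         # hypothetical
    "taskRoleArn": "arn:aws:iam::123456789012:role/EXAMPLE-role",   # placeholder ARN
}

CREDENTIALS_HOST = "http://169.254.170.2"


def credentials_url(environ=os.environ):
    """Return the task credentials URL, or None when not running in a task."""
    relative_uri = environ.get("AWS_CONTAINER_CREDENTIALS_RELATIVE_URI")
    if relative_uri is None:
        return None
    return CREDENTIALS_HOST + relative_uri


# Simulated environment of a running task (the path is a made-up example).
task_environment = {"AWS_CONTAINER_CREDENTIALS_RELATIVE_URI": "/v2/credentials/EXAMPLE-ID"}
print(credentials_url(task_environment))
```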

For an example of configuring an IAM role for an ECS task, see “Bun JavaScript container that uses AWS SDK to connect to DynamoDB”.

Persisting data is very important for all applications. Wherever it makes sense, you should take advantage of serverless cloud APIs like S3 or DynamoDB. But sometimes you just want to write a file to disk and have it persist. This can be challenging with containers because they are designed to be ephemeral. When a container restarts it could be restarted on completely different hardware.

ECS integrates with Elastic File System to provide durable storage for your containers. An Elastic File System is a network file system that not only persists your critical data between restarts, but it also ensures that you can mount the same volume to different containers and they will all see the same shared file data. Any changes made by one container will be seen by all containers.
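
In a task definition, this takes the form of a volume backed by the file system plus a mount point in each container that needs it. The fragment below is a sketch with a placeholder file system ID, image URI, and paths, followed by a small consistency check.

```python
task_definition_fragment = {
    "volumes": [
        {
            "name": "shared-data",
            "efsVolumeConfiguration": {
                "fileSystemId": "fs-EXAMPLE",      # placeholder file system ID
                "transitEncryption": "ENABLED",
            },
        }
    ],
    "containerDefinitions": [
        {
            "name": "app",
            "image": "registry.example.com/app:1.0",   # hypothetical image
            "mountPoints": [
                {"sourceVolume": "shared-data", "containerPath": "/data", "readOnly": False}
            ],
        }
    ],
}

# Every mount point must refer to a volume declared at the task level.
volume_names = {volume["name"] for volume in task_definition_fragment["volumes"]}
unknown_mounts = [
    mount["sourceVolume"]
    for container in task_definition_fragment["containerDefinitions"]
    for mount in container["mountPoints"]
    if mount["sourceVolume"] not in volume_names
]
print("unknown volumes:", unknown_mounts)
```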

For an example of attaching an Elastic File System to a running container see: “Add durable storage to an ECS task, with Amazon Elastic File System”

ECS gives you ECS Exec, a secure way to access your live containers and run commands inside.

You may have used SSH to access an EC2 instance. SSH requires that you open up port 22 to inbound traffic from the internet. An SSH server listens on port 22, uses cryptographic verification to validate that you are an authorized user of the machine, then gives you secure access to an interactive shell on the remote instance.

ECS Exec is designed to not require any open ports at all. Instead, your task initiates opening up a connection to the AWS Systems Manager service. AWS Systems Manager keeps this connection open on its side and can communicate back down to an agent running in the task if needed. If you want to connect to your container you can use the AWS CLI to open a connection up from your local machine to the Systems Manager service. If you have the right IAM permissions, then Systems Manager will securely connect you through to the container, and you will be able to run commands via an interactive shell inside of your remote container.
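
As a sketch of that last step, the snippet below assembles the AWS CLI command that opens an interactive shell in a container. The cluster, task, and container names are placeholders, and it assumes the task was launched with ECS Exec enabled and that the Session Manager plugin is installed locally.

```python
import shlex


def ecs_exec_command(cluster, task_id, container, command="/bin/sh"):
    """Build the AWS CLI invocation for an interactive ECS Exec session."""
    return [
        "aws", "ecs", "execute-command",
        "--cluster", cluster,
        "--task", task_id,
        "--container", container,
        "--interactive",
        "--command", command,
    ]


# Placeholder identifiers, used only for illustration.
argv = ecs_exec_command("production", "0123456789abcdef", "web")
print(shlex.join(argv))
```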

Summary

This presentation has introduced the core concepts of Elastic Container Service, by covering the following topics.

Containers are based on a modern, open source application packaging format. Container images simplify software distribution and help you run containers anywhere you want. You build a container image, push it, and then run containers based on your container image.
Amazon ECS is a container orchestrator that helps launch and monitor many copies of your application container across underlying compute such as Amazon EC2.
AWS Fargate is serverless capacity to run your containers. If you prefer less maintenance overhead, then you’ll prefer AWS Fargate.

ECS operates using the following core concepts:

Task definitions define the settings for how you wish to run your container. You create task definition revisions as you publish new versions of your application.
You turn a task definition into a running task by launching it via the ECS API.
You can launch a standalone task that runs to completion, then exits. Or you can create a long running service that launches multiple copies of your task and tries to maintain a certain number of running tasks at all times.
You can update a service to start a rolling deployment that replaces old tasks with new tasks that run a new version of your code.

ECS connects your containers to the rest of the AWS toolkit:

Logs and metrics - Amazon CloudWatch
Auto Scaling - AWS Application Auto Scaling
Ingress - Elastic Load Balancing
Firewall and Network Security - Amazon Virtual Private Cloud (Amazon VPC)
Service Discovery - AWS Cloud Map and ECS Service Connect
Roles - AWS Identity and Access Management (IAM)
Durable filesystem - Amazon Elastic File System (EFS)
Interactive shell access - ECS Exec through AWS Systems Manager

Presentation Download

Did you enjoy this article? You can grab the presentation deck for yourself if you’d like to use the diagrams, or share the presentation with someone else.

Download Presentation