Deploy a Jupyter notebook container with Amazon ECS
About
Jupyter Notebook is a web-based interactive computing platform. It is popular for machine learning and as an IDE for developing in multiple programming languages. JupyterLab is the latest generation of the Jupyter notebook interface, with a more IDE-like experience and a modular, extensible design.
AWS Inferentia accelerators deliver high performance at low cost for your deep learning (DL) inference applications. When training your models, use AWS Trainium instances, which are optimized for model training.
AWS Neuron is the SDK that runs your machine learning models on the underlying hardware acceleration of AWS Inferentia or AWS Trainium.
This pattern shows how to build and deploy a containerized version of JupyterLab, with the AWS Neuron SDK for machine learning, accelerated by AWS Inferentia hardware.
WARNING
This pattern is designed to set up a production-ready machine learning environment that can be scaled up later for running extremely large machine learning training or real-time inference jobs, accelerated by some of the most powerful hardware that AWS offers. Therefore this pattern has a fairly high baseline cost (about $2 an hour, and the largest instance choice available costs more than $12 an hour). Consider using Amazon SageMaker notebooks on smaller EC2 instances for a low-cost learning environment that is free tier eligible.
Setup
Ensure that you have the following dependencies installed locally:
- Docker or other OCI compatible container builder. This will be used to prepare a custom JupyterLab image.
- Amazon ECR Credential Helper. This will assist you with uploading your container image to Amazon Elastic Container Registry.
- AWS SAM CLI. This tool will help you deploy multiple CloudFormation stacks at once and pass values from one stack to the next automatically.
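You can quickly confirm that these tools are available before continuing. For example (version numbers will vary):
# Verify the local tooling used in the rest of this pattern
docker --version
sam --version
aws --version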
Architecture
The following diagram shows the architecture of what will be deployed:
- An Application Load Balancer provides ingress from the public internet.
- Traffic goes to an inf2.8xlarge AWS Inferentia powered EC2 instance launched by Amazon ECS. You can adjust the AWS Inferentia instance class as desired.
- Amazon ECS has placed a container task on the instance, which hosts JupyterLab and the AWS Neuron SDK.
- Amazon ECS has connected the container to the underlying Neuron device provided by the AWS Inferentia instance.
- Machine learning workloads that you run inside the container are able to connect to the hardware accelerator.
- Access to the JupyterLab notebook is protected by a secret token that is stored in AWS Secrets Manager. Amazon ECS manages retrieving this secret value and injecting it into the Jupyter server on container startup.
Build a Jupyter notebook container
To build a Jupyter notebook container image, we will start with a prebuilt container image from the AWS Deep Learning Containers collection, then install JupyterLab on top of it:
FROM 763104351884.dkr.ecr.us-west-2.amazonaws.com/pytorch-inference-neuronx:1.13.0-neuronx-py38-sdk2.9.0-ubuntu20.04
RUN pip install jupyterlab
CMD jupyter-lab
Create the Dockerfile, and then build the custom image locally:
docker build -t jupyter-notebook .
Now we need to create an Amazon Elastic Container Registry (Amazon ECR) repository to hold the image:
aws ecr create-repository --repository-name jupyter-notebook
You should get a response similar to this:
{
"repository": {
"repositoryUri": "209640446841.dkr.ecr.us-east-2.amazonaws.com/jupyter-notebook",
"imageScanningConfiguration": {
"scanOnPush": false
},
"encryptionConfiguration": {
"encryptionType": "AES256"
},
"registryId": "209640446841",
"imageTagMutability": "MUTABLE",
"repositoryArn": "arn:aws:ecr:us-east-2:209640446841:repository/jupyter-notebook",
"repositoryName": "jupyter-notebook",
"createdAt": 1683047667.0
}
}
Copy the repositoryUri, as this is how you will interact with the repository. Use similar commands to tag your built image and then push it to Amazon ECR:
docker tag jupyter-notebook 209640446841.dkr.ecr.us-east-2.amazonaws.com/jupyter-notebook:latest
docker push 209640446841.dkr.ecr.us-east-2.amazonaws.com/jupyter-notebook:latest
INFO
If you get a 401 Unauthorized error, make sure you have installed the Amazon ECR credential helper properly. It will automatically use your current AWS credentials to authenticate with the ECR repository on the fly.
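If the helper is installed but Docker is still not using it, you can wire it up explicitly in your Docker configuration, or fall back to logging in manually with the AWS CLI. A minimal sketch, using the example account IDs and regions from this pattern (the first registry is the AWS Deep Learning Containers registry that the Dockerfile pulls its base image from, the second is your own repository; note that writing ~/.docker/config.json this way replaces any existing Docker configuration):
# Route ECR registries through the credential helper
cat > ~/.docker/config.json <<'EOF'
{
  "credHelpers": {
    "763104351884.dkr.ecr.us-west-2.amazonaws.com": "ecr-login",
    "209640446841.dkr.ecr.us-east-2.amazonaws.com": "ecr-login"
  }
}
EOF

# Alternative: authenticate with a short lived token from the AWS CLI
aws ecr get-login-password --region us-east-2 \
  | docker login --username AWS --password-stdin 209640446841.dkr.ecr.us-east-2.amazonaws.com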
Define a VPC for the workload
The following CloudFormation file defines a VPC for the workload.
AWSTemplateFormatVersion: '2010-09-09'
Description: This stack deploys a large AWS VPC with internet access
Mappings:
# Hard values for the subnet masks. These masks define
# the range of internal IP addresses that can be assigned.
# The VPC can have all IP's from 10.0.0.0 to 10.0.255.255
# There are four subnets which cover the ranges:
#
# 10.0.0.0 - 10.0.63.255 (16384 IP addresses)
# 10.0.64.0 - 10.0.127.255 (16384 IP addresses)
# 10.0.128.0 - 10.0.191.255 (16384 IP addresses)
# 10.0.192.0 - 10.0.255.255 (16384 IP addresses)
#
SubnetConfig:
VPC:
CIDR: '10.0.0.0/16'
PublicOne:
CIDR: '10.0.0.0/18'
PublicTwo:
CIDR: '10.0.64.0/18'
PrivateOne:
CIDR: '10.0.128.0/18'
PrivateTwo:
CIDR: '10.0.192.0/18'
Resources:
# VPC in which containers will be networked.
# It has two public subnets, and two private subnets.
# We distribute the subnets across the first two availability zones
# in the region, for high availability.
VPC:
Type: AWS::EC2::VPC
Properties:
EnableDnsSupport: true
EnableDnsHostnames: true
CidrBlock: !FindInMap ['SubnetConfig', 'VPC', 'CIDR']
# Two public subnets, where containers can have public IP addresses
PublicSubnetOne:
Type: AWS::EC2::Subnet
Properties:
AvailabilityZone:
Fn::Select:
- 0
- Fn::GetAZs: {Ref: 'AWS::Region'}
VpcId: !Ref 'VPC'
CidrBlock: !FindInMap ['SubnetConfig', 'PublicOne', 'CIDR']
MapPublicIpOnLaunch: true
PublicSubnetTwo:
Type: AWS::EC2::Subnet
Properties:
AvailabilityZone:
Fn::Select:
- 1
- Fn::GetAZs: {Ref: 'AWS::Region'}
VpcId: !Ref 'VPC'
CidrBlock: !FindInMap ['SubnetConfig', 'PublicTwo', 'CIDR']
MapPublicIpOnLaunch: true
# Two private subnets where containers will only have private
# IP addresses, and will only be reachable by other members of the
# VPC
PrivateSubnetOne:
Type: AWS::EC2::Subnet
Properties:
AvailabilityZone:
Fn::Select:
- 0
- Fn::GetAZs: {Ref: 'AWS::Region'}
VpcId: !Ref 'VPC'
CidrBlock: !FindInMap ['SubnetConfig', 'PrivateOne', 'CIDR']
PrivateSubnetTwo:
Type: AWS::EC2::Subnet
Properties:
AvailabilityZone:
Fn::Select:
- 1
- Fn::GetAZs: {Ref: 'AWS::Region'}
VpcId: !Ref 'VPC'
CidrBlock: !FindInMap ['SubnetConfig', 'PrivateTwo', 'CIDR']
# Setup networking resources for the public subnets. Containers
# in the public subnets have public IP addresses and the routing table
# sends network traffic via the internet gateway.
InternetGateway:
Type: AWS::EC2::InternetGateway
GatewayAttachement:
Type: AWS::EC2::VPCGatewayAttachment
Properties:
VpcId: !Ref 'VPC'
InternetGatewayId: !Ref 'InternetGateway'
PublicRouteTable:
Type: AWS::EC2::RouteTable
Properties:
VpcId: !Ref 'VPC'
PublicRoute:
Type: AWS::EC2::Route
DependsOn: GatewayAttachement
Properties:
RouteTableId: !Ref 'PublicRouteTable'
DestinationCidrBlock: '0.0.0.0/0'
GatewayId: !Ref 'InternetGateway'
PublicSubnetOneRouteTableAssociation:
Type: AWS::EC2::SubnetRouteTableAssociation
Properties:
SubnetId: !Ref PublicSubnetOne
RouteTableId: !Ref PublicRouteTable
PublicSubnetTwoRouteTableAssociation:
Type: AWS::EC2::SubnetRouteTableAssociation
Properties:
SubnetId: !Ref PublicSubnetTwo
RouteTableId: !Ref PublicRouteTable
# Setup networking resources for the private subnets. Containers
# in these subnets have only private IP addresses, and must use a NAT
# gateway to talk to the internet. We launch two NAT gateways, one for
# each private subnet.
NatGatewayOneAttachment:
Type: AWS::EC2::EIP
DependsOn: GatewayAttachement
Properties:
Domain: vpc
NatGatewayTwoAttachment:
Type: AWS::EC2::EIP
DependsOn: GatewayAttachement
Properties:
Domain: vpc
NatGatewayOne:
Type: AWS::EC2::NatGateway
Properties:
AllocationId: !GetAtt NatGatewayOneAttachment.AllocationId
SubnetId: !Ref PublicSubnetOne
NatGatewayTwo:
Type: AWS::EC2::NatGateway
Properties:
AllocationId: !GetAtt NatGatewayTwoAttachment.AllocationId
SubnetId: !Ref PublicSubnetTwo
PrivateRouteTableOne:
Type: AWS::EC2::RouteTable
Properties:
VpcId: !Ref 'VPC'
PrivateRouteOne:
Type: AWS::EC2::Route
Properties:
RouteTableId: !Ref PrivateRouteTableOne
DestinationCidrBlock: 0.0.0.0/0
NatGatewayId: !Ref NatGatewayOne
PrivateRouteTableOneAssociation:
Type: AWS::EC2::SubnetRouteTableAssociation
Properties:
RouteTableId: !Ref PrivateRouteTableOne
SubnetId: !Ref PrivateSubnetOne
PrivateRouteTableTwo:
Type: AWS::EC2::RouteTable
Properties:
VpcId: !Ref 'VPC'
PrivateRouteTwo:
Type: AWS::EC2::Route
Properties:
RouteTableId: !Ref PrivateRouteTableTwo
DestinationCidrBlock: 0.0.0.0/0
NatGatewayId: !Ref NatGatewayTwo
PrivateRouteTableTwoAssociation:
Type: AWS::EC2::SubnetRouteTableAssociation
Properties:
RouteTableId: !Ref PrivateRouteTableTwo
SubnetId: !Ref PrivateSubnetTwo
Outputs:
VpcId:
Description: The ID of the VPC that this stack is deployed in
Value: !Ref 'VPC'
PublicSubnetIds:
Description: Comma separated list of public facing subnets that have
a direct internet connection as long as you assign a public IP
Value: !Sub '${PublicSubnetOne},${PublicSubnetTwo}'
PrivateSubnetIds:
Description: Comma separated list of private subnets that use a NAT
gateway for internet access.
Value: !Sub '${PrivateSubnetOne},${PrivateSubnetTwo}'
For more info about this VPC see the pattern "Large sized AWS VPC for an Amazon ECS cluster".
Define Amazon ECS cluster of AWS Inferentia instances
The following CloudFormation file defines an Amazon ECS cluster that launches AWS Inferentia instances as capacity for running containers. These instances have hardware acceleration that is optimized for running machine learning inference jobs.
AWSTemplateFormatVersion: '2010-09-09'
Description: ECS Cluster optimized for machine learning inference workloads
Parameters:
InstanceType:
Type: String
Default: inf2.8xlarge
Description: Class of AWS Inferentia instance used to run containers.
AllowedValues: [inf1.xlarge, inf1.2xlarge, inf1.6xlarge, inf1.24xlarge,
inf2.xlarge, inf2.8xlarge, inf2.24xlarge, inf2.48xlarge]
ConstraintDescription: Please choose a valid instance type.
DesiredCapacity:
Type: Number
Default: '0'
Description: Number of instances to initially launch in your ECS cluster.
MaxSize:
Type: Number
Default: '2'
Description: Maximum number of instances that can be launched in your ECS cluster.
ECSAMI:
Type: AWS::SSM::Parameter::Value<AWS::EC2::Image::Id>
Default: /aws/service/ecs/optimized-ami/amazon-linux-2/inf/recommended/image_id
Description: The Amazon Machine Image ID used for the cluster, leave it as the default value to get the latest Inferentia optimized AMI
VpcId:
Type: AWS::EC2::VPC::Id
Description: VPC ID where the ECS cluster is launched
SubnetIds:
Type: List<AWS::EC2::Subnet::Id>
Description: List of subnet IDs where the EC2 instances will be launched
Resources:
# Cluster that keeps track of container deployments
ECSCluster:
Type: AWS::ECS::Cluster
Properties:
ClusterSettings:
- Name: containerInsights
Value: enabled
# Autoscaling group. This launches the actual EC2 instances that will register
# themselves as members of the cluster, and run the docker containers.
ECSAutoScalingGroup:
Type: AWS::AutoScaling::AutoScalingGroup
DependsOn:
# This is to ensure that the ASG gets deleted first before these
# resources, when it comes to stack teardown.
- ECSCluster
- EC2Role
Properties:
VPCZoneIdentifier:
- !Select [ 0, !Ref SubnetIds ]
- !Select [ 1, !Ref SubnetIds ]
LaunchTemplate:
LaunchTemplateId: !Ref ContainerInstances
Version: !GetAtt ContainerInstances.LatestVersionNumber
MinSize: 0
MaxSize: !Ref MaxSize
DesiredCapacity: !Ref DesiredCapacity
NewInstancesProtectedFromScaleIn: true
UpdatePolicy:
AutoScalingReplacingUpdate:
WillReplace: 'true'
# The config for each instance that is added to the cluster
ContainerInstances:
Type: AWS::EC2::LaunchTemplate
Properties:
LaunchTemplateData:
ImageId: !Ref ECSAMI
InstanceType: !Ref InstanceType
IamInstanceProfile:
Name: !Ref EC2InstanceProfile
SecurityGroupIds:
- !Ref ContainerHostSecurityGroup
UserData:
# This injected configuration file is how the EC2 instance
# knows which ECS cluster on your AWS account it should be joining
Fn::Base64: !Sub |
#!/bin/bash
echo ECS_CLUSTER=${ECSCluster} >> /etc/ecs/ecs.config
BlockDeviceMappings:
- DeviceName: "/dev/xvda"
Ebs:
VolumeSize: 50
VolumeType: gp3
# Disable IMDSv1, and require IMDSv2
MetadataOptions:
HttpEndpoint: enabled
HttpTokens: required
EC2InstanceProfile:
Type: AWS::IAM::InstanceProfile
Properties:
Path: /
Roles:
- !Ref EC2Role
# Custom resource that force destroys the ASG. This cleans up EC2 instances that had
# managed termination protection enabled, but which are not yet released.
# This is necessary because ECS does not immediately release an EC2 instance from termination
# protection as soon as the instance is no longer running tasks. There is a cooldown delay.
# In the case of tearing down the CloudFormation stack, CloudFormation will delete the
# AWS::ECS::Service and immediately move on to tearing down the AWS::ECS::Cluster, disconnecting
# the AWS::AutoScaling::AutoScalingGroup from ECS management too fast, before ECS has a chance
# to asynchronously turn off managed instance protection on the EC2 instances.
# This will leave some EC2 instances stranded in a state where they are protected from scale-in forever.
# This then blocks the AWS::AutoScaling::AutoScalingGroup from cleaning itself up.
# The custom resource function force destroys the autoscaling group when tearing down the stack,
# avoiding the issue of protected EC2 instances that can never be cleaned up.
CustomAsgDestroyerFunction:
Type: AWS::Lambda::Function
Properties:
Code:
ZipFile: !Sub |
const { AutoScalingClient, DeleteAutoScalingGroupCommand } = require("@aws-sdk/client-auto-scaling");
const autoscaling = new AutoScalingClient({ region: '${AWS::Region}' });
const response = require('cfn-response');
exports.handler = async function(event, context) {
console.log(event);
if (event.RequestType !== "Delete") {
await response.send(event, context, response.SUCCESS);
return;
}
const input = {
AutoScalingGroupName: '${ECSAutoScalingGroup}',
ForceDelete: true
};
const command = new DeleteAutoScalingGroupCommand(input);
const deleteResponse = await autoscaling.send(command);
console.log(deleteResponse);
await response.send(event, context, response.SUCCESS);
};
Handler: index.handler
Runtime: nodejs20.x
Timeout: 30
Role: !GetAtt CustomAsgDestroyerRole.Arn
# The role used by the ASG destroyer
CustomAsgDestroyerRole:
Type: AWS::IAM::Role
Properties:
AssumeRolePolicyDocument:
Version: 2012-10-17
Statement:
- Effect: Allow
Principal:
Service:
- lambda.amazonaws.com
Action:
- sts:AssumeRole
ManagedPolicyArns:
# https://docs.aws.amazon.com/aws-managed-policy/latest/reference/AWSLambdaBasicExecutionRole.html
- arn:aws:iam::aws:policy/service-role/AWSLambdaBasicExecutionRole
Policies:
- PolicyName: allow-to-delete-autoscaling-group
PolicyDocument:
Version: 2012-10-17
Statement:
- Effect: Allow
Action: autoscaling:DeleteAutoScalingGroup
Resource: !Sub arn:aws:autoscaling:${AWS::Region}:${AWS::AccountId}:autoScalingGroup:*:autoScalingGroupName/${ECSAutoScalingGroup}
CustomAsgDestroyer:
Type: Custom::AsgDestroyer
DependsOn:
- CapacityProviderAssociation
Properties:
ServiceToken: !GetAtt CustomAsgDestroyerFunction.Arn
Region: !Ref "AWS::Region"
# Create an ECS capacity provider to attach the ASG to the ECS cluster
# so that it autoscales as we launch more containers
CapacityProvider:
Type: AWS::ECS::CapacityProvider
Properties:
AutoScalingGroupProvider:
AutoScalingGroupArn: !Ref ECSAutoScalingGroup
ManagedScaling:
InstanceWarmupPeriod: 60
MinimumScalingStepSize: 1
MaximumScalingStepSize: 100
Status: ENABLED
# Percentage of cluster reservation to try to maintain
TargetCapacity: 100
ManagedTerminationProtection: ENABLED
ManagedDraining: ENABLED
# Create a cluster capacity provider association so that the cluster
# will use the capacity provider
CapacityProviderAssociation:
Type: AWS::ECS::ClusterCapacityProviderAssociations
Properties:
CapacityProviders:
- !Ref CapacityProvider
Cluster: !Ref ECSCluster
DefaultCapacityProviderStrategy:
- Base: 0
CapacityProvider: !Ref CapacityProvider
Weight: 1
# A security group for the EC2 hosts that will run the containers.
# This can be used to limit incoming traffic to or outgoing traffic
# from the container's host EC2 instance.
ContainerHostSecurityGroup:
Type: AWS::EC2::SecurityGroup
Properties:
GroupDescription: Access to the EC2 hosts that run containers
VpcId: !Ref VpcId
# Role for the EC2 hosts. This allows the ECS agent on the EC2 hosts
# to communicate with the ECS control plane, as well as download the docker
# images from ECR to run on your host.
EC2Role:
Type: AWS::IAM::Role
Properties:
AssumeRolePolicyDocument:
Statement:
- Effect: Allow
Principal:
Service: [ec2.amazonaws.com]
Action: ['sts:AssumeRole']
Path: /
# See reference: https://docs.aws.amazon.com/AmazonECS/latest/developerguide/security-iam-awsmanpol.html#security-iam-awsmanpol-AmazonEC2ContainerServiceforEC2Role
ManagedPolicyArns:
- arn:aws:iam::aws:policy/service-role/AmazonEC2ContainerServiceforEC2Role
# This is the task execution role, which is used by the ECS agent
# on behalf of your tasks to download images and upload logs.
ECSTaskExecutionRole:
Type: AWS::IAM::Role
Properties:
AssumeRolePolicyDocument:
Statement:
- Effect: Allow
Principal:
Service: [ecs-tasks.amazonaws.com]
Action: ['sts:AssumeRole']
Condition:
ArnLike:
aws:SourceArn: !Sub arn:aws:ecs:${AWS::Region}:${AWS::AccountId}:*
StringEquals:
aws:SourceAccount: !Ref AWS::AccountId
Path: /
# This role enables basic features of ECS. See reference:
# https://docs.aws.amazon.com/AmazonECS/latest/developerguide/security-iam-awsmanpol.html#security-iam-awsmanpol-AmazonECSTaskExecutionRolePolicy
ManagedPolicyArns:
- arn:aws:iam::aws:policy/service-role/AmazonECSTaskExecutionRolePolicy
Outputs:
ClusterName:
Description: The ECS cluster into which to launch resources
Value: !Ref ECSCluster
ECSTaskExecutionRole:
Description: The role used to start up a task
Value: !Ref ECSTaskExecutionRole
CapacityProvider:
Description: The cluster capacity provider that the service should use
to request capacity when it wants to start up a task
Value: !Ref CapacityProvider
By default this template deploys inf2.8xlarge instances. You can launch additional tasks in the Amazon ECS cluster to automatically scale out the number of AWS Inferentia instances. If you plan to run containers that do not need machine learning acceleration, do not use this pattern; instead deploy a cluster that uses a less expensive, compute optimized EC2 instance rather than a machine learning optimized one.
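For example, once everything is deployed you could scale the Jupyter service (defined in the next template) out to two tasks, and the capacity provider will launch a second Inferentia instance to hold the extra task. A sketch, substituting your own cluster name from this stack's ClusterName output:
# Ask ECS for a second copy of the task; the capacity provider adds an instance if needed
aws ecs update-service \
  --cluster <your-cluster-name> \
  --service jupyter \
  --desired-count 2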
Define the Jupyter notebook task
The following CloudFormation template deploys a Jupyter Notebook task under Amazon ECS orchestration:
AWSTemplateFormatVersion: '2010-09-09'
Description: An example service that deploys a JupyterLab notebook
with AWS Neuron support for machine learning.
Parameters:
VpcId:
Type: String
Description: The VPC that the service is running inside of
PublicSubnetIds:
Type: List<AWS::EC2::Subnet::Id>
Description: List of public facing subnets
PrivateSubnetIds:
Type: List<AWS::EC2::Subnet::Id>
Description: List of private subnets
ClusterName:
Type: String
Description: The name of the ECS cluster into which to launch capacity.
ECSTaskExecutionRole:
Type: String
Description: The role used to start up an ECS task
CapacityProvider:
Type: String
Description: The cluster capacity provider that the service should use
to request capacity when it wants to start up a task
ServiceName:
Type: String
Default: jupyter
Description: A name for the service
ImageUrl:
Type: String
Description: The URL of a Jupyter notebook container to run
ContainerCpu:
Type: Number
Default: 10240
Description: How much CPU to give the container. 1024 is 1 CPU
ContainerMemory:
Type: Number
Default: 32768
Description: How much memory in megabytes to give the container
MyIp:
Type: String
Default: 0.0.0.0/0
Description: The IP addresses that you want to accept traffic from.
Default accepts traffic from anywhere on the internet.
Resources:
# The task definition. This is a simple metadata description of what
# container to run, and what resource requirements it has.
TaskDefinition:
Type: AWS::ECS::TaskDefinition
DependsOn:
- TaskAccessToJupyterToken
Properties:
Family: !Ref ServiceName
NetworkMode: awsvpc
RequiresCompatibilities:
- EC2
ExecutionRoleArn: !Ref ECSTaskExecutionRole
ContainerDefinitions:
- Name: jupyter
Cpu: !Ref ContainerCpu
MemoryReservation: !Ref ContainerMemory
Image: !Ref ImageUrl
EntryPoint:
- '/bin/sh'
- '-c'
Command:
- '/opt/conda/bin/jupyter-lab --allow-root --ServerApp.token=${JUPYTER_TOKEN} --ServerApp.ip=*'
Secrets:
- Name: JUPYTER_TOKEN
ValueFrom: !Ref JupyterToken
PortMappings:
- ContainerPort: 8888
HostPort: 8888
MountPoints:
- SourceVolume: efs-volume
ContainerPath: /home
LogConfiguration:
LogDriver: 'awslogs'
Options:
mode: non-blocking
max-buffer-size: 25m
awslogs-group: !Ref LogGroup
awslogs-region: !Ref AWS::Region
awslogs-stream-prefix: !Ref ServiceName
LinuxParameters:
Devices:
# Ensure that the AWS Neuron SDK inside the container
# can access the underlying host device provided by AWS Inferentia
- ContainerPath: /dev/neuron0
HostPath: /dev/neuron0
Permissions:
- read
- write
Capabilities:
Add:
- "IPC_LOCK"
Volumes:
- Name: efs-volume
EFSVolumeConfiguration:
FilesystemId: !Ref EFSFileSystem
RootDirectory: /
TransitEncryption: ENABLED
# The secret token used to protect the Jupyter notebook
JupyterToken:
Type: AWS::SecretsManager::Secret
Properties:
GenerateSecretString:
PasswordLength: 30
ExcludePunctuation: true
# Attach a policy to the task execution role, which grants
# the ECS agent the ability to fetch the Jupyter notebook
# secret token on behalf of the task.
TaskAccessToJupyterToken:
Type: AWS::IAM::Policy
Properties:
Roles:
- !Ref ECSTaskExecutionRole
PolicyName: AccessJupyterToken
PolicyDocument:
Version: '2012-10-17'
Statement:
- Effect: Allow
Action:
- secretsmanager:DescribeSecret
- secretsmanager:GetSecretValue
Resource: !Ref JupyterToken
# Attach a policy to the task execution role which allows the
# task to access the EFS filesystem that provides durable storage
# for the task.
TaskAccessToFilesystem:
Type: AWS::IAM::Policy
Properties:
Roles:
- !Ref ECSTaskExecutionRole
PolicyName: EFSAccess
PolicyDocument:
Version: '2012-10-17'
Statement:
- Effect: Allow
Action:
- elasticfilesystem:ClientMount
- elasticfilesystem:ClientWrite
- elasticfilesystem:DescribeMountTargets
- elasticfilesystem:DescribeFileSystems
Resource: !GetAtt EFSFileSystem.Arn
# The service. The service is a resource which allows you to run multiple
# copies of a type of task, and gather up their logs and metrics, as well
# as monitor the number of running tasks and replace any that have crashed
Service:
Type: AWS::ECS::Service
DependsOn: PublicLoadBalancerListener
Properties:
ServiceName: !Ref ServiceName
Cluster: !Ref ClusterName
PlacementStrategies:
- Field: attribute:ecs.availability-zone
Type: spread
- Field: cpu
Type: binpack
CapacityProviderStrategy:
- Base: 0
CapacityProvider: !Ref CapacityProvider
Weight: 1
DeploymentConfiguration:
MaximumPercent: 200
MinimumHealthyPercent: 75
DesiredCount: 1
NetworkConfiguration:
AwsvpcConfiguration:
SecurityGroups:
- !Ref ServiceSecurityGroup
Subnets:
- !Select [ 0, !Ref PrivateSubnetIds ]
- !Select [ 1, !Ref PrivateSubnetIds ]
TaskDefinition: !Ref TaskDefinition
LoadBalancers:
- ContainerName: jupyter
ContainerPort: 8888
TargetGroupArn: !Ref JupyterTargetGroup
# Because we are launching tasks in AWS VPC networking mode
# the tasks themselves also have an extra security group that is unique
# to them. This is a unique security group just for this service,
# to control which things it can talk to, and who can talk to it
ServiceSecurityGroup:
Type: AWS::EC2::SecurityGroup
Properties:
GroupDescription: !Sub Access to service ${ServiceName}
VpcId: !Ref VpcId
# This log group stores the stdout logs from this service's containers
LogGroup:
Type: AWS::Logs::LogGroup
# Keeps track of the list of tasks running on EC2 instances
JupyterTargetGroup:
Type: AWS::ElasticLoadBalancingV2::TargetGroup
Properties:
HealthCheckIntervalSeconds: 6
HealthCheckPath: /api
HealthCheckProtocol: HTTP
HealthCheckTimeoutSeconds: 5
HealthyThresholdCount: 2
TargetType: ip
Port: 8888
Protocol: HTTP
UnhealthyThresholdCount: 10
VpcId: !Ref VpcId
TargetGroupAttributes:
- Key: deregistration_delay.timeout_seconds
Value: 0
# A public facing load balancer, this is used as ingress for
# public facing internet traffic. The traffic is forwarded
# down to the Jupyter notebook wherever it is currently hosted
# on whichever machine Amazon ECS placed it on.
PublicLoadBalancerSG:
Type: AWS::EC2::SecurityGroup
Properties:
GroupDescription: Access to the public facing load balancer
VpcId: !Ref VpcId
SecurityGroupIngress:
# Allow access to ALB from the specified IP address
- CidrIp: !Ref MyIp
IpProtocol: -1
PublicLoadBalancer:
Type: AWS::ElasticLoadBalancingV2::LoadBalancer
Properties:
Scheme: internet-facing
LoadBalancerAttributes:
- Key: idle_timeout.timeout_seconds
Value: '30'
Subnets:
# The load balancer is placed into the public subnets, so that traffic
# from the internet can reach the load balancer directly via the internet gateway
- !Select [ 0, !Ref PublicSubnetIds ]
- !Select [ 1, !Ref PublicSubnetIds ]
SecurityGroups:
- !Ref PublicLoadBalancerSG
PublicLoadBalancerListener:
Type: AWS::ElasticLoadBalancingV2::Listener
Properties:
DefaultActions:
- Type: 'forward'
ForwardConfig:
TargetGroups:
- TargetGroupArn: !Ref JupyterTargetGroup
Weight: 100
LoadBalancerArn: !Ref 'PublicLoadBalancer'
Port: 80
Protocol: HTTP
# The Jupyter services' security group allows inbound
# traffic from the public facing ALB
JupyterIngressFromPublicALB:
Type: AWS::EC2::SecurityGroupIngress
Properties:
Description: Ingress from the public ALB
GroupId: !Ref 'ServiceSecurityGroup'
IpProtocol: -1
SourceSecurityGroupId: !Ref 'PublicLoadBalancerSG'
# Filesystem that provides durable storage for the notebook
EFSFileSystem:
Type: AWS::EFS::FileSystem
Properties:
Encrypted: true
PerformanceMode: generalPurpose
ThroughputMode: bursting
# Mount target allows usage of the EFS inside of subnet one
EFSMountTargetOne:
Type: AWS::EFS::MountTarget
Properties:
FileSystemId: !Ref EFSFileSystem
SubnetId: !Select [ 0, !Ref PrivateSubnetIds ]
SecurityGroups:
- !Ref EFSFileSystemSecurityGroup
# Mount target allows usage of the EFS inside of subnet two
EFSMountTargetTwo:
Type: AWS::EFS::MountTarget
Properties:
FileSystemId: !Ref EFSFileSystem
SubnetId: !Select [ 1, !Ref PrivateSubnetIds ]
SecurityGroups:
- !Ref EFSFileSystemSecurityGroup
# This security group is used by the mount targets so
# that they will allow inbound NFS connections from
# the ECS tasks that we launch
EFSFileSystemSecurityGroup:
Type: AWS::EC2::SecurityGroup
Properties:
GroupDescription: Security group for EFS file system
VpcId: !Ref VpcId
SecurityGroupIngress:
- IpProtocol: tcp
FromPort: 2049
ToPort: 2049
SourceSecurityGroupId: !Ref ServiceSecurityGroup
Outputs:
LoadBalancerUrl:
Description: The URL at which you can access the application
Value: !GetAtt PublicLoadBalancer.DNSName
Secret:
Description: The ARN of the secret that was created to protect the JupyterLab
Value: !Ref JupyterToken
Some things to note:
- You will need to pass the ImageUrl parameter so that the stack launches the container image URI that you just uploaded to Amazon ECR. This will be handled later when we deploy the parent stack.
- In the ContainerDefinitions[0].LinuxParameters section you will see that the task definition mounts the /dev/neuron0 device from the host into the container. This is what gives the Neuron SDK inside the container the ability to utilize the underlying hardware acceleration. Extremely large inf2 instances have multiple neuron* devices that all need to be mounted into the container.
- The template generates an AWS::SecretsManager::Secret resource as the secret token that protects the Jupyter notebook from unauthorized access. You will see this token passed to the container as a Secret in the task definition body.
- The MyIp parameter can be customized to limit which IP addresses are allowed to access the JupyterLab.
- The task definition creates an EFS filesystem and mounts it at the path /home. This can be used as durable persistence for models or other important data that you want to save from your Jupyter notebook. Everything else in the container's filesystem is ephemeral and will be wiped on restart, but the /home directory survives restarts, as you can verify below. See the tutorial on attaching durable storage to an ECS task for more information on using EFS for durable task storage.
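Later, once the stack is deployed and you have a terminal open inside JupyterLab, you can confirm that the durable volume is mounted; an EFS volume shows up as an NFS filesystem backing /home:
# Show the filesystem type and size behind the /home mount
df -hT /home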
Deploy all the stacks
We can use the following parent stack to deploy all three child CloudFormation templates:
AWSTemplateFormatVersion: "2010-09-09"
Transform: AWS::Serverless-2016-10-31
Description: Parent stack that deploys VPC, Amazon ECS cluster with AWS Inferentia capacity
and then deploys a JupyterLab IDE (latest Jupyter Notebook) with AWS Neuron SDK for machine
learning projects
Parameters:
ImageUrl:
Type: String
Description: The URL of the Jupyter image that you built
Resources:
# The networking configuration. This creates an isolated
# network specific to this particular environment
VpcStack:
Type: AWS::Serverless::Application
Properties:
Location: vpc.yml
# This stack contains cluster wide resources that will be shared
# by all services that get launched in the stack
BaseStack:
Type: AWS::Serverless::Application
Properties:
Location: inferentia-cluster.yml
Parameters:
VpcId: !GetAtt VpcStack.Outputs.VpcId
SubnetIds: !GetAtt VpcStack.Outputs.PrivateSubnetIds
# Deploys the JupyterLab application into the cluster
JupyterNotebookStack:
Type: AWS::Serverless::Application
Properties:
Location: jupyter-notebook.yml
Parameters:
ImageUrl: !Ref ImageUrl
VpcId: !GetAtt VpcStack.Outputs.VpcId
PublicSubnetIds: !GetAtt VpcStack.Outputs.PublicSubnetIds
PrivateSubnetIds: !GetAtt VpcStack.Outputs.PrivateSubnetIds
ClusterName: !GetAtt BaseStack.Outputs.ClusterName
ECSTaskExecutionRole: !GetAtt BaseStack.Outputs.ECSTaskExecutionRole
CapacityProvider: !GetAtt BaseStack.Outputs.CapacityProvider
Outputs:
JupyterLabUrl:
Description: The URL at which you can find your JupyterLab installation
Value: !GetAtt JupyterNotebookStack.Outputs.LoadBalancerUrl
Secret:
Description: The ARN of the secret token that protects your JupyterLab
Value: !GetAtt JupyterNotebookStack.Outputs.Secret
Use the AWS SAM CLI to deploy the parent stack with a command like this one. You will need to substitute in your own ImageUrl value from the container image that you built and pushed earlier:
sam deploy \
--template-file parent.yml \
--stack-name machine-learning-environment \
--resolve-s3 \
--capabilities CAPABILITY_IAM \
--parameter-overrides ImageUrl=209640446841.dkr.ecr.us-east-2.amazonaws.com/jupyter-notebook:latest
After the deployment finishes you will see an output section that looks similar to this:
-------------------------------------------------------------------------------------------------
Outputs
-------------------------------------------------------------------------------------------------
Key Secret
Description The ARN of the secret token that protects your JupyterLab
Value arn:aws:secretsmanager:us-east-2:209640446841:secret:JupyterToken-
kZ3MMCCAmjxn-VGGHTz
Key JupyterLabUrl
Description The URL at which you can find your JupyterLab installation
Value jupyt-Publi-1U1OSUNR85E3J-297756869.us-east-2.elb.amazonaws.com
-------------------------------------------------------------------------------------------------
This tells you the URL where you can access your JupyterLab notebook, as well as the ARN of the automatically generated secret value that is the token for accessing your notebook.
Access JupyterLab
Open up the AWS Secrets Manager console and look for the secret called JupyterToken, as referenced in the outputs section above. After you click on the secret, scroll down and click on "Retrieve Secret Value". Copy the secret value and keep it safe, as this will be the password that you use to get access to your JupyterLab over the internet.
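If you prefer the command line, you can fetch the same token with the AWS CLI by substituting the Secret ARN from your own stack outputs:
# Print the Jupyter token stored in Secrets Manager (use your own secret ARN)
aws secretsmanager get-secret-value \
  --secret-id arn:aws:secretsmanager:us-east-2:209640446841:secret:JupyterToken-kZ3MMCCAmjxn-VGGHTz \
  --query SecretString \
  --output text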
If you wish to change this secret value in AWS Secrets Manager you will need to restart the Amazon ECS JupyterLab task for the change to take effect.
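One way to do that restart is to force a new deployment of the service. A sketch, assuming the default service name jupyter and your own cluster name from the stack outputs:
# Replace the running task so it picks up the new secret value on startup
aws ecs update-service \
  --cluster <your-cluster-name> \
  --service jupyter \
  --force-new-deployment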
Open up the URL from the outputs section above, and enter the secret token when asked. Once you have logged in, you will see the JupyterLab interface.
At this point you can begin making use of the underlying AWS Inferentia hardware, via the JupyterLab IDE.
Make sure that acceleration is available
Inside of JupyterLab click on the "Other -> Terminal" option to open a tab that has a command line prompt. Any commands that you type in this prompt will run inside of the remote JupyterLab container.
Run the following command:
neuron-ls
You should see output similar to this:
+--------+--------+--------+---------+
| NEURON | NEURON | NEURON | PCI |
| DEVICE | CORES | MEMORY | BDF |
+--------+--------+--------+---------+
| 0 | 2 | 32 GB | 00:1f.0 |
+--------+--------+--------+---------+
This verifies that the AWS Neuron SDK inside of the container is able to connect to the AWS Neuron device, which provides the hardware acceleration of the underlying AWS Inferentia hardware. At this point you can begin to use the Neuron SDK to do machine learning tasks inside of the JupyterLab container.
You can also run the following command to open a hardware monitoring interface:
neuron-top
This will show more info about the Neuron hardware, including its current usage. Right now the Neuron cores are not in use, so let's change that by running a benchmark test:
Test out hardware acceleration
In JupyterLab start a new notebook. Run the following commands as cells in the notebook.
Install dependencies:
!python -m pip config set global.extra-index-url https://pip.repos.neuron.amazonaws.com
!pip install neuronx-cc==2.* torch-neuronx ipywidgets transformers
Download a pretrained BERT model and compile it for the AWS Neuron device. This machine learning model analyzes whether two input phrases are paraphrases of each other:
import torch
import torch_neuronx
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import transformers
def encode(tokenizer, *inputs, max_length=128, batch_size=1):
tokens = tokenizer.encode_plus(
*inputs,
max_length=max_length,
padding='max_length',
truncation=True,
return_tensors="pt"
)
return (
torch.repeat_interleave(tokens['input_ids'], batch_size, 0),
torch.repeat_interleave(tokens['attention_mask'], batch_size, 0),
torch.repeat_interleave(tokens['token_type_ids'], batch_size, 0),
)
# Create the tokenizer and model
name = "bert-base-cased-finetuned-mrpc"
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForSequenceClassification.from_pretrained(name, torchscript=True)
# Set up some example inputs
sequence_0 = "The company HuggingFace is based in New York City"
sequence_1 = "Apples are especially bad for your health"
sequence_2 = "HuggingFace's headquarters are situated in Manhattan"
paraphrase = encode(tokenizer, sequence_0, sequence_2)
not_paraphrase = encode(tokenizer, sequence_0, sequence_1)
# Run the original PyTorch BERT model on CPU
cpu_paraphrase_logits = model(*paraphrase)[0]
cpu_not_paraphrase_logits = model(*not_paraphrase)[0]
# Compile the model for Neuron
model_neuron = torch_neuronx.trace(model, paraphrase)
# Save the TorchScript for inference deployment
filename = 'model.pt'
torch.jit.save(model_neuron, filename)
Now run the model on the AWS Neuron device, and compare the results with those from running the model on the CPU:
# Load the TorchScript compiled model
model_neuron = torch.jit.load(filename)
# Verify the TorchScript works on both example inputs
neuron_paraphrase_logits = model_neuron(*paraphrase)[0]
neuron_not_paraphrase_logits = model_neuron(*not_paraphrase)[0]
# Compare the results
print('CPU paraphrase logits: ', cpu_paraphrase_logits.detach().numpy())
print('Neuron paraphrase logits: ', neuron_paraphrase_logits.detach().numpy())
print('CPU not-paraphrase logits: ', cpu_not_paraphrase_logits.detach().numpy())
print('Neuron not-paraphrase logits: ', neuron_not_paraphrase_logits.detach().numpy())
You should see output similar to this:
CPU paraphrase logits: [[-0.34945598 1.9003887 ]]
Neuron paraphrase logits: [[-0.34909704 1.8992746 ]]
CPU not-paraphrase logits: [[ 0.5386365 -2.2197142]]
Neuron not-paraphrase logits: [[ 0.537705 -2.2180324]]
Whether you do model inference on the CPU or on the AWS Neuron device, it should produce very similar results. However, model inference with Neuron was offloaded onto the underlying Inferentia accelerator, leaving the rest of the EC2 instance's resources free for other tasks.
Run the model in a loop as a benchmark to test out performance on the underlying hardware:
import time
import concurrent.futures
import numpy as np
def benchmark(filename, example, n_models=2, n_threads=2, batches_per_thread=10000):
"""
Record performance statistics for a serialized model and its input example.
Arguments:
filename: The serialized torchscript model to load for benchmarking.
example: An example model input.
n_models: The number of models to load.
n_threads: The number of simultaneous threads to execute inferences on.
batches_per_thread: The number of example batches to run per thread.
Returns:
A dictionary of performance statistics.
"""
# Load models
models = [torch.jit.load(filename) for _ in range(n_models)]
# Warmup
for _ in range(8):
for model in models:
model(*example)
latencies = []
# Thread task
def task(model):
for _ in range(batches_per_thread):
start = time.time()
model(*example)
finish = time.time()
latencies.append((finish - start) * 1000)
# Submit tasks
begin = time.time()
with concurrent.futures.ThreadPoolExecutor(max_workers=n_threads) as pool:
for i in range(n_threads):
pool.submit(task, models[i % len(models)])
end = time.time()
# Compute metrics
boundaries = [50, 95, 99]
percentiles = {}
for boundary in boundaries:
name = f'latency_p{boundary}'
percentiles[name] = np.percentile(latencies, boundary)
duration = end - begin
batch_size = 0
for tensor in example:
if batch_size == 0:
batch_size = tensor.shape[0]
inferences = len(latencies) * batch_size
throughput = inferences / duration
# Metrics
metrics = {
'filename': str(filename),
'batch_size': batch_size,
'batches': len(latencies),
'inferences': inferences,
'threads': n_threads,
'models': n_models,
'duration': duration,
'throughput': throughput,
**percentiles,
}
display(metrics)
def display(metrics):
"""
Display the metrics produced by `benchmark` function.
Args:
metrics: A dictionary of performance statistics.
"""
pad = max(map(len, metrics)) + 1
for key, value in metrics.items():
parts = key.split('_')
parts = list(map(str.title, parts))
title = ' '.join(parts) + ":"
if isinstance(value, float):
value = f'{value:0.3f}'
print(f'{title :<{pad}} {value}')
# Benchmark BERT on Neuron
benchmark(filename, paraphrase)
While this notebook code runs, check neuron-top again in the terminal tab. You will see that the Neuron cores are in use as the benchmark runs the pretrained BERT model, while the CPU has very little utilization. This is exactly what you want to see: the machine learning inference workload has been almost fully offloaded onto the AWS Inferentia hardware.
The benchmark output should look similar to this:
Filename: model.pt
Batch Size: 1
Batches: 20000
Inferences: 20000
Threads: 2
Models: 2
Duration: 9.944
Throughput: 2011.203
Latency P50: 0.994
Latency P95: 1.017
Latency P99: 1.045
The model has been run 20,000 times in under 10 seconds, with a p99 latency of roughly 1 ms. As you can see, AWS Inferentia hardware acceleration is ideal for real-time inference applications, such as doing inference on demand in response to a web request.
Next Steps
- Look at the jupyter-notebook.yml stack, and notice the MyIp parameter. It is currently set to 0.0.0.0/0, which allows inbound traffic from all IP addresses. Look up your home or office IP address and set MyIp to a CIDR like 1.2.3.4/32 to ensure that the load balancer in front of JupyterLab only accepts inbound traffic from you and you alone. This adds a second layer of network protection in addition to the secret token.
- Right now if you restart the ECS task it will wipe any changes that you made to the container's ephemeral filesystem. You may not wish to reinstall Python packages on every restart, so consider setting up a Python virtual environment that lives inside of the /home directory, since this directory is an Elastic File System that provides durable persistence for the container (see the sketch after this list).
- Instead of running the model inside of JupyterLab, consider creating a model server that does inference in response to a network request, and returns the results over the network. You can then horizontally scale the workload across multiple Inferentia instances behind a load balancer, allowing you to do extremely high volume, low latency, real-time inference.
- If you launch even larger Inferentia instances like inf2.24xlarge or inf2.48xlarge, note that they have multiple Neuron devices attached to them. You can run ls /dev/neuron* on the EC2 instance to see a list of the Neuron devices. Right now the task definition only mounts /dev/neuron0, so you will only be able to access two Neuron cores inside the task. For larger Inferentia instances you should update the ECS task definition to mount all of the available host Neuron devices into the container.
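Here is a minimal sketch of the virtual environment idea, run from a JupyterLab terminal. The environment name neuron-env is just an example, and --system-site-packages keeps the preinstalled Neuron packages from the base image visible inside the environment:
# Create a virtual environment on the durable EFS volume under /home
python -m venv /home/neuron-env --system-site-packages
source /home/neuron-env/bin/activate

# Register the environment as a selectable kernel in JupyterLab
pip install ipykernel
python -m ipykernel install --user --name neuron-env --display-name "Python (neuron-env)"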