Deploy Jupyter notebook container with Amazon ECS

Nathan Peck
Senior Developer Advocate at AWS

About

Jupyter Notebook is a web-based interactive computing platform. It is popular for machine learning and as an IDE for developing in multiple programming languages. JupyterLab is the latest generation of Jupyter Notebook, with a more IDE-like experience and a modular, extensible design.

AWS Inferentia accelerators deliver high performance at the lowest cost for your deep learning (DL) inference applications. When training your models, use AWS Trainium instances, which are optimized for model training.

AWS Neuron is an SDK which runs your machine learning models on the underlying hardware acceleration of AWS Inferentia or AWS Trainium.

This pattern will show how to build and deploy a containerized version of Jupyter notebook, with the AWS Neuron SDK for machine learning, accelerated by AWS Inferentia and AWS Trainium hardware.

WARNING

This pattern is designed to set up a production-ready machine learning environment that can be scaled up later to run extremely large machine learning training or real-time inference jobs, accelerated by some of the most powerful hardware that AWS offers. As a result, this pattern has a fairly high baseline cost (about $2 an hour, and the largest available instance choice costs more than $12 an hour). Consider using Amazon SageMaker notebooks on smaller EC2 instances for a low-cost learning environment that is free tier eligible.

Setup

Ensure that you have the following dependencies installed locally (a quick way to verify them is shown after the list):

  • Docker or other OCI compatible container builder. This will be used to prepare a custom JupyterLab image.
  • Amazon ECR Credential Helper. This will assist you with uploading your container image to Amazon Elastic Container Registry.
  • AWS SAM CLI. This tool will help you deploy multiple CloudFormation stacks at once and pass values from one stack to the next automatically.
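
You can quickly confirm that these tools are available on your PATH before continuing. This is a minimal check, assuming default installations of each tool:

Language: shell
docker --version
which docker-credential-ecr-login
sam --version
aws --version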

Architecture

The following diagram shows the architecture of what will be deployed:

(Architecture diagram: your browser connects through an Application Load Balancer to a Jupyter container with AWS Neuron, running on an inf2.8xlarge EC2 instance with an AWS Inferentia2 accelerator; the access token is stored in AWS Secrets Manager.)

  1. An Application Load Balancer provides ingress from the public internet.
  2. Traffic goes to an inf2.8xlarge AWS Inferentia powered EC2 instance launched by Amazon ECS. You can adjust the AWS Inferentia instance class as desired.
  3. Amazon ECS has placed a container task on the instance, which hosts JupyterLab and the AWS Neuron SDK.
  4. Amazon ECS has connected the container to the underlying Neuron device provided by the AWS Inferentia instance.
  5. Machine learning workloads that you run inside the container are able to connect to the hardware accelerator.
  6. Access to the JupyterLab notebook is protected by a secret token that is stored in AWS Secrets Manager. Amazon ECS manages retrieving this secret value and injecting it into the Jupyter server on container startup.

Build a Jupyter notebook container

To build a Jupyter notebook container image, start with a prebuilt image from the AWS Deep Learning Containers collection, then install JupyterLab on top of it:

Language: Dockerfile
FROM 763104351884.dkr.ecr.us-west-2.amazonaws.com/pytorch-inference-neuronx:1.13.0-neuronx-py38-sdk2.9.0-ubuntu20.04
RUN pip install jupyterlab
CMD jupyter-lab

Create the Dockerfile and then build the custom image locally:

Language: shell
docker build -t jupyter-notebook .

Next, create an Amazon Elastic Container Registry (ECR) repository to hold the image:

Language: shell
aws ecr create-repository --repository-name jupyter-notebook

You should get a response similar to this:

Language: json
{
  "repository": {
      "repositoryUri": "209640446841.dkr.ecr.us-east-2.amazonaws.com/jupyter-notebook",
      "imageScanningConfiguration": {
          "scanOnPush": false
      },
      "encryptionConfiguration": {
          "encryptionType": "AES256"
      },
      "registryId": "209640446841",
      "imageTagMutability": "MUTABLE",
      "repositoryArn": "arn:aws:ecr:us-east-2:209640446841:repository/jupyter-notebook",
      "repositoryName": "jupyter-notebook",
      "createdAt": 1683047667.0
  }
}

Copy the repositoryUri, as this is how you will interact with the repository. Use commands similar to the following (substituting your own repository URI) to tag your built image and push it to Amazon ECR:

Language: shell
docker tag jupyter-notebook 209640446841.dkr.ecr.us-east-2.amazonaws.com/jupyter-notebook:latest
docker push 209640446841.dkr.ecr.us-east-2.amazonaws.com/jupyter-notebook:latest

INFO

If you get a 401 Unauthorized error then make sure you have installed the Amazon ECR credential helper properly. It will automatically use your current AWS credentials to authenticate with the ECR repository on the fly.
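
Alternatively, if you prefer not to rely on the credential helper, you can authenticate Docker with the registry manually using the AWS CLI. This sketch assumes the same region and account ID as the example output above; substitute your own values:

Language: shell
# Fetch a temporary ECR password and log Docker in to the registry
aws ecr get-login-password --region us-east-2 \
  | docker login --username AWS --password-stdin 209640446841.dkr.ecr.us-east-2.amazonaws.com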

Define a VPC for the workload

The following CloudFormation file defines a VPC for the workload.

File: vpc.yml
Language: yml
AWSTemplateFormatVersion: '2010-09-09'
Description: This stack deploys a large AWS VPC with internet access
Mappings:
  # Hard values for the subnet masks. These masks define
  # the range of internal IP addresses that can be assigned.
  # The VPC can have all IP's from 10.0.0.0 to 10.0.255.255
  # There are four subnets which cover the ranges:
  #
  # 10.0.0.0 - 10.0.63.255 (16384 IP addresses)
  # 10.0.64.0 - 10.0.127.255 (16384 IP addresses)
  # 10.0.128.0 - 10.0.191.255 (16384 IP addresses)
  # 10.0.192.0 - 10.0.255.255 (16384 IP addresses)
  #
  SubnetConfig:
    VPC:
      CIDR: '10.0.0.0/16'
    PublicOne:
      CIDR: '10.0.0.0/18'
    PublicTwo:
      CIDR: '10.0.64.0/18'
    PrivateOne:
      CIDR: '10.0.128.0/18'
    PrivateTwo:
      CIDR: '10.0.192.0/18'
Resources:
  # VPC in which containers will be networked.
  # It has two public subnets, and two private subnets.
  # We distribute the subnets across the first two available
  # Availability Zones in the region, for high availability.
  VPC:
    Type: AWS::EC2::VPC
    Properties:
      EnableDnsSupport: true
      EnableDnsHostnames: true
      CidrBlock: !FindInMap ['SubnetConfig', 'VPC', 'CIDR']

  # Two public subnets, where containers can have public IP addresses
  PublicSubnetOne:
    Type: AWS::EC2::Subnet
    Properties:
      AvailabilityZone:
         Fn::Select:
         - 0
         - Fn::GetAZs: {Ref: 'AWS::Region'}
      VpcId: !Ref 'VPC'
      CidrBlock: !FindInMap ['SubnetConfig', 'PublicOne', 'CIDR']
      MapPublicIpOnLaunch: true
  PublicSubnetTwo:
    Type: AWS::EC2::Subnet
    Properties:
      AvailabilityZone:
         Fn::Select:
         - 1
         - Fn::GetAZs: {Ref: 'AWS::Region'}
      VpcId: !Ref 'VPC'
      CidrBlock: !FindInMap ['SubnetConfig', 'PublicTwo', 'CIDR']
      MapPublicIpOnLaunch: true

  # Two private subnets where containers will only have private
  # IP addresses, and will only be reachable by other members of the
  # VPC
  PrivateSubnetOne:
    Type: AWS::EC2::Subnet
    Properties:
      AvailabilityZone:
         Fn::Select:
         - 0
         - Fn::GetAZs: {Ref: 'AWS::Region'}
      VpcId: !Ref 'VPC'
      CidrBlock: !FindInMap ['SubnetConfig', 'PrivateOne', 'CIDR']
  PrivateSubnetTwo:
    Type: AWS::EC2::Subnet
    Properties:
      AvailabilityZone:
         Fn::Select:
         - 1
         - Fn::GetAZs: {Ref: 'AWS::Region'}
      VpcId: !Ref 'VPC'
      CidrBlock: !FindInMap ['SubnetConfig', 'PrivateTwo', 'CIDR']

  # Setup networking resources for the public subnets. Containers
  # in the public subnets have public IP addresses and the routing table
  # sends network traffic via the internet gateway.
  InternetGateway:
    Type: AWS::EC2::InternetGateway
  GatewayAttachment:
    Type: AWS::EC2::VPCGatewayAttachment
    Properties:
      VpcId: !Ref 'VPC'
      InternetGatewayId: !Ref 'InternetGateway'
  PublicRouteTable:
    Type: AWS::EC2::RouteTable
    Properties:
      VpcId: !Ref 'VPC'
  PublicRoute:
    Type: AWS::EC2::Route
    DependsOn: GatewayAttachment
    Properties:
      RouteTableId: !Ref 'PublicRouteTable'
      DestinationCidrBlock: '0.0.0.0/0'
      GatewayId: !Ref 'InternetGateway'
  PublicSubnetOneRouteTableAssociation:
    Type: AWS::EC2::SubnetRouteTableAssociation
    Properties:
      SubnetId: !Ref PublicSubnetOne
      RouteTableId: !Ref PublicRouteTable
  PublicSubnetTwoRouteTableAssociation:
    Type: AWS::EC2::SubnetRouteTableAssociation
    Properties:
      SubnetId: !Ref PublicSubnetTwo
      RouteTableId: !Ref PublicRouteTable

  # Setup networking resources for the private subnets. Containers
  # in these subnets have only private IP addresses, and must use a NAT
  # gateway to talk to the internet. We launch two NAT gateways, one for
  # each private subnet.
  NatGatewayOneAttachment:
    Type: AWS::EC2::EIP
    DependsOn: GatewayAttachment
    Properties:
        Domain: vpc
  NatGatewayTwoAttachment:
    Type: AWS::EC2::EIP
    DependsOn: GatewayAttachment
    Properties:
        Domain: vpc
  NatGatewayOne:
    Type: AWS::EC2::NatGateway
    Properties:
      AllocationId: !GetAtt NatGatewayOneAttachment.AllocationId
      SubnetId: !Ref PublicSubnetOne
  NatGatewayTwo:
    Type: AWS::EC2::NatGateway
    Properties:
      AllocationId: !GetAtt NatGatewayTwoAttachment.AllocationId
      SubnetId: !Ref PublicSubnetTwo
  PrivateRouteTableOne:
    Type: AWS::EC2::RouteTable
    Properties:
      VpcId: !Ref 'VPC'
  PrivateRouteOne:
    Type: AWS::EC2::Route
    Properties:
      RouteTableId: !Ref PrivateRouteTableOne
      DestinationCidrBlock: 0.0.0.0/0
      NatGatewayId: !Ref NatGatewayOne
  PrivateRouteTableOneAssociation:
    Type: AWS::EC2::SubnetRouteTableAssociation
    Properties:
      RouteTableId: !Ref PrivateRouteTableOne
      SubnetId: !Ref PrivateSubnetOne
  PrivateRouteTableTwo:
    Type: AWS::EC2::RouteTable
    Properties:
      VpcId: !Ref 'VPC'
  PrivateRouteTwo:
    Type: AWS::EC2::Route
    Properties:
      RouteTableId: !Ref PrivateRouteTableTwo
      DestinationCidrBlock: 0.0.0.0/0
      NatGatewayId: !Ref NatGatewayTwo
  PrivateRouteTableTwoAssociation:
    Type: AWS::EC2::SubnetRouteTableAssociation
    Properties:
      RouteTableId: !Ref PrivateRouteTableTwo
      SubnetId: !Ref PrivateSubnetTwo

Outputs:
  VpcId:
    Description: The ID of the VPC that this stack is deployed in
    Value: !Ref 'VPC'
  PublicSubnetIds:
    Description: Comma separated list of public facing subnets that have
                 a direct internet connection as long as you assign a public IP
    Value: !Sub '${PublicSubnetOne},${PublicSubnetTwo}'
  PrivateSubnetIds:
    Description: Comma separated list of private subnets that use a NAT
                 gateway for internet access.
    Value: !Sub '${PrivateSubnetOne},${PrivateSubnetTwo}'

For more info about this VPC see the pattern "Large sized AWS VPC for an Amazon ECS cluster".
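
If you would like to sanity check this template on its own before deploying it as part of the parent stack later on, one option is CloudFormation's built-in validation:

Language: shell
# Basic syntax check of the VPC template
aws cloudformation validate-template --template-body file://vpc.yml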

Define Amazon ECS cluster of AWS Inferentia instances

The following CloudFormation file defines an Amazon ECS cluster that launches AWS Inferentia instances as capacity for running containers. These instances have hardware acceleration that is optimized for running machine learning inference jobs.

File: inferentia-cluster.yml
Language: yml
AWSTemplateFormatVersion: '2010-09-09'
Description: ECS Cluster optimized for machine learning inference workloads
Parameters:
  InstanceType:
    Type: String
    Default: inf2.8xlarge
    Description: Class of AWS Inferentia instance used to run containers.
    AllowedValues: [inf1.xlarge, inf1.2xlarge, inf1.6xlarge, inf1.24xlarge,
      inf2.xlarge, inf2.8xlarge, inf2.24xlarge, inf2.48xlarge]
    ConstraintDescription: Please choose a valid instance type.
  DesiredCapacity:
    Type: Number
    Default: '0'
    Description: Number of instances to initially launch in your ECS cluster.
  MaxSize:
    Type: Number
    Default: '2'
    Description: Maximum number of instances that can be launched in your ECS cluster.
  ECSAMI:
    Type: AWS::SSM::Parameter::Value<AWS::EC2::Image::Id>
    Default: /aws/service/ecs/optimized-ami/amazon-linux-2/inf/recommended/image_id
    Description: The Amazon Machine Image ID used for the cluster, leave it as the default value to get the latest Inferentia optimized AMI
  VpcId:
    Type: AWS::EC2::VPC::Id
    Description: VPC ID where the ECS cluster is launched
  SubnetIds:
    Type: List<AWS::EC2::Subnet::Id>
    Description: List of subnet IDs where the EC2 instances will be launched

Resources:
  # Cluster that keeps track of container deployments
  ECSCluster:
    Type: AWS::ECS::Cluster
    Properties:
      ClusterSettings:
        - Name: containerInsights
          Value: enabled

  # Autoscaling group. This launches the actual EC2 instances that will register
  # themselves as members of the cluster, and run the docker containers.
  ECSAutoScalingGroup:
    Type: AWS::AutoScaling::AutoScalingGroup
    DependsOn:
      # This is to ensure that the ASG gets deleted first before these
      # resources, when it comes to stack teardown.
      - ECSCluster
      - EC2Role
    Properties:
      VPCZoneIdentifier:
        - !Select [ 0, !Ref SubnetIds ]
        - !Select [ 1, !Ref SubnetIds ]
      LaunchTemplate:
        LaunchTemplateId: !Ref ContainerInstances
        Version: !GetAtt ContainerInstances.LatestVersionNumber
      MinSize: 0
      MaxSize: !Ref MaxSize
      DesiredCapacity: !Ref DesiredCapacity
      NewInstancesProtectedFromScaleIn: true
    UpdatePolicy:
      AutoScalingReplacingUpdate:
        WillReplace: 'true'

  # The config for each instance that is added to the cluster
  ContainerInstances:
    Type: AWS::EC2::LaunchTemplate
    Properties:
      LaunchTemplateData:
        ImageId: !Ref ECSAMI
        InstanceType: !Ref InstanceType
        IamInstanceProfile:
          Name: !Ref EC2InstanceProfile
        SecurityGroupIds:
          - !Ref ContainerHostSecurityGroup
        UserData:
          # This injected configuration file is how the EC2 instance
          # knows which ECS cluster on your AWS account it should be joining
          Fn::Base64: !Sub |
            #!/bin/bash
            echo ECS_CLUSTER=${ECSCluster} >> /etc/ecs/ecs.config
        BlockDeviceMappings:
          - DeviceName: "/dev/xvda"
            Ebs:
              VolumeSize: 50
              VolumeType: gp3
        # Disable IMDSv1, and require IMDSv2
        MetadataOptions:
          HttpEndpoint: enabled
          HttpTokens: required
  EC2InstanceProfile:
    Type: AWS::IAM::InstanceProfile
    Properties:
      Path: /
      Roles:
        - !Ref EC2Role

  # Custom resource that force destroys the ASG. This cleans up EC2 instances that had
  # managed termination protection enabled, but which are not yet released.
  # This is necessary because ECS does not immediately release an EC2 instance from termination
  # protection as soon as the instance is no longer running tasks. There is a cooldown delay.
  # In the case of tearing down the CloudFormation stack, CloudFormation will delete the
  # AWS::ECS::Service and immediately move on to tearing down the AWS::ECS::Cluster, disconnecting
  # the AWS::AutoScaling::AutoScalingGroup from ECS management too fast, before ECS has a chance
  # to asynchronously turn off managed instance protection on the EC2 instances.
  # This will leave some EC2 instances stranded in a state where they are protected from scale-in forever.
  # This then blocks the AWS::AutoScaling::AutoScalingGroup from cleaning itself up.
  # The custom resource function force destroys the autoscaling group when tearing down the stack,
  # avoiding the issue of protected EC2 instances that can never be cleaned up.
  CustomAsgDestroyerFunction:
    Type: AWS::Lambda::Function
    Properties:
      Code:
        ZipFile: !Sub |
          const { AutoScalingClient, DeleteAutoScalingGroupCommand } = require("@aws-sdk/client-auto-scaling");
          const autoscaling = new AutoScalingClient({ region: '${AWS::Region}' });
          const response = require('cfn-response');

          exports.handler = async function(event, context) {
            console.log(event);

            if (event.RequestType !== "Delete") {
              await response.send(event, context, response.SUCCESS);
              return;
            }

            const input = {
              AutoScalingGroupName: '${ECSAutoScalingGroup}',
              ForceDelete: true
            };
            const command = new DeleteAutoScalingGroupCommand(input);
            const deleteResponse = await autoscaling.send(command);
            console.log(deleteResponse);

            await response.send(event, context, response.SUCCESS);
          };
      Handler: index.handler
      Runtime: nodejs20.x
      Timeout: 30
      Role: !GetAtt CustomAsgDestroyerRole.Arn

  # The role used by the ASG destroyer
  CustomAsgDestroyerRole:
    Type: AWS::IAM::Role
    Properties:
      AssumeRolePolicyDocument:
        Version: 2012-10-17
        Statement:
          - Effect: Allow
            Principal:
              Service:
                - lambda.amazonaws.com
            Action:
              - sts:AssumeRole
      ManagedPolicyArns:
        # https://docs.aws.amazon.com/aws-managed-policy/latest/reference/AWSLambdaBasicExecutionRole.html
        - arn:aws:iam::aws:policy/service-role/AWSLambdaBasicExecutionRole
      Policies:
        - PolicyName: allow-to-delete-autoscaling-group
          PolicyDocument:
            Version: 2012-10-17
            Statement:
              - Effect: Allow
                Action: autoscaling:DeleteAutoScalingGroup
                Resource: !Sub arn:aws:autoscaling:${AWS::Region}:${AWS::AccountId}:autoScalingGroup:*:autoScalingGroupName/${ECSAutoScalingGroup}

  CustomAsgDestroyer:
    Type: Custom::AsgDestroyer
    DependsOn:
      - CapacityProviderAssociation
    Properties:
      ServiceToken: !GetAtt CustomAsgDestroyerFunction.Arn
      Region: !Ref "AWS::Region"

  # Create an ECS capacity provider to attach the ASG to the ECS cluster
  # so that it autoscales as we launch more containers
  CapacityProvider:
    Type: AWS::ECS::CapacityProvider
    Properties:
      AutoScalingGroupProvider:
        AutoScalingGroupArn: !Ref ECSAutoScalingGroup
        ManagedScaling:
          InstanceWarmupPeriod: 60
          MinimumScalingStepSize: 1
          MaximumScalingStepSize: 100
          Status: ENABLED
          # Percentage of cluster reservation to try to maintain
          TargetCapacity: 100
        ManagedTerminationProtection: ENABLED
        ManagedDraining: ENABLED

  # Create a cluster capacity provider assocation so that the cluster
  # will use the capacity provider
  CapacityProviderAssociation:
    Type: AWS::ECS::ClusterCapacityProviderAssociations
    Properties:
      CapacityProviders:
        - !Ref CapacityProvider
      Cluster: !Ref ECSCluster
      DefaultCapacityProviderStrategy:
        - Base: 0
          CapacityProvider: !Ref CapacityProvider
          Weight: 1

  # A security group for the EC2 hosts that will run the containers.
  # This can be used to limit incoming traffic to or outgoing traffic
  # from the container's host EC2 instance.
  ContainerHostSecurityGroup:
    Type: AWS::EC2::SecurityGroup
    Properties:
      GroupDescription: Access to the EC2 hosts that run containers
      VpcId: !Ref VpcId

  # Role for the EC2 hosts. This allows the ECS agent on the EC2 hosts
  # to communicate with the ECS control plane, as well as download the docker
  # images from ECR to run on your host.
  EC2Role:
    Type: AWS::IAM::Role
    Properties:
      AssumeRolePolicyDocument:
        Statement:
          - Effect: Allow
            Principal:
              Service: [ec2.amazonaws.com]
            Action: ['sts:AssumeRole']
      Path: /

      # See reference: https://docs.aws.amazon.com/AmazonECS/latest/developerguide/security-iam-awsmanpol.html#security-iam-awsmanpol-AmazonEC2ContainerServiceforEC2Role
      ManagedPolicyArns:
        - arn:aws:iam::aws:policy/service-role/AmazonEC2ContainerServiceforEC2Role

  # This is the task execution role, which allows the ECS agent
  # to download container images and upload logs on behalf of the task.
  ECSTaskExecutionRole:
    Type: AWS::IAM::Role
    Properties:
      AssumeRolePolicyDocument:
        Statement:
          - Effect: Allow
            Principal:
              Service: [ecs-tasks.amazonaws.com]
            Action: ['sts:AssumeRole']
            Condition:
              ArnLike:
                aws:SourceArn: !Sub arn:aws:ecs:${AWS::Region}:${AWS::AccountId}:*
              StringEquals:
                aws:SourceAccount: !Ref AWS::AccountId
      Path: /

      # This role enables basic features of ECS. See reference:
      # https://docs.aws.amazon.com/AmazonECS/latest/developerguide/security-iam-awsmanpol.html#security-iam-awsmanpol-AmazonECSTaskExecutionRolePolicy
      ManagedPolicyArns:
        - arn:aws:iam::aws:policy/service-role/AmazonECSTaskExecutionRolePolicy

Outputs:
  ClusterName:
    Description: The ECS cluster into which to launch resources
    Value: !Ref ECSCluster
  ECSTaskExecutionRole:
    Description: The role used to start up a task
    Value: !Ref ECSTaskExecutionRole
  CapacityProvider:
    Description: The cluster capacity provider that the service should use
                 to request capacity when it wants to start up a task
    Value: !Ref CapacityProvider

By default this template deploys inf2.8xlarge instances. You can launch additional tasks in the Amazon ECS cluster to automatically scale out the number of AWS Inferentia instances. If you plan to run containers that do not need machine learning acceleration, do not place them on this cluster; instead deploy a separate cluster that uses a less expensive, compute optimized EC2 instance type rather than a machine learning optimized one.
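
Not every Inferentia instance size is available in every region. As a rough way to check what you can launch before choosing an InstanceType value, you can query the EC2 instance type offerings for your configured region (the wildcard filter below assumes you only care about the inf1 and inf2 families):

Language: shell
# List Inferentia instance types offered in the current region
aws ec2 describe-instance-type-offerings \
  --filters "Name=instance-type,Values=inf1.*,inf2.*" \
  --query "InstanceTypeOfferings[].InstanceType" \
  --output text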

Define the Jupyter notebook task

The following CloudFormation template deploys a Jupyter Notebook task under Amazon ECS orchestration:

File: jupyter-notebook.yml
Language: yml
AWSTemplateFormatVersion: '2010-09-09'
Description: An example service that deploys a JupyterLab notebook
             with AWS Neuron support for machine learning.

Parameters:
  VpcId:
    Type: String
    Description: The VPC that the service is running inside of
  PublicSubnetIds:
    Type: List<AWS::EC2::Subnet::Id>
    Description: List of public facing subnets
  PrivateSubnetIds:
    Type: List<AWS::EC2::Subnet::Id>
    Description: List of private subnets
  ClusterName:
    Type: String
    Description: The name of the ECS cluster into which to launch capacity.
  ECSTaskExecutionRole:
    Type: String
    Description: The role used to start up an ECS task
  CapacityProvider:
    Type: String
    Description: The cluster capacity provider that the service should use
                 to request capacity when it wants to start up a task
  ServiceName:
    Type: String
    Default: jupyter
    Description: A name for the service
  ImageUrl:
    Type: String
    Description: The URL of a Jupyter notebook container to run
  ContainerCpu:
    Type: Number
    Default: 10240
    Description: How much CPU to give the container. 1024 is 1 CPU
  ContainerMemory:
    Type: Number
    Default: 32768
    Description: How much memory in megabytes to give the container
  MyIp:
    Type: String
    Default: 0.0.0.0/0
    Description: The IP addresses that you want to accept traffic from.
                 Default accepts traffic from anywhere on the internet.

Resources:

  # The task definition. This is a simple metadata description of what
  # container to run, and what resource requirements it has.
  TaskDefinition:
    Type: AWS::ECS::TaskDefinition
    DependsOn:
      - TaskAccessToJupyterToken
    Properties:
      Family: !Ref ServiceName
      NetworkMode: awsvpc
      RequiresCompatibilities:
        - EC2
      ExecutionRoleArn: !Ref ECSTaskExecutionRole
      ContainerDefinitions:
        - Name: jupyter
          Cpu: !Ref ContainerCpu
          MemoryReservation: !Ref ContainerMemory
          Image: !Ref ImageUrl
          EntryPoint:
            - '/bin/sh'
            - '-c'
          Command:
            - '/opt/conda/bin/jupyter-lab --allow-root --ServerApp.token=${JUPYTER_TOKEN} --ServerApp.ip=*'
          Secrets:
            - Name: JUPYTER_TOKEN
              ValueFrom: !Ref JupyterToken
          PortMappings:
            - ContainerPort: 8888
              HostPort: 8888
          MountPoints:
            - SourceVolume: efs-volume
              ContainerPath: /home
          LogConfiguration:
            LogDriver: 'awslogs'
            Options:
              mode: non-blocking
              max-buffer-size: 25m
              awslogs-group: !Ref LogGroup
              awslogs-region: !Ref AWS::Region
              awslogs-stream-prefix: !Ref ServiceName
          LinuxParameters:
            Devices:
              # Ensure that the AWS Neuron SDK inside the container
              # can access the underlying host device provided by AWS Inferentia
              - ContainerPath: /dev/neuron0
                HostPath: /dev/neuron0
                Permissions:
                  - read
                  - write
            Capabilities:
              Add:
                - "IPC_LOCK"
      Volumes:
        - Name: efs-volume
          EFSVolumeConfiguration:
            FilesystemId: !Ref EFSFileSystem
            RootDirectory: /
            TransitEncryption: ENABLED

  # The secret token used to protect the Jupyter notebook
  JupyterToken:
    Type: AWS::SecretsManager::Secret
    Properties:
      GenerateSecretString:
        PasswordLength: 30
        ExcludePunctuation: true

  # Attach a policy to the task execution role, which grants
  # the ECS agent the ability to fetch the Jupyter notebook
  # secret token on behalf of the task.
  TaskAccessToJupyterToken:
    Type: AWS::IAM::Policy
    Properties:
      Roles:
        - !Ref ECSTaskExecutionRole
      PolicyName: AccessJupyterToken
      PolicyDocument:
        Version: '2012-10-17'
        Statement:
          - Effect: Allow
            Action:
              - secretsmanager:DescribeSecret
              - secretsmanager:GetSecretValue
            Resource: !Ref JupyterToken

  # Attach a policy to the task execution role which allows the
  # task to access the EFS filesystem that provides durable storage
  # for the task.
  TaskAccessToFilesystem:
    Type: AWS::IAM::Policy
    Properties:
      Roles:
        - !Ref ECSTaskExecutionRole
      PolicyName: EFSAccess
      PolicyDocument:
        Version: '2012-10-17'
        Statement:
          - Effect: Allow
            Action:
              - elasticfilesystem:ClientMount
              - elasticfilesystem:ClientWrite
              - elasticfilesystem:DescribeMountTargets
              - elasticfilesystem:DescribeFileSystems
            Resource: !GetAtt EFSFileSystem.Arn

  # The service. The service is a resource which allows you to run multiple
  # copies of a type of task, and gather up their logs and metrics, as well
  # as monitor the number of running tasks and replace any that have crashed
  Service:
    Type: AWS::ECS::Service
    DependsOn: PublicLoadBalancerListener
    Properties:
      ServiceName: !Ref ServiceName
      Cluster: !Ref ClusterName
      PlacementStrategies:
        - Field: attribute:ecs.availability-zone
          Type: spread
        - Field: cpu
          Type: binpack
      CapacityProviderStrategy:
        - Base: 0
          CapacityProvider: !Ref CapacityProvider
          Weight: 1
      DeploymentConfiguration:
        MaximumPercent: 200
        MinimumHealthyPercent: 75
      DesiredCount: 1
      NetworkConfiguration:
        AwsvpcConfiguration:
          SecurityGroups:
            - !Ref ServiceSecurityGroup
          Subnets:
            - !Select [ 0, !Ref PrivateSubnetIds ]
            - !Select [ 1, !Ref PrivateSubnetIds ]
      TaskDefinition: !Ref TaskDefinition
      LoadBalancers:
        - ContainerName: jupyter
          ContainerPort: 8888
          TargetGroupArn: !Ref JupyterTargetGroup

  # Because we are launching tasks in AWS VPC networking mode
  # the tasks themselves also have an extra security group that is unique
  # to them. This is a unique security group just for this service,
  # to control which things it can talk to, and who can talk to it
  ServiceSecurityGroup:
    Type: AWS::EC2::SecurityGroup
    Properties:
      GroupDescription: !Sub Access to service ${ServiceName}
      VpcId: !Ref VpcId

  # This log group stores the stdout logs from this service's containers
  LogGroup:
    Type: AWS::Logs::LogGroup

   # Keeps track of the list of tasks running on EC2 instances
  JupyterTargetGroup:
    Type: AWS::ElasticLoadBalancingV2::TargetGroup
    Properties:
      HealthCheckIntervalSeconds: 6
      HealthCheckPath: /api
      HealthCheckProtocol: HTTP
      HealthCheckTimeoutSeconds: 5
      HealthyThresholdCount: 2
      TargetType: ip
      Port: 8888
      Protocol: HTTP
      UnhealthyThresholdCount: 10
      VpcId: !Ref VpcId
      TargetGroupAttributes:
        - Key: deregistration_delay.timeout_seconds
          Value: 0

  # A public facing load balancer, this is used as ingress for
  # public facing internet traffic. The traffic is forwarded
  # down to the Jupyter notebook wherever it is currently hosted
  # on whichever machine Amazon ECS placed it on.
  PublicLoadBalancerSG:
    Type: AWS::EC2::SecurityGroup
    Properties:
      GroupDescription: Access to the public facing load balancer
      VpcId: !Ref VpcId
      SecurityGroupIngress:
        # Allow access to ALB from the specified IP address
        - CidrIp: !Ref MyIp
          IpProtocol: -1
  PublicLoadBalancer:
    Type: AWS::ElasticLoadBalancingV2::LoadBalancer
    Properties:
      Scheme: internet-facing
      LoadBalancerAttributes:
      - Key: idle_timeout.timeout_seconds
        Value: '30'
      Subnets:
        # The load balancer is placed into the public subnets, so that traffic
        # from the internet can reach the load balancer directly via the internet gateway
        - !Select [ 0, !Ref PublicSubnetIds ]
        - !Select [ 1, !Ref PublicSubnetIds ]
      SecurityGroups:
        - !Ref PublicLoadBalancerSG
  PublicLoadBalancerListener:
    Type: AWS::ElasticLoadBalancingV2::Listener
    Properties:
      DefaultActions:
        - Type: 'forward'
          ForwardConfig:
            TargetGroups:
              - TargetGroupArn: !Ref JupyterTargetGroup
                Weight: 100
      LoadBalancerArn: !Ref 'PublicLoadBalancer'
      Port: 80
      Protocol: HTTP

  # The Jupyter services' security group allows inbound
  # traffic from the public facing ALB
  JupyterIngressFromPublicALB:
    Type: AWS::EC2::SecurityGroupIngress
    Properties:
      Description: Ingress from the public ALB
      GroupId: !Ref 'ServiceSecurityGroup'
      IpProtocol: -1
      SourceSecurityGroupId: !Ref 'PublicLoadBalancerSG'

  # Filesystem that provides durable storage for the notebook
  EFSFileSystem:
    Type: AWS::EFS::FileSystem
    Properties:
      Encrypted: true
      PerformanceMode: generalPurpose
      ThroughputMode: bursting

  # Mount target allows usage of the EFS inside of subnet one
  EFSMountTargetOne:
    Type: AWS::EFS::MountTarget
    Properties:
      FileSystemId: !Ref EFSFileSystem
      SubnetId: !Select [ 0, !Ref PrivateSubnetIds ]
      SecurityGroups:
        - !Ref EFSFileSystemSecurityGroup

  # Mount target allows usage of the EFS inside of subnet two
  EFSMountTargetTwo:
    Type: AWS::EFS::MountTarget
    Properties:
      FileSystemId: !Ref EFSFileSystem
      SubnetId: !Select [ 1, !Ref PrivateSubnetIds ]
      SecurityGroups:
        - !Ref EFSFileSystemSecurityGroup

  # This security group is used by the mount targets so
  # that they will allow inbound NFS connections from
  # the ECS tasks that we launch
  EFSFileSystemSecurityGroup:
    Type: AWS::EC2::SecurityGroup
    Properties:
      GroupDescription: Security group for EFS file system
      VpcId: !Ref VpcId
      SecurityGroupIngress:
        - IpProtocol: tcp
          FromPort: 2049
          ToPort: 2049
          SourceSecurityGroupId: !Ref ServiceSecurityGroup

Outputs:
  LoadBalancerUrl:
    Description: The URL at which you can access the application
    Value: !GetAtt PublicLoadBalancer.DNSName
  Secret:
    Description: The ARN of the secret that was created to protect the JupyterLab
    Value: !Ref JupyterToken

Some things to note:

You will need to pass the ImageUrl parameter so that the stack launches the container image URI that you just uploaded to Amazon ECR. This will be handled later when we deploy the parent stack.

In the ContainerDefinitions[0].LinuxParameters section you will see that the task definition is mounting the /dev/neuron0 device from the host into the container. This is what gives the Neuron SDK inside the container the ability to utilize the underlying hardware acceleration. Extremely large inf2 instances have multiple neuron* devices that need to be mounted into the container.

The template generates an AWS::SecretsManager::Secret resource as the secret token used to protect the Jupyter notebook from unauthorized access. You will see this token passed in as a Secret in the task definition body.

The MyIp parameter can be customized to limit which IP addresses are allowed to access the JupyterLab.

This task definition creates an EFS filesystem and mounts it at the path /home. Use it as durable storage for models or other important data that you want to keep from your Jupyter notebook. Everything outside of /home is wiped on restart, because the container's filesystem is fundamentally ephemeral; the /home directory, however, survives restarts. See the tutorial on attaching durable storage to an ECS task for more information on using EFS for durable task storage.
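
Once the stacks have been deployed (see the next section), one way to confirm that the device mapping, secret, and EFS volume made it into the registered task definition is to describe it with the AWS CLI. This is a sketch that assumes the default ServiceName of jupyter:

Language: shell
# Inspect the Neuron device mapping, injected secret, and EFS volume
aws ecs describe-task-definition \
  --task-definition jupyter \
  --query "taskDefinition.{devices:containerDefinitions[0].linuxParameters.devices,secrets:containerDefinitions[0].secrets,volumes:volumes}"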

Deploy all the stacks

We can use the following parent stack to deploy all three child CloudFormation templates:

File: parent.yml
Language: yml
AWSTemplateFormatVersion: "2010-09-09"
Transform: AWS::Serverless-2016-10-31
Description: Parent stack that deploys VPC, Amazon ECS cluster with AWS Inferentia capacity
             and then deploys a JupyterLab IDE (latest Jupyter Notebook) with AWS Neuron SDK for machine
             learning projects

Parameters:
  ImageUrl:
    Type: String
    Description: The URL of the Jupyter image that you built

Resources:

  # The networking configuration. This creates an isolated
  # network specific to this particular environment
  VpcStack:
    Type: AWS::Serverless::Application
    Properties:
      Location: vpc.yml

  # This stack contains cluster wide resources that will be shared
  # by all services that get launched in the stack
  BaseStack:
    Type: AWS::Serverless::Application
    Properties:
      Location: inferentia-cluster.yml
      Parameters:
        VpcId: !GetAtt VpcStack.Outputs.VpcId
        SubnetIds: !GetAtt VpcStack.Outputs.PrivateSubnetIds

  # Deploys the JupyterLab application into the cluster
  JupyterNotebookStack:
    Type: AWS::Serverless::Application
    Properties:
      Location: jupyter-notebook.yml
      Parameters:
        ImageUrl: !Ref ImageUrl
        VpcId: !GetAtt VpcStack.Outputs.VpcId
        PublicSubnetIds: !GetAtt VpcStack.Outputs.PublicSubnetIds
        PrivateSubnetIds: !GetAtt VpcStack.Outputs.PrivateSubnetIds
        ClusterName: !GetAtt BaseStack.Outputs.ClusterName
        ECSTaskExecutionRole: !GetAtt BaseStack.Outputs.ECSTaskExecutionRole
        CapacityProvider: !GetAtt BaseStack.Outputs.CapacityProvider

Outputs:
  JupyterLabUrl:
    Description: The URL at which you can find your JupyterLab installation
    Value: !GetAtt JupyterNotebookStack.Outputs.LoadBalancerUrl
  Secret:
    Description: The ARN of the secret token that protects your JupyterLab
    Value: !GetAtt JupyterNotebookStack.Outputs.Secret

Use AWS SAM CLI to deploy the parent stack with a command like this one. You will need to substitute in your own ImageUrl value from the container image that you built and pushed earlier:

Language: shell
sam deploy \
  --template-file parent.yml \
  --stack-name machine-learning-environment \
  --resolve-s3 \
  --capabilities CAPABILITY_IAM \
  --parameter-overrides ImageUrl=209640446841.dkr.ecr.us-east-2.amazonaws.com/jupyter-notebook:latest

After the deployment finishes you will see an output section that looks similar to this:

Language: txt
-------------------------------------------------------------------------------------------------
Outputs
-------------------------------------------------------------------------------------------------
Key                 Secret
Description         The ARN of the secret token that protects your JupyterLab
Value               arn:aws:secretsmanager:us-east-2:209640446841:secret:JupyterToken-
kZ3MMCCAmjxn-VGGHTz

Key                 JupyterLabUrl
Description         The URL at which you can find your JupyterLab installation
Value               jupyt-Publi-1U1OSUNR85E3J-297756869.us-east-2.elb.amazonaws.com
-------------------------------------------------------------------------------------------------

This tells you the URL where you can access your JupyterLab notebook, as well as the ARN of the automatically generated secret whose value is the token for accessing your notebook.
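
If you need these values again later, you do not have to redeploy; you can re-read the outputs of the parent stack at any time with the AWS CLI:

Language: shell
# Re-read the parent stack outputs
aws cloudformation describe-stacks \
  --stack-name machine-learning-environment \
  --query "Stacks[0].Outputs"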

Access JupyterLab

Open up the AWS Secrets Manager console and look for the secret called JupyterToken, as referenced in the outputs section above. Click on the secret, scroll down, and click "Retrieve Secret Value". Copy the secret value and keep it safe, as this is the password that you will use to access your JupyterLab over the internet.
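
If you prefer the command line, you can fetch the same token with the AWS CLI by passing the secret ARN from the stack outputs (the ARN below is a placeholder for your own value):

Language: shell
# Retrieve the Jupyter access token from AWS Secrets Manager
aws secretsmanager get-secret-value \
  --secret-id <secret-arn-from-stack-outputs> \
  --query SecretString \
  --output text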

If you wish to change this secret value in AWS Secrets Manager, you will need to restart the Amazon ECS JupyterLab task for the change to take effect.
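
One way to do that restart, assuming the service name of jupyter that this pattern creates, is to force a new deployment of the ECS service so that a replacement task starts with the updated token (substitute your own cluster name, which is generated by CloudFormation):

Language: shell
# Launch a replacement task that picks up the new secret value
aws ecs update-service \
  --cluster <your-cluster-name> \
  --service jupyter \
  --force-new-deployment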

Open up the URL from the outputs section above, and enter the secret token when asked. Once JupyterLab loads, you will see its launcher screen.

At this point you can begin making use of the underlying AWS Inferentia hardware, via the JupyterLab IDE.

Make sure that acceleration is available

Inside of JupyterLab click on the "Other -> Terminal" option to open a tab that has a command line prompt. Any commands that you type in this prompt will run inside of the remote JupyterLab container.

Run the following command:

Language: shell
neuron-ls

You should see output similar to this:

Language: txt
+--------+--------+--------+---------+
| NEURON | NEURON | NEURON |   PCI   |
| DEVICE | CORES  | MEMORY |   BDF   |
+--------+--------+--------+---------+
| 0      | 2      | 32 GB  | 00:1f.0 |
+--------+--------+--------+---------+

This verifies that the AWS Neuron SDK inside of the container is able to connect to the AWS Neuron device, which provides the hardware acceleration of the underlying AWS Inferentia hardware. At this point you can begin to use the Neuron SDK to do machine learning tasks inside of the JupyterLab container.

You can also run the following command to open a hardware monitoring interface:

Language: shell
neuron-top

This will show more info about the Neuron hardware, including its current usage. Right now the Neuron cores are not in use, so let's change that by running a benchmark test:

Test out hardware acceleration

In JupyterLab start a new notebook. Run the following commands as cells in the notebook.

Install dependencies:

Language: py
!python -m pip config set global.extra-index-url https://pip.repos.neuron.amazonaws.com
!pip install neuronx-cc==2.* tensorflow-neuronx ipywidgets transformers

Download a pretrained BERT model and compile it for the AWS Neuron device. This model analyzes whether two input phrases are paraphrases of each other:

Language: py
import torch
import torch_neuronx
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import transformers


def encode(tokenizer, *inputs, max_length=128, batch_size=1):
    tokens = tokenizer.encode_plus(
        *inputs,
        max_length=max_length,
        padding='max_length',
        truncation=True,
        return_tensors="pt"
    )
    return (
        torch.repeat_interleave(tokens['input_ids'], batch_size, 0),
        torch.repeat_interleave(tokens['attention_mask'], batch_size, 0),
        torch.repeat_interleave(tokens['token_type_ids'], batch_size, 0),
    )


# Create the tokenizer and model
name = "bert-base-cased-finetuned-mrpc"
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForSequenceClassification.from_pretrained(name, torchscript=True)

# Set up some example inputs
sequence_0 = "The company HuggingFace is based in New York City"
sequence_1 = "Apples are especially bad for your health"
sequence_2 = "HuggingFace's headquarters are situated in Manhattan"

paraphrase = encode(tokenizer, sequence_0, sequence_2)
not_paraphrase = encode(tokenizer, sequence_0, sequence_1)

# Run the original PyTorch BERT model on CPU
cpu_paraphrase_logits = model(*paraphrase)[0]
cpu_not_paraphrase_logits = model(*not_paraphrase)[0]

# Compile the model for Neuron
model_neuron = torch_neuronx.trace(model, paraphrase)

# Save the TorchScript for inference deployment
filename = 'model.pt'
torch.jit.save(model_neuron, filename)

Now run the model on the AWS Neuron device, and compare the results with those from running the model on the CPU:

Language: py
# Load the TorchScript compiled model
model_neuron = torch.jit.load(filename)

# Verify the TorchScript works on both example inputs
neuron_paraphrase_logits = model_neuron(*paraphrase)[0]
neuron_not_paraphrase_logits = model_neuron(*not_paraphrase)[0]

# Compare the results
print('CPU paraphrase logits:        ', cpu_paraphrase_logits.detach().numpy())
print('Neuron paraphrase logits:    ', neuron_paraphrase_logits.detach().numpy())
print('CPU not-paraphrase logits:    ', cpu_not_paraphrase_logits.detach().numpy())
print('Neuron not-paraphrase logits: ', neuron_not_paraphrase_logits.detach().numpy())

You should see output similar to this:

Language: txt
CPU paraphrase logits:         [[-0.34945598  1.9003887 ]]
Neuron paraphrase logits:     [[-0.34909704  1.8992746 ]]
CPU not-paraphrase logits:     [[ 0.5386365 -2.2197142]]
Neuron not-paraphrase logits:  [[ 0.537705  -2.2180324]]

Whether you run model inference on the CPU or on the AWS Neuron device, it produces very similar results. With Neuron, however, the inference was offloaded onto the underlying Inferentia accelerator, leaving the rest of the EC2 instance's resources free for other tasks.

Run the model in a loop as a benchmark to test out performance on the underlying hardware:

Language: py
import time
import concurrent.futures
import numpy as np


def benchmark(filename, example, n_models=2, n_threads=2, batches_per_thread=10000):
    """
    Record performance statistics for a serialized model and its input example.

    Arguments:
        filename: The serialized torchscript model to load for benchmarking.
        example: An example model input.
        n_models: The number of models to load.
        n_threads: The number of simultaneous threads to execute inferences on.
        batches_per_thread: The number of example batches to run per thread.

    Returns:
        A dictionary of performance statistics.
    """

    # Load models
    models = [torch.jit.load(filename) for _ in range(n_models)]

    # Warmup
    for _ in range(8):
        for model in models:
            model(*example)

    latencies = []

    # Thread task
    def task(model):
        for _ in range(batches_per_thread):
            start = time.time()
            model(*example)
            finish = time.time()
            latencies.append((finish - start) * 1000)

    # Submit tasks
    begin = time.time()
    with concurrent.futures.ThreadPoolExecutor(max_workers=n_threads) as pool:
        for i in range(n_threads):
            pool.submit(task, models[i % len(models)])
    end = time.time()

    # Compute metrics
    boundaries = [50, 95, 99]
    percentiles = {}

    for boundary in boundaries:
        name = f'latency_p{boundary}'
        percentiles[name] = np.percentile(latencies, boundary)
    duration = end - begin
    batch_size = 0
    for tensor in example:
        if batch_size == 0:
            batch_size = tensor.shape[0]
    inferences = len(latencies) * batch_size
    throughput = inferences / duration

    # Metrics
    metrics = {
        'filename': str(filename),
        'batch_size': batch_size,
        'batches': len(latencies),
        'inferences': inferences,
        'threads': n_threads,
        'models': n_models,
        'duration': duration,
        'throughput': throughput,
        **percentiles,
    }

    display(metrics)


def display(metrics):
    """
    Display the metrics produced by `benchmark` function.

    Args:
        metrics: A dictionary of performance statistics.
    """
    pad = max(map(len, metrics)) + 1
    for key, value in metrics.items():

        parts = key.split('_')
        parts = list(map(str.title, parts))
        title = ' '.join(parts) + ":"

        if isinstance(value, float):
            value = f'{value:0.3f}'

        print(f'{title :<{pad}} {value}')


# Benchmark BERT on Neuron
benchmark(filename, paraphrase)

While this notebook code runs, switch back to the terminal tab and check neuron-top again.

You will see that the Neuron cores are in use as the benchmark runs the pretrained BERT model, while the CPU shows very little utilization. This is exactly what you want to see: the machine learning inference workload has been almost fully offloaded onto AWS Inferentia hardware.

The benchmark output should look similar to this:

Language: txt
Filename:    model.pt
Batch Size:  1
Batches:     20000
Inferences:  20000
Threads:     2
Models:      2
Duration:    9.944
Throughput:  2011.203
Latency P50: 0.994
Latency P95: 1.017
Latency P99: 1.045

The model ran 20,000 times in under 10 seconds, with a p99 latency of roughly 1 ms. As you can see, AWS Inferentia hardware acceleration is ideal for real-time inference applications, such as running inference on demand in response to a web request.

Next Steps

  • Look at the jupyter-notebook.yml stack, and notice the MyIp parameter. It is currently set to 0.0.0.0/0 which allows inbound traffic from all IP addresses. Look up your home or office IP address and set MyIp to a CIDR like 1.2.3.4/32 to ensure that the load balancer in front of JupyterLab only accepts inbound traffic from you and you alone. This adds a second layer of network protection in addition to the secret token.
  • Right now if you restart the ECS task it will wipe any changes that you made to the container's ephemeral filesystem. You may not wish to lose installed Python packages, though. Consider setting up a Python virtual environment that lives inside of the /home directory, since this directory is an Elastic File System that provides durable persistence for the container (see the sketch after this list).
  • Instead of running the model inside of JupyterLab, consider creating a model server that does inference in response to a network request and returns the results over the network. You can then horizontally scale the workload across multiple Inferentia instances behind a load balancer, allowing you to do extremely high volume real-time inference at low latency.
  • If you launch even larger Inferentia instances like inf2.24xlarge or inf2.48xlarge then you should note that they have multiple Neuron devices attached to them. You can run ls /dev/neuron* on the EC2 instance to see a list of the Neuron devices. Right now the task definition only mounts /dev/neuron0 so you will only be able to access two Neuron cores inside the task. For larger Inferentia instances you should update the ECS task definition to mount all of the available host Neuron devices into the container.
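
As a rough sketch of the virtual environment idea from the second bullet above, assuming python3 is available on the PATH inside the JupyterLab container:

Language: shell
# Create a virtual environment on the durable EFS volume, keeping access
# to the preinstalled Neuron packages via system site packages
python3 -m venv --system-site-packages /home/my-env
source /home/my-env/bin/activate
pip install <the-packages-you-want-to-keep>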