Introduction

A common need in AWS development environments is to stop EC2 instances when they aren’t in use. AWS bills these resources by the second or by the hour, so switching them off can yield large savings for your organisation.

Writing your own scripts using the AWS SDK is one option, but it takes time to write them, and you’ll need to maintain and update them when AWS changes its APIs. An alternative approach is to use an existing project that supports this functionality.

Cloud Custodian is an open source project from Capital One. It allows you to enforce your cloud compliance rules in an automated way. Custom policies specify the desired state of the resources in your account. Custodian checks the current state of the environment and makes changes where needed.

Setting up Cloud Custodian

You can run Custodian anywhere including on your machine, an EC2 instance or in a Lambda function. For Lambda functions Custodian handles the deployment for you. It will also create any required CloudWatch Rules.

The recommended practice is to store your policies in source control. A CI server can then run them whenever you commit a change. This will trigger a redeploy of the functions and rules where necessary.

Prerequisites for installation

You’ll need valid IAM credentials to deploy your functions into AWS. One way is to create an IAM user and run aws configure to store its credentials on your local machine. You can also set up Custodian on an EC2 instance and assign it a role, which will provide temporary credentials.

Your role or user will need IAM permissions to allow it to create Lambda functions and CloudWatch rules:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "CanCreateCloudCustodianFunctions",
            "Effect": "Allow",
            "Action": [
                "lambda:CreateFunction",
                "lambda:TagResource",
                "events:DescribeRule",
                "events:EnableRule",
                "lambda:GetFunction",
                "events:PutRule",
                "lambda:UpdateFunctionConfiguration",
                "lambda:UpdateAlias",
                "lambda:UpdateFunctionCode",
                "events:PutTargets",
                "iam:PassRole",
                "lambda:AddPermission",
                "lambda:ListTags",
                "lambda:GetAlias",
                "events:ListTargetsByRule",
                "lambda:CreateAlias"
            ],
            "Resource": "*"
        }
    ]
}

Installing Custodian

For Linux users installation of Custodian is quick and easy. You’ll need python2, pip and virtualenv installed:

$ virtualenv --python=python2 custodian
$ source custodian/bin/activate
(custodian) $ pip install c7n

Windows support is a work in progress. The recommended workaround is to run inside a Docker container. A sample implementation using Alpine Linux is available on GitHub.

To test your setup run custodian version. This should output the current version of Custodian to the console.

Implementing the on/offhours policy

Custodian policies are written in YAML and consist of several sections:

  • Name: A machine-readable name for the policy.
  • Resource: A short identifier for the AWS resource type to act on (ec2, rds, s3 etc).
  • Filters: A list of filters that determine which resources the policy will act on.
  • Actions: A list of actions to perform on the matching resources.

Below is an example of a basic offhours policy for EC2 resources:

policies:
  - name: offhours-policy
    resource: ec2
    filters:
      - type: offhour
        default_tz: bst # set this to your timezone
        offhour: 18 # the hour when instances will be shut down
    actions:
      - stop

This policy uses the offhour filter, which matches EC2 instances when the current time, in the configured time zone, falls within the specified hour. Custodian then applies the stop action to those instances.
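To make the matching rule concrete, here is a minimal Python sketch of the idea. It is a simplified model, not Custodian's code: the function name matches_offhour and the fixed-offset BST zone are made up for illustration, and it assumes the filter matches only when the local hour equals the configured offhour. The real filter also considers things such as the day of the week.

```python
from datetime import datetime, timedelta, timezone, tzinfo


def matches_offhour(now_utc: datetime, tz: tzinfo, offhour: int) -> bool:
    """Return True when the hour of `now_utc`, converted to `tz`, equals `offhour`."""
    local = now_utc.astimezone(tz)
    return local.hour == offhour


# Fixed UTC+1 offset standing in for British Summer Time (illustration only).
BST = timezone(timedelta(hours=1), "BST")

# 17:01 UTC on a summer day is 18:01 local time, so an offhour of 18 matches.
summer_run = datetime(2023, 7, 3, 17, 1, tzinfo=timezone.utc)
print(matches_offhour(summer_run, BST, 18))

# An hour later the local hour is 19, so the same policy no longer matches.
print(matches_offhour(summer_run + timedelta(hours=1), BST, 18))
```

The point to take away is that a run outside the configured hour finds nothing to stop, which matters later when choosing a schedule.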

An onhours policy is written in a similar way:

policies:
  - name: onhours-policy
    resource: ec2
    filters:
      - type: onhour
        default_tz: bst # set this to your timezone
        onhour: 8 # the hour when instances will be started up
    actions:
      - start

Making the policy opt-in

You may not want to enforce your offhours policies, especially at first. Custodian allows your developers to either opt in or opt out with a custom tag.

An opt-in policy is one that includes a tag attribute and has opt-out set to false or not specified. In this case Custodian will only act on instances that have the tag:

policies:
  - name: onhours-policy
    resource: ec2
    filters:
      - type: onhour
        default_tz: bst # set this to your timezone
        tag: downtime
        onhour: 8 # the hour when instances will be started up
    actions:
      - start

When opt-out is true, Custodian acts on all instances unless the developer opts out by adding the tag:

policies:
  - name: onhours-policy
    resource: ec2
    filters:
      - type: onhour
        default_tz: bst # set this to your timezone
        tag: no-downtime
        opt-out: true
        onhour: 8 # the hour when instances will be started up
    actions:
      - start
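
The difference between the two modes can be summed up in a few lines of Python. This is only a sketch of the behaviour described in this post: the function policy_applies is hypothetical and looks purely at whether the tag is present. Custodian's real filter can also interpret the tag's value, so check the documentation for the exact values it expects.

```python
def policy_applies(instance_tags: dict, tag: str, opt_out: bool = False) -> bool:
    """Decide whether an instance is subject to the on/offhours policy."""
    has_tag = tag in instance_tags
    # Opt-in: only tagged instances are affected.
    # Opt-out: every instance is affected except the tagged ones.
    return not has_tag if opt_out else has_tag


tagged = {"Name": "build-agent", "downtime": ""}
untagged = {"Name": "database"}

# Opt-in policy using the "downtime" tag.
print(policy_applies(tagged, "downtime"))
print(policy_applies(untagged, "downtime"))

# Opt-out policy using the "no-downtime" tag.
print(policy_applies({"no-downtime": ""}, "no-downtime", opt_out=True))
print(policy_applies(untagged, "no-downtime", opt_out=True))
```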

Deploying a policy

Prerequisites for deployment

Before deploying, we need to create a role for the Lambda function to use. The role should have the AmazonEC2ReadOnlyAccess and CloudWatchFullAccess managed policies attached. It will also need the following custom inline policy:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "CanStopAndStartEC2Instances",
            "Effect": "Allow",
            "Action": [
                "ec2:StopInstances",
                "ec2:StartInstances"
            ],
            "Resource": "*"
        }
    ]
}

Using the least permissions required is a good security practice. It also reduces the scope for Custodian to do something unexpected.

Configuring the policy for deployment

To configure Custodian to deploy as a Lambda you can add a mode section to your policy:

policies:
  - name: offhours-policy
    mode:
        type: scheduled
        schedule: "cron(1 * * * ? *)" # Run every hour at one minute past the hour
        role: arn:aws:iam::123456789012:role/CloudCustodianOffHours
    resource: ec2
    filters:
      - type: offhour
        default_tz: bst # set this to your timezone
        tag: downtime
        opt-out: false
        offhour: 18 # the hour when instances will be shut down
    actions:
      - stop

The mode section has the following fields:

  • Type: This determines how the Lambda function is triggered. For example, this could be on a schedule defined in a CloudWatch rule or in response to a CloudTrail event.
  • Schedule: This is used for scheduled Lambda functions. It takes a CloudWatch Events schedule expression, such as the cron expression above.
  • Role: The role which the Lambda function should assume. This should be set up before deploying the function.

A gotcha about time zones

To keep things simple, the examples above run every hour. You could optimise this so the function only runs during the hour set in offhour. If you decide to do so, watch out for this gotcha: filters use your time zone but schedules are always evaluated in UTC.

This inconsistency can lead to issues when using both together. For example, in the BST/GMT time zone, filters and schedules will be an hour apart during summer time. If the hour at which the function runs doesn’t match the filter’s hour exactly, the on/offhours filter won’t match anything, so your policy will not work during summer.
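
One possible workaround is to schedule the function for every UTC hour that can correspond to your local hour, and let the filter decide which run actually matches. The Python sketch below builds such a cron expression. The helper utc_schedule and its parameters are my own invention for illustration, and it assumes you know the UTC offsets your time zone uses over the year (0 and +1 for GMT/BST).

```python
from typing import Iterable


def utc_schedule(local_hour: int, utc_offsets: Iterable[int], minute: int = 1) -> str:
    """Build a cron expression that fires once for each UTC hour that can
    correspond to `local_hour` under the given whole-hour UTC offsets."""
    hours = sorted({(local_hour - offset) % 24 for offset in utc_offsets})
    hour_field = ",".join(str(h) for h in hours)
    return f"cron({minute} {hour_field} * * ? *)"


# London is UTC+0 in winter (GMT) and UTC+1 in summer (BST).
print(utc_schedule(18, [0, 1]))  # cron(1 17,18 * * ? *)
```

With this schedule the function runs at 17:01 and 18:01 UTC every day. In winter only the 18:01 run falls in the local offhour, and in summer only the 17:01 run does, so the instances are stopped at 18:00 local time all year round.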

Running the deployment

Deploying the Lambda function is handled entirely by Custodian. Simply use the run command:

custodian run --region=eu-west-2 --output-dir=/tmp policies/offhours.yml

Custodian also has a dryrun parameter which can be used to test your policy. When doing so the policy will run in-place rather than being deployed. This means your IAM role or user will need the same permissions as the deployed Lambda function.

The region parameter specifies where the function and CloudWatch rule will be set up, e.g. eu-west-2. If you want to deploy globally you can specify all. You may need separate policies for teams in different time zones.
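
If you run Custodian from a CI job or a wrapper script, it can help to build the command in one place. Below is a small Python sketch that assembles the same command line as above, with an optional dry run. The function custodian_run_command is hypothetical; the flags are the ones used in this post.

```python
import shlex


def custodian_run_command(policy_file, region, output_dir, dryrun=False):
    """Return the `custodian run` command as a list of arguments."""
    cmd = ["custodian", "run", f"--region={region}", f"--output-dir={output_dir}"]
    if dryrun:
        # Test the policy without deploying it.
        cmd.append("--dryrun")
    cmd.append(policy_file)
    return cmd


cmd = custodian_run_command("policies/offhours.yml", "eu-west-2", "/tmp", dryrun=True)
print(shlex.join(cmd))
```

The returned list can be passed straight to subprocess.run(cmd, check=True) once Custodian is installed and your credentials are configured.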

Once the run has completed you should see a new Lambda function and CloudWatch rule. These should both be called custodian-offhours-policy.

Screenshot of AWS Console showing the CloudWatch Rule for the Custodian offhours policy.

Screenshot of AWS Console showing the Lambda function for the Custodian offhours policy.

Testing the deployment

To test the deployment end-to-end you can use an opt-in policy and tag a single instance with your chosen tag. You can then set the offhour for the next hour and the onhour for an hour after that. You might have to wait a while to see if it has worked.
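
Picking the two hours is simple arithmetic, but it is easy to get wrong late in the day when the hour wraps past midnight. A quick Python sketch, using a hypothetical helper pick_test_hours that takes the current time in the policy's time zone:

```python
from datetime import datetime


def pick_test_hours(now_local: datetime) -> tuple:
    """Return (offhour, onhour): the next hour and the hour after that."""
    offhour = (now_local.hour + 1) % 24
    onhour = (now_local.hour + 2) % 24
    return offhour, onhour


print(pick_test_hours(datetime(2023, 7, 3, 14, 20)))  # (15, 16)
print(pick_test_hours(datetime(2023, 7, 3, 23, 5)))   # (0, 1)
```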

Extending your setup

In this post, I’ve covered policies for stopping and starting EC2 instances on a schedule. You can also set up similar policies for other resources such as RDS instances and Auto Scaling groups.

There are also a lot of other use cases for Custodian including security, compliance and cost-saving applications. For more information see the Cloud Custodian documentation.

Conclusions

Custodian is quite easy to use and the Lambda integration is particularly convenient. It also seems like a neater way to manage your environment than writing lots of custom scripts. There are some gotchas to be aware of and care is needed when setting up policies. I definitely will be making more use of Custodian in the future.