How to support releasing new versions of the code, running in parallel with the last stable release?

Submitted by: @import:stackexchange-devops·Mar 10, 2026·

Problem

I have one service in production (on AWS), which follows the immutable server pattern. Its deployment looks like this:

Create a new AMI with Packer.

Create a new CloudFormation stack, starting with an auto-scaling group of size 1.

When I see that the new release is good, I increase the number of instances, then shut down the old instances, and finally remove the CloudFormation stack from the previous release.

In my initial version of the deployment, I used only one stack and updated it in place. For normal releases, that meant the auto-scaling group was modified by CloudFormation to point to the new AMI. Then I had to either kill an existing instance or increase the size of the auto-scaling group to get an instance running the new release.

Using a new stack for every release makes the process simpler for me, as rollbacks are easier and releases can be rolled out to a subset of users. Similar to the immutable server pattern, I avoid in-place updates and instead only create new resources (stacks in this case).

In the company that I work for, it is now more common to use Terraform instead of CloudFormation. I wonder if it is possible to adapt the deployment that I sketched to Terraform. I don't mind using another tool; my main point is that I would like to preserve these basic concepts:

Allow deployments of new releases without touching the stable release

Instead of (in-place) updating the setup, only create new resources and kill the old resources.

So far I have only briefly worked with Terraform, and have used it only to manage a small part of our infrastructure. As recommended, I kept the state in an S3 bucket, for instance:

# (from main.tf)
terraform {
  required_version = ">= 0.9.4"
  backend "s3" {
    bucket = "example-company-terraform-state"
    key    = "/foo-service/terraform.tfstate"
    region = "eu-central-1"
  }
}

Here, the key is always fixed. So, Terraform will update everything in place. I assume you could use a n

Solution

There are a few different ways to achieve goals of this sort, each with some different tradeoffs. I'm going to describe the most common ones below.

The simplest approach is to use Terraform's create_before_destroy mechanism with autoscaling groups. An example of this pattern is included in the aws_launch_configuration documentation.

In this scenario, changing the AMI id causes the launch configuration to be re-created. Due to create_before_destroy, the new configuration is created first, then a new autoscaling group is created, adding the new instances to an attached ELB. The min_elb_capacity argument to aws_autoscaling_group can be used to ensure that a given number of instances are present and healthy in the attached ELB before considering the autoscaling group to be created, thus delaying the destruction of the old autoscaling group and launch configuration until the new one is serving requests.
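As a rough sketch of this pattern (not the exact example from the documentation), the configuration below assumes the AMI id and ELB name arrive as input variables. The resource names, instance type, availability zone and sizes are placeholders. Interpolating the launch configuration name into the group name is one commonly used way to force a new autoscaling group whenever the AMI changes.

```hcl
# Sketch only: names, instance type, zone and sizes are illustrative assumptions.
variable "ami_id" {}
variable "elb_name" {}

resource "aws_launch_configuration" "example" {
  # name_prefix lets Terraform generate a unique name, so the replacement
  # can be created while the old launch configuration still exists.
  name_prefix   = "example-"
  image_id      = "${var.ami_id}"
  instance_type = "t2.micro"

  lifecycle {
    create_before_destroy = true
  }
}

resource "aws_autoscaling_group" "example" {
  # Including the launch configuration name in the group name means a new
  # launch configuration also forces a new autoscaling group.
  name                 = "example-${aws_launch_configuration.example.name}"
  launch_configuration = "${aws_launch_configuration.example.name}"
  availability_zones   = ["eu-central-1a"]
  load_balancers       = ["${var.elb_name}"]

  min_size = 1
  max_size = 4

  # Wait for this many healthy instances in the ELB before the new group
  # counts as created, which delays destruction of the old group.
  min_elb_capacity = 1

  lifecycle {
    create_before_destroy = true
  }
}
```

With this in place, changing only the value of ami_id and running a single apply should replace both resources in the order described above.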

The downside of this approach is the lack of control it offers. Since Terraform treats the entire set of changes as a single run, it's impossible to pause after creating the new instances to allow other checks to be carried out before destroying the old ones. As a consequence, the ELB healthcheck is the only input to deciding if the new release is "good", and rolling back is impossible once the old resources have been destroyed.

A second common approach is to adopt a sort of "blue/green deployment" pattern with explicit changes to two clusters. This is done by putting all of the per-release resources in a child module, and instantiating that module twice with different arguments. In the top-level module this would look something like the following:

resource "aws_elb" "example" {
  instances = "${concat(module.blue.ec2_instance_ids, module.green.ec2_instance_ids)}"

  # ...
}

module "blue" {
  source = "./app"

  ami_id = "ami-1234"
  count  = 10
}

module "green" {
  source = "./app"

  ami_id = "ami-5678"
  count  = 0
}

The principle of operation here is that in the "steady state" (no deployment in progress) only one of these modules has a non-zero count, and the other one has zero. During a deployment, they are both set to the same non-zero count, but with different ami_id values. Each deployment swaps which of the modules is the "active" module, with both being active during the deployment.

When using this approach, each step is a distinct Terraform operation:

1. change count of the inactive module to nonzero and set its AMI id

2. apply the change with Terraform, thus activating the new module

3. verify that the new release is good

4. change count of the older module to zero

5. apply the change with Terraform, thus deactivating the old module

Although this has more steps, it allows arbitrary verification and an arbitrary amount of time to pass during step 3. It also allows "rolling back" by resetting the previously-inactive cluster count to zero.
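To make the intermediate state concrete, here is a sketch of how the two module blocks might look after the first apply of a deployment, while the new release is being verified. The AMI id "ami-9abc" is a made-up placeholder for the new release. The count argument is the child module's own input variable, as in the example above; newer Terraform releases treat count in a module block as a built-in meta-argument, so the variable would likely need a different name there.

```hcl
# Mid-deployment: both clusters are active and attached to the same ELB.
module "blue" {
  source = "./app"

  ami_id = "ami-1234" # current stable release, left untouched
  count  = 10
}

module "green" {
  source = "./app"

  ami_id = "ami-9abc" # hypothetical AMI id for the new release
  count  = 10         # was 0 in the steady state
}
```

From here, setting blue's count to 0 completes the deployment, while setting green's count back to 0 rolls it back. Either way it is a separate plan and apply that can be reviewed first.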

Since both the old and new clusters exist in the same configuration, there is the risk of using this pattern incorrectly and prematurely destroying the active cluster. This can be mitigated by carefully reviewing Terraform's plan to make sure it leaves the old cluster untouched, but Terraform itself can't guarantee this.

Also, since both clusters are using the same child module configuration, it can be tricky to make updates to that configuration while retaining the blue/green separation. If changes are made that would require Terraform to replace the running instances, it's necessary to temporarily have two copies of the module code on disk, make the source arguments point to separate copies, and make the change only to the copy used by the inactive module.
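One way this could look, assuming the modified module code lives in a second directory named "./app-next" (a hypothetical name), is sketched below. Only the inactive module points at the changed copy, so the plan should show no changes to the active cluster.

```hcl
module "blue" {
  # Active cluster keeps using the unmodified module code.
  source = "./app"

  ami_id = "ami-1234"
  count  = 10
}

module "green" {
  # Hypothetical second copy of the module directory that carries the change.
  source = "./app-next"

  ami_id = "ami-5678"
  count  = 0
}
```

Once a later deployment has moved traffic to green and blue is back at zero, blue's source can be pointed at the updated copy as well and the old copy removed.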

The final approach I'll present is the most extreme and manual, but it does the best job of meeting your requirements and retaining control. This is, in effect, the most literal interpretation of your current CloudFormation workflow, and is a more concrete version of the approach you talked about in your question.

In this approach, there are two entirely-separate Terraform configurations, which I will call "version-agnostic" (things that must survive between versions, such as your ELB) and "version-specific" (the resources that are re-created for each new version).

The version-agnostic configuration will contain the ELB and will, as you suspected, export its id for consumption by the version-specific configuration:

terraform {
  required_version = ">= 0.9.4"
  backend "s3" {
    bucket = "example-company-terraform-state"
    key    = "exampleapp/version-agnostic"
    region = "eu-central-1"
  }
}

resource "aws_elb" "example" {
  # ...
}

output "elb_id" {
  value = "${aws_elb.example.id}"
}

This configuration can be initialized, planned and applied as usual, creating an ELB with no attached instances to start.

The version-specific configuration would be similar to the "app" child module in the previous approach, but this time as a top-level module. The ba

Code Snippets

terraform {
  required_version = ">= 0.9.4"
  backend "s3" {
    bucket = "example-company-terraform-state"
    region = "eu-central-1"
  }
}

$ terraform init -reconfigure -backend-config="key=exampleapp/20170808-1"

data "terraform_remote_state" "version_agnostic" {
  backend = "s3"
  config {
    bucket = "example-company-terraform-state"
    key    = "exampleapp/version-agnostic"
    region = "eu-central-1"
  }
}

resource "aws_autoscaling_group" "example" {
  # ...

  load_balancers = ["${data.terraform_remote_state.version_agnostic.elb_id}"]
}

Context

StackExchange DevOps Q#1684, answer score: 19
