Autoscaling on GoCD agents without terminating active builds?

Submitted by: @import:stackexchange-devops

Problem

At the moment I have an AutoScaling Group (ASG) of GoCD build agents without any scaling policies. I have created some custom metrics that indicate how many build agents are currently idle and I'd like to scale based off of that. My concern is that when scaling down, the ASG may terminate instances that are in the middle of a build. This may result in failed builds and delayed builds.

How can I scale down an ASG without terminating instances that are in use?

Solution

Auto Scaling groups have a useful feature for this, named lifecycle hooks.

Workflow, as described in the lifecycle hooks documentation:

With a termination lifecycle hook in place, a scale-in event does not terminate the instance right away. The instance is moved to the Terminating:Wait state and a lifecycle notification is sent. The instance then has to finish its work and, once done, signal that it is OK to be terminated.
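A termination hook of this kind could be registered roughly as follows. This is a minimal sketch that assumes boto3 is installed and credentials are configured. The hook name `drain-gocd-agent`, the one-hour timeout, and both helper function names are illustrative choices, not part of the original answer.

```python
def termination_hook_params(asg_name, hook_name="drain-gocd-agent",
                            heartbeat_timeout=3600):
    """Build the arguments for a hook that pauses instances before termination.

    hook_name and heartbeat_timeout are illustrative defaults.
    """
    return {
        "AutoScalingGroupName": asg_name,
        "LifecycleHookName": hook_name,
        # Fire when the group decides to terminate an instance (scale in).
        "LifecycleTransition": "autoscaling:EC2_INSTANCE_TERMINATING",
        # Seconds the instance stays in Terminating:Wait unless a heartbeat
        # is recorded or the action is completed earlier.
        "HeartbeatTimeout": heartbeat_timeout,
        # What happens if the timeout expires without a completion signal.
        "DefaultResult": "CONTINUE",
    }


def register_termination_hook(asg_name):
    """Create or update the hook on the given Auto Scaling group."""
    import boto3  # third-party; imported lazily so the helper above needs no AWS access

    client = boto3.client("autoscaling")
    client.put_lifecycle_hook(**termination_hook_params(asg_name))
```

Calling `register_termination_hook("my-gocd-agents")` (a placeholder group name) would attach the hook. The same settings can also be applied through the console or infrastructure-as-code tooling.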

If the work can take longer than the lifecycle hook's HeartbeatTimeout, you can reset the timeout (quoting from the same page):


Restart the timeout period by recording a heartbeat, using the
record-lifecycle-action-heartbeat command or the
RecordLifecycleActionHeartbeat operation. This increments the
heartbeat timeout by the timeout value specified when you created the
lifecycle hook. For example, if the timeout value is 1 hour, and you
call this command after 30 minutes, the instance remains in a wait
state for an additional hour, or a total of 90 minutes.
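The arithmetic in that quote can be checked with a small model. This is only a sketch of the behaviour as quoted: each heartbeat recorded before the current deadline restarts the timeout from that moment. The function name is hypothetical, and the model ignores any overall cap AWS places on how long an instance may stay in the wait state.

```python
def total_wait_seconds(heartbeat_timeout, heartbeat_offsets):
    """Seconds from entering the wait state until the hook would time out.

    heartbeat_offsets are the times (seconds after entering the wait state)
    at which a heartbeat is recorded.
    """
    deadline = heartbeat_timeout
    for offset in sorted(heartbeat_offsets):
        if offset >= deadline:
            break  # the hook already timed out; later heartbeats have no effect
        deadline = offset + heartbeat_timeout  # the timeout restarts at the heartbeat
    return deadline


# One-hour timeout, heartbeat after 30 minutes: 90 minutes in total.
print(total_wait_seconds(3600, [1800]) // 60)  # prints 90
```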

So for your case, the lifecycle notification should start a script or program that will:

  • prevent this agent from receiving new builds
  • poll periodically to check whether the ongoing builds are done
  • reset the timer by recording a heartbeat if a build is still in progress and the timeout is near
  • signal that termination can proceed once no builds are in progress
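The steps above can be sketched as a small drain loop. This is an illustration under stated assumptions, not a definitive implementation. The GoCD side is left as two hypothetical callables (`stop_new_builds`, `build_in_progress`) that you would implement against your GoCD server. The AWS side uses the boto3 calls `record_lifecycle_action_heartbeat` and `complete_lifecycle_action`. The sketch also assumes the script starts shortly after the instance enters the wait state, so it measures the deadline from its own start time.

```python
import time


def drain_then_release(stop_new_builds, build_in_progress, record_heartbeat,
                       complete_action, heartbeat_timeout=3600,
                       poll_interval=30, safety_margin=120,
                       clock=time.monotonic, sleep=time.sleep):
    """Drain a build agent, then let the Auto Scaling group terminate it.

    All four actions are injected callables (hypothetical names), which keeps
    the loop testable without AWS or GoCD. Returns the number of heartbeats.
    """
    stop_new_builds()                       # step 1: accept no new builds
    deadline = clock() + heartbeat_timeout  # assumed start of the wait state
    heartbeats = 0
    while build_in_progress():              # step 2: poll for running builds
        # step 3: the timeout would expire before the next poll, so reset it
        if deadline - clock() <= safety_margin + poll_interval:
            record_heartbeat()
            deadline = clock() + heartbeat_timeout
            heartbeats += 1
        sleep(poll_interval)
    complete_action()                       # step 4: allow termination
    return heartbeats


def aws_callbacks(asg_name, hook_name, instance_id):
    """Return (record_heartbeat, complete_action) backed by boto3."""
    import boto3  # third-party; imported lazily so the loop runs offline

    client = boto3.client("autoscaling")
    target = dict(AutoScalingGroupName=asg_name,
                  LifecycleHookName=hook_name,
                  InstanceId=instance_id)

    def record_heartbeat():
        client.record_lifecycle_action_heartbeat(**target)

    def complete_action():
        client.complete_lifecycle_action(LifecycleActionResult="CONTINUE",
                                         **target)

    return record_heartbeat, complete_action
```

On the agent, this would be wired up as `drain_then_release(disable_agent, agent_is_building, *aws_callbacks(group, hook, instance_id))`, where `disable_agent` and `agent_is_building` are placeholders for your own GoCD calls. Keeping `safety_margin` well above the polling interval leaves room for a slow or retried heartbeat call.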

Context

StackExchange DevOps Q#2386, answer score: 3
