
How to scale down in a multi-tenant environment?

Submitted by: @import:stackexchange-devops·Mar 10, 2026·



Problem

Cloud environments such as AWS allow multi-tenancy managed by the users themselves; classic examples are container orchestrators such as ECS or Kubernetes.

Suppose you have two services, one memory-bound and the other CPU-bound, and you put them in a single cluster. Scaling up is then relatively trivial: each time you need more CPU or more memory, you add more capacity, since every EC2 instance brings both CPU and memory.

Scaling up based on a single metric can very easily be achieved using CloudWatch Alarms.

Scaling down to reduce cost, however, requires taking both memory and CPU limits into account so that neither drops below the required amount.

Unfortunately, CloudWatch Alarms do not allow boolean logic or take multiple metrics into account.
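To make the two-metric constraint concrete, here is a minimal sketch in Python. It is not an AWS API: the `NodeSize` type, the `headroom` parameter, and the function names are all hypothetical, and the reserved CPU and memory figures are assumed to come from whatever metrics source the cluster already exposes. The idea is only that the node count is driven by whichever dimension is tighter, so scale-down is allowed only when both fit on fewer nodes.

```python
import math
from dataclasses import dataclass


@dataclass
class NodeSize:
    """Capacity of one instance (hypothetical units)."""
    cpu_units: int
    memory_mib: int


def required_nodes(cpu_reserved: float, memory_reserved: float,
                   node: NodeSize, headroom: float = 0.2) -> int:
    """Smallest node count that covers BOTH CPU and memory plus headroom."""
    by_cpu = math.ceil(cpu_reserved * (1 + headroom) / node.cpu_units)
    by_mem = math.ceil(memory_reserved * (1 + headroom) / node.memory_mib)
    # The tighter dimension decides; never go below one node.
    return max(by_cpu, by_mem, 1)


def can_scale_down(current_nodes: int, cpu_reserved: float,
                   memory_reserved: float, node: NodeSize,
                   headroom: float = 0.2) -> bool:
    """True only if fewer nodes would still satisfy both dimensions."""
    needed = required_nodes(cpu_reserved, memory_reserved, node, headroom)
    return needed < current_nodes


if __name__ == "__main__":
    node = NodeSize(cpu_units=2048, memory_mib=8192)
    # CPU demand is low, but memory still needs three nodes.
    print(required_nodes(1000, 20000, node))
    print(can_scale_down(4, 1000, 20000, node))
    print(can_scale_down(3, 1000, 20000, node))
```

In this sketch a low CPU reading alone never triggers scale-down; the memory-bound service keeps the cluster at the size it needs.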

What is a good way to implement scale down of capacity for an auto scaling group?

Solution

Autoscaling is a good case for machine learning

This is a hard problem to solve well.

What you really want is something like a Nest thermostat for your EC2 infrastructure.

As the question already notes, there are multiple dimensions of resource demand and limits.

memory

disk space

disk IO

network IO

concurrency/latency/queue depth

There are multiple indicators of demand (in addition to the above).

concurrent unique visits/sessions

pages per visit/session

rate of engagement/interaction feature usage

There are multiple common patterns of changing demand over time.

daily user demand cycles

weekly user demand cycles

monthly ...

annual ...

special event/days

DDoS load

media/marketing exposure traffic spikes

There are multiple financial decision factors.

does revenue scale with traffic? How? (How conservative do you need to be?)

is there a hidden cost to control (transaction costs, limits)?

what's the cost model of scaling? (things in a pipeline scale together, things in a load-balanced cluster scale independently)

Before long, if you try to hand-optimize on hand-selected features you're going to have a monster of technical debt that is possibly more complicated than any other logic in your site. Amazon makes more money when you err (with a large margin) on the side of caution, so their tools will probably never get close to what you want.

Instead, choose an architecture/technology stack that can grow/scale so you don't have to get it exactly right the first time. Then pick a few factors which you think are obvious. Then try to come up with a way to sort multiple representative possibilities in order of preference. Then collect some real world data covering all those points. If you're lucky, a simple obvious hand-coded solution will jump out at you from looking at the data. If not, code up something that will give you an approximate model f(x1,x2,x3,x4) --> y * app nodes, using an appropriate algorithm.
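As one possible illustration of such an approximate model, here is a minimal sketch using a conservative k-nearest-neighbours lookup over collected observations. The choice of k-NN is an assumption made for this example, not something the answer prescribes, and the four features (reserved CPU, reserved memory, hour of day, concurrent sessions), the sample data, and the function name `predict_nodes` are all hypothetical.

```python
import math

# Hypothetical history: (features, node count that was sufficient).
# Features: (cpu_reserved, memory_reserved_mib, hour_of_day, sessions)
HISTORY = [
    ((1000, 4000, 3, 50), 2),
    ((1200, 4500, 4, 60), 2),
    ((2000, 7000, 10, 200), 4),
    ((3000, 9000, 14, 400), 5),
    ((3200, 9500, 15, 450), 6),
]


def predict_nodes(history, x, k=3):
    """Approximate f(x1, x2, x3, x4) -> nodes from the k nearest observations.

    Takes the MAX node count among the neighbours, erring on the side of
    keeping capacity rather than dropping it.
    """
    dims = range(len(x))
    # Scale each feature by its observed range so no single one dominates.
    spans = [
        (max(h[0][i] for h in history) - min(h[0][i] for h in history)) or 1.0
        for i in dims
    ]

    def distance(features):
        return math.sqrt(
            sum(((features[i] - x[i]) / spans[i]) ** 2 for i in dims)
        )

    nearest = sorted(history, key=lambda h: distance(h[0]))[:k]
    return max(nodes for _, nodes in nearest)


if __name__ == "__main__":
    print(predict_nodes(HISTORY, (1100, 4200, 3, 55), k=2))
    print(predict_nodes(HISTORY, (3100, 9200, 14, 420), k=2))
```

A lookup like this is easy to inspect against the raw data, which fits the advice to look for a simple solution first; a regression or other learned model could replace it once more data has been collected.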

I bet you didn't think this one was going to be so much fun!

Context

StackExchange DevOps Q#1398, answer score: 2
