Can I have actions that, if taken, cannot be taken again in the following N timesteps?
Problem
I'm trying to train a load-balancing system with reinforcement learning (RL) so that incoming jobs are queued evenly across the available servers.
The system cannot dispatch jobs directly; it can only suggest using one set of servers over another.
At each time step, the system can also reduce the speed at which new jobs arrive, in order to shorten the queues.
The agent must decide whether to change this speed or keep it constant.
Changing the speed at time T, however, disables any further changes until time T+N, where N is a constant (basically, we need to prevent speed changes at every time step).
Are such actions an obstacle to training the system?
All RL systems I know of only deal with actions that can be taken at every time step, and they place no constraints on future use of an action. Is this a limitation of RL agents?
Solution
This is not a limitation. Which actions are valid can depend on the current state of the system.
To enforce your requirement (changing the speed at time T means you can't change it again until time T+N), add extra information to the state: how long ago the last change was made. If the last change was more than N steps ago, you don't need to keep track of exactly how long ago it was, only that enough time has passed for a change to be allowed again.
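As a rough illustration of the idea, here is a minimal Python sketch of a capped "steps since last change" counter that is carried in the state and used to decide which actions are valid. The names CooldownTracker, KEEP and CHANGE are hypothetical and not taken from any RL library, and the sketch assumes that a change made at time T is allowed again exactly at time T+N.

```python
KEEP, CHANGE = 0, 1  # hypothetical action labels


class CooldownTracker:
    """Tracks how long ago the speed was last changed, capped at N."""

    def __init__(self, n):
        if n < 1:
            raise ValueError("n must be at least 1")
        self.n = n
        # Start at the cap, i.e. "long enough ago": a change is allowed.
        self.steps_since_change = n

    def valid_actions(self):
        # The set of valid actions depends only on the current state.
        if self.steps_since_change >= self.n:
            return [KEEP, CHANGE]
        return [KEEP]

    def step(self, action):
        """Advance one time step after the agent has taken `action`."""
        if action not in self.valid_actions():
            raise ValueError("speed change is still on cooldown")
        if action == CHANGE:
            # At the next time step, the change is one step in the past.
            self.steps_since_change = 1
        else:
            # Cap at n: beyond that, only "change allowed" matters.
            self.steps_since_change = min(self.steps_since_change + 1, self.n)


# Usage: with N = 3, a change at time 0 blocks changes at times 1 and 2.
tracker = CooldownTracker(3)
tracker.step(CHANGE)
blocked = tracker.valid_actions()  # only KEEP is valid here
```

The counter value would be appended to whatever the agent already observes (queue lengths and so on), so the state stays Markov; the agent then either chooses only among valid_actions(), or the environment treats a blocked CHANGE as KEEP.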
Context
StackExchange Computer Science Q#70083, answer score: 2