How to keep many servers (5000+) up-to-date?
Problem
Initially asked here: https://stackoverflow.com/questions/60674502/how-to-keep-many-servers-5000-up-to-date-with-git-and-its-rate-limits
The initial post:
We're making a php service that will run on many servers, think 5000+. We host our code on git (bitbucket). We wondered what the best way to keep the servers up-to-date would be.
We figured either post-commit hooks (but what happens if a few servers don't receive the update notification?) or a git fetch every minute via cron. We lean toward the cron approach, since it can't permanently fail: even if a server is offline (turned off or disconnected from the network), it will still catch up eventually.
We do a fetch every minute, then compare to see whether a pull is needed; if so, we pull and run the migration code.
We would like to run this every minute so that the servers will be synchronized with each other as soon as possible.
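The per-minute check described above could look roughly like the sketch below. The repository path, branch name, and the `php migrate.php` migration command are assumptions for illustration, not from the question.

```shell
#!/bin/sh
# Minimal sketch of the per-minute check, meant to be run from cron, e.g.:
#   * * * * * /usr/local/bin/update-app.sh
# update_checkout fetches the branch and fast-forwards only when the
# remote has new commits; it prints "updated" or "up-to-date".
update_checkout() {
    repo_dir=$1; branch=$2
    cd "$repo_dir" || return 1
    git fetch --quiet origin "$branch"
    local_rev=$(git rev-parse HEAD)
    remote_rev=$(git rev-parse "origin/$branch")
    if [ "$local_rev" != "$remote_rev" ]; then
        git merge --ff-only --quiet "origin/$branch"
        # the real service would run its migration step here, e.g.:
        # php migrate.php
        echo updated
    else
        echo up-to-date
    fi
}
```

Comparing `HEAD` against `origin/$branch` after the fetch is what makes the job idempotent: running it again when nothing changed is a no-op.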
Now we wonder about rate limits. We're using Bitbucket, and the limit is 60,000 requests per hour (so 1,000 per minute), which would cap us at 1,000 servers before we run into problems.
But the docs also say that if we make the repo public, we can make unauthenticated calls, whose limits go by IP rather than per user, so we wouldn't hit any limit no matter how many servers we have. The downside is that we would then have to encrypt the repo contents, and on each pull decrypt them and copy over the decrypted files.
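The decrypt-on-pull step mentioned above could be as simple as the sketch below: the public repo holds an encrypted archive, and each server decrypts it with a key distributed out of band. The key file, archive name, and cipher choice are assumptions, and dedicated tools such as git-crypt exist for this use case.

```shell
#!/bin/sh
# Sketch of a decrypt-on-pull step: the public repo stores an encrypted
# tarball, and the server unpacks it with a locally provisioned key.
# Key file, archive name, and cipher are illustrative assumptions.
decrypt_release() {
    keyfile=$1; archive=$2; destdir=$3
    openssl enc -d -aes-256-cbc -pbkdf2 -pass "file:$keyfile" \
        -in "$archive" | tar xzf - -C "$destdir"
}
```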
Is this the best way of handling this? It seems very unconventional. What is the standard or recommended way of handling this (if there is any)?
After having read the answers, we now plan to have a group of servers that pull in any changes from git. For readability, we call these the "gitpull-servers", and the 5000+ servers the "webservers".
The plan is to make the gitpull-servers fetch (and possibly pull) from git.
The webservers periodically query the gitpull-servers to see if there are any changes, and update from them if so.
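The fan-out described above can stay plain git: each webserver treats its gitpull-server as an ordinary git remote and asks it for the branch tip before deciding whether to pull. A sketch (all names are assumptions; in the test a local path stands in for the gitpull-server's URL):

```shell
#!/bin/sh
# Sketch: a webserver checks whether its gitpull-server (any git remote
# URL) has moved past the local checkout, without downloading objects.
# Prints "yes" when an update is needed, "no" otherwise.
needs_update() {
    repo_dir=$1; mirror_url=$2; branch=$3
    remote_rev=$(git ls-remote "$mirror_url" "refs/heads/$branch" | cut -f1)
    local_rev=$(git -C "$repo_dir" rev-parse HEAD)
    if [ -n "$remote_rev" ] && [ "$remote_rev" != "$local_rev" ]; then
        echo yes
    else
        echo no
    fi
}
```

`git ls-remote` transfers only the ref listing, so the per-minute poll against a gitpull-server stays cheap even across thousands of webservers.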
Solution
I don't think the proposed plan here makes much sense. Tools like Ansible are free and designed to do this with good, central management and logging.
- Put the code in Bitbucket.
- Set up a Jenkins server (open source, free, light on resources, well documented, and easy to use).
- Make Bitbucket call Jenkins with a webhook on change.
- Or, if you prefer, Jenkins can poll Bitbucket, but this is less efficient.
- Have Jenkins call Ansible (also open source, free, light on resources, well documented, and easy to use) in its pipeline.
- Ansible can target all 5000 servers with a dynamic inventory built by querying a database/file/API/whatever you have.
- This way Bitbucket sees almost no load, Ansible does everything from one location, and runs are recorded in Jenkins with information on each host.
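The Ansible step in that pipeline could be a short playbook along these lines. This is a sketch only: the host group, repo URL, paths, and the migration command are illustrative assumptions, not from the answer.

```yaml
# deploy.yml -- hypothetical playbook Jenkins would run against the
# dynamic inventory; group and path names are illustrative.
- hosts: webservers
  serial: "10%"          # roll out in batches rather than all at once
  tasks:
    - name: Check out the service code
      ansible.builtin.git:
        repo: git@bitbucket.org:example/app.git
        dest: /var/www/app
        version: main
      register: checkout

    - name: Run migrations when the code changed
      ansible.builtin.command: php migrate.php
      args:
        chdir: /var/www/app
      when: checkout.changed
```

Jenkins would invoke it with something like `ansible-playbook -i inventory_script deploy.yml`, where `inventory_script` is the dynamic inventory mentioned in the list above.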
You won't hit any API rate limits this way, and the dynamic inventory lets you target more hosts with no extra effort on your end.
This is industry-standard tooling. You can also use Ansible to keep the servers up to date patch-wise and handle all kinds of other tasks while you're at it, so it's a reusable solution.
Context
StackExchange DevOps Q#11073, answer score: 4