How to keep off-site developer databases up to date with production?
Problem
We have a production database running on SQL Server 2008 R2 with files stored in it. Because we were concerned about the size of the database, we enabled FILESTREAM so the files would be stored on disk and we could hopefully reduce the size of our backups. I have found that we can exclude the FILESTREAM filegroup from the backups, but then we have issues querying those tables after a restore (because the fields are missing data, which is expected). Our next thought was to sync the files from disk using automated FTP or Dropbox or something of that nature, but I haven't fully explored this yet.
The real problem we're trying to solve here is keeping our developers up to date with the production database. With the growing size of our database, it's becoming more and more of a chore to do a full backup and download the database, and it will eventually become prohibitive. I'm wondering what other folks are doing to keep remote devs up to date.
I've taken a quick look at SQL Server's log shipping, but that seems like it will require a lot of custom automation (scheduling FTP and restore scripts) to get it working reliably with a few devs in various geographic locations, and I'm not sure how to use the secondary database for development since it would likely be read-only. I'm open to any suggestions; please let me know if I can provide more information about our situation that would help.
UPDATE: The developers only need a reasonably current copy of the production data. This helps to reproduce issues so they can be resolved. We're okay with the data being a bit stale but keeping the transfer times to a minimum would suggest frequent backups.
Thanks for the suggestions so far. To add a bit more detail: our database is currently 12 GB or so, and we expect to double that in the near future with the signing of more clients. I've run a couple of test backups with compression enabled and that saved us about 1-2 GB, but I
Solution
I manage a DB environment for several tier 1 telcos, and our data and environments are distributed across the world. There are many ways you can achieve what you're looking for, but it depends on your pain points.
Questions:
- How big are the databases, how far apart are the nodes (or, more importantly, what is the latency between them), and how much bandwidth do you have?
- How much transaction log activity do you generate?
- How often do you want to restore?
- Are you enabling compression on the backups?
Some Lessons Learned
- One thing I noticed with copying data across the world is that the Windows CIFS protocol absolutely sucks for this; it is highly inefficient. We ended up using HTTPS and issuing a GET request, which was over 10x faster for copying backups from Asia to Los Angeles.
- Using the multithreading options in Robocopy or other copy programs is useful if you break the backup up into multiple files.
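To illustrate the HTTPS approach, here is a rough sketch that pulls a backup file over HTTPS in parallel ranged chunks. The URL, file size, and chunk size would be yours; it assumes the web server fronting the backup folder honors Range requests:

```python
import concurrent.futures
import urllib.request

def chunk_ranges(total_size, chunk_size):
    """Split a file of total_size bytes into inclusive (start, end) byte ranges."""
    return [(start, min(start + chunk_size, total_size) - 1)
            for start in range(0, total_size, chunk_size)]

def fetch_range(url, start, end):
    """GET one byte range of the backup file."""
    req = urllib.request.Request(url, headers={"Range": f"bytes={start}-{end}"})
    with urllib.request.urlopen(req) as resp:
        return resp.read()

def download_backup(url, total_size, out_path, chunk_size=64 * 1024 * 1024):
    """Download the backup in parallel chunks, then reassemble them in order."""
    ranges = chunk_ranges(total_size, chunk_size)
    with concurrent.futures.ThreadPoolExecutor(max_workers=4) as pool:
        parts = pool.map(lambda r: fetch_range(url, *r), ranges)
    with open(out_path, "wb") as fh:
        for part in parts:
            fh.write(part)
```

A few parallel streams tend to fill a high-latency pipe much better than one CIFS copy, which is most of where the speedup came from.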
Some Options:
2 Surprising Linux Solutions
- A solution I really like is to rsync between Linux servers on each end with the backup folder mounted. This is really nice, as backups are automatically synced using a protocol that is efficient over WANs. If you don't want to sync ALL of your backups, mount another drive on your Linux server that actually points to a Windows share you place the backups in; rsync will take care of it from there and do it much faster than CIFS. In my experience, Linux protocols are much better than Windows CIFS for long-distance file copying, and some CCIEs I spoke with said the same.
- You could also look at a VM WAN optimizer (Google or 'Bing' "WAN optimizer", or get more specific, for example "high latency high bandwidth WAN optimizer" or "low latency high bandwidth", whatever your environment is).
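The rsync approach above can be sketched as a small wrapper (paths and host are placeholders; `/mnt/sqlbackups` stands in for the mounted Windows share). `-a` preserves attributes, `-z` compresses on the wire, and `--partial` lets an interrupted transfer of a large backup file resume instead of restarting:

```python
import subprocess

def rsync_command(src_dir, dest, bwlimit_kbps=None):
    """Build the rsync command: archive mode, wire compression,
    resumable partial transfers, and an optional bandwidth cap (KB/s)."""
    cmd = ["rsync", "-avz", "--partial"]
    if bwlimit_kbps:
        cmd.append(f"--bwlimit={bwlimit_kbps}")
    cmd += [src_dir, dest]
    return cmd

def sync_backups():
    # Hypothetical layout: local Linux box has the Windows backup share
    # mounted at /mnt/sqlbackups; "devbox" is the remote Linux end.
    subprocess.run(rsync_command("/mnt/sqlbackups/", "devbox:/backups/"),
                   check=True)
```

Run from cron on whatever schedule matches how stale the devs can tolerate the data being.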
SQL transaction log shipping using FTP is not scary at all; I had a junior .NET developer get it running quickly (I was so impressed I got him hired at another company I worked at). The problem is that with log shipping you might send a lot more data in a 24-hour period than you would by just copying backups, if there are a lot of UPDATEs rather than just INSERTs.
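For the log-shipping route, the custom automation amounts to a small download-and-restore loop. A minimal sketch, with a hypothetical FTP host, local folder, and database name (the primary is assumed to publish .trn log backups to the FTP folder):

```python
import ftplib
import os

def new_log_backups(remote_files, applied_files):
    """Return the .trn files not yet applied, in name (timestamp) order."""
    pending = [f for f in remote_files
               if f.endswith(".trn") and f not in applied_files]
    return sorted(pending)

def restore_command(trn_file, database="ProdDB"):
    """Generate the T-SQL to apply one log backup, keeping the chain open."""
    return (f"RESTORE LOG [{database}] "
            f"FROM DISK = N'C:\\LogShip\\{trn_file}' WITH NORECOVERY;")

def sync_and_restore(host, user, password, applied):
    """Download any new log backups and emit the RESTORE statements to run."""
    ftp = ftplib.FTP(host)  # hypothetical FTP endpoint on the primary
    ftp.login(user, password)
    for name in new_log_backups(ftp.nlst(), applied):
        with open(os.path.join("C:\\LogShip", name), "wb") as fh:
            ftp.retrbinary(f"RETR {name}", fh.write)
        print(restore_command(name))
    ftp.quit()
```

Note the read-only concern from the question: restoring WITH NORECOVERY leaves the database inaccessible between restores, and WITH STANDBY only gives read access. When a dev needs a writable copy, they would restore the latest full backup WITH RECOVERY into a separate dev database.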
SAN file system block-level replication: Not sure if you have hardware that supports this, but many mid-range SANs allow you to replicate live data to another unit. If yours does, this is a great option, as it has zero SQL Server overhead (unless your network pipe is capped, in which case you have other issues).
Transactional replication to a central database: You could create a replication scheme that maintains an up-to-date copy of the data, then take your backups from that copy so you don't have to copy full backups across the wire.
Let us know some of your pain points and we can go through other options.
Also be sure to read this excellent white paper from Microsoft if you go with replication:
Geo-Replication Performance Gains with Microsoft SQL Server 2008 Running on Windows Server 2008
Context
StackExchange Database Administrators Q#32977, answer score: 4