HiveBrain v1.2.0
Get Started
← Back to all entries
patternpythonModerate

System backup on Linux

Submitted by: @import:stackexchange-codereview··
0
Viewed 0 times
systemlinuxbackup

Problem

I've written this Python script to make regular backups of a Linux server. It works pretty well except that the script sometimes backups files more than one time, which is pretty strange and I don't know why that happens.

But that's not the key point of my posting here (just someone may see a little error I've made which explains this behaviour). Since I've learned all I know about Python by myself I'm a little bit nervous to use my own code in a productive environment, so I need a code review.

In short the script should check all files in the filesystem of a Linux server except a small collection of directories to avoid. Every 'walked' file should be checked for last modification date. If it's been modified within the last 24 hours it should backup and log into its logfile. Further more it should check after each run how many backups exist on the network share and if a certain number is reached it should delete the oldest one.

```
import datetime
import re
import os
import subprocess
import socket
import glob
from stat import *
import time
from collections import defaultdict

host = socket.gethostname()
date = datetime.datetime.today()
date = date.strftime("%Y-%m-%d")
mountPoint = "/mnt/linuxbackup/"
nas = '//172.19.3.5/backup/linuxserver'
credentials = 'sec=ntlm,credentials=/root/.backup.secret'
dirPath = '/'
past = time.time() - 86400

# Global settings

fullBackup_dict = {
'backupdir' : mountPoint + host + '/',
'backupname' : host + '.' + date,
'backupstokeep' : 4,
'excludelist' : ('/proc',
'/lost+found',
'/sys',
'/media',
'/var/cache',
'/var/log',
'/mnt',
'/dev',
'/run',
'/tmp'
),
}
dailyBackupDict = defaultdict(list)
includelist = ''

excludelist = ''
for a in fullBackup_d

Solution

First of all, you should definitely read the Python style guide (PEP0008). It contains tons of good formatting and style info to make code more readable and neat for you and other users. I'll reference some of the points relevant to you but the whole page is good to commit to memory.

You should rearrange the imports so they're grouped for readability. Plain imports together and relative imports in one block.

import datetime
import re
import os
import subprocess
import socket
import glob
import time

from collections import defaultdict
from stat import *


Also though, import * is almost always a bad idea. It means I have no idea what you're importing from stat. It's a good thing it's a builtin module, because if it was third party or custom then I'd be at a loss to understand what some of the functions imported from there do. If you just do import stat, having to call stat.function is much more readable. But if you want to avoid that, from stat import function1, function2, CONSTANT is still easy to trace.

I prefer to combine calls like your date assignments. It would be quite readable and not too long a line.

date = datetime.datetime.today().strftime("%Y-%m-%d")


Try to avoid mixing single and double quotes. It can get confusing. You may need to on occasion, but avoid it as much as possible.

mountPoint = "/mnt/linuxbackup/"
nas = '//172.19.3.5/backup/linuxserver'


Also mountPoint isn't Pythonic naming convention. Python uses snake_case for variables and functions. mount_point would be a better name. Speaking of names, you use good ones for these variables, but names aren't always enough. past is not a self evident name that requires no comment to be understood. Because of your description of the program I know that you're basically checking for 24 hours worth of seconds, as that's the time into the past your script checks. But past does not communicate this. Change it to indicate its use, like

backup_period = time.time() - 86400   # 24 hours


much clearer. Still on names, don't name a variable based on its type. fullBackup_dict is clearly a dictionary when you define it. The syntax is unmistakeable. Likewise when you call it with keywords Python programmers will know what kind of data it is. You only need to make types explicit in variable names when you're using unusual types that might be mistaken for others. Like a data type that might be derived from a dictionary. But even in that case, consider whether or not it's actually worth noting in the name.

I also don't think you should refer to your settings as 'Global'. Python has a global keyword but that's not what you're trying to say here. Something like "Configuration" makes more sense and wont accidentally imply any Python keyword.

Unless I'm missing something here, you're making a file path with string concatenation. Don't do that, use os.path.join. It's an intelligent function that knows what OS the user is running and can properly account for slashes whether they're in the strings or not. In other words, it wont care if you have a / at the end of mountPoint. It can create the correct path in either case. Of course if you're intentionally not using that, you should comment on why this is a special case.

'backupdir' : os.path.join(mountPoint, host) + '/',


Again comments would help here but what's includelist for? I can guess at the intent, but it's first set to an empty string and then only changed in an else condition halfway down your script. Context is hard to discern without clear explanations. Also, the note about not naming variables after their types goes double for when it's not actually their type. Naming a string with list is a recipe for confusion. Instead call it something like exclude_files or exclude_folders, to avoid accidentally indicating the wrong thing.

As for excludelist, you could shorten this with the str.join method. You can use it to concatenate all the elements of a list into one string, but helpfully in your case you can also specify a string to go between each element, --exclude= in your case. There is one slight drawback, you need to bookend it with --exclude= and ` to make it exactly as before, but I still think it looks neater.

excludelist = ('--exclude=' + '   --exclude='.join(fullBackup_dict['excludelist']) + '   ')


What's odd is that you then assign this back to your dictionary, why not assign it there in the first place? Either put the expression directly into the dictionary or create the value before you build the dictionary. I'd opt for the latter in this case, just because it's a long expression to embed in a dictionary.

These checks you perform here seem important, I think they'd be worth clarifying what you're doing. You're clearly checking on the mount point's existence but I don't understand the
check_calls. Maybe if I was familiar with Linux I would but it can't hurt to comment.

``
# Check the mo

Code Snippets

import datetime
import re
import os
import subprocess
import socket
import glob
import time

from collections import defaultdict
from stat import *
date = datetime.datetime.today().strftime("%Y-%m-%d")
mountPoint = "/mnt/linuxbackup/"
nas = '//172.19.3.5/backup/linuxserver'
backup_period = time.time() - 86400   # 24 hours
'backupdir' : os.path.join(mountPoint, host) + '/',

Context

StackExchange Code Review Q#106736, answer score: 10

Revisions (0)

No revisions yet.