HiveBrain v1.2.0
Get Started
← Back to all entries
patternpythonMinor

Cleaning old XML log files automatically

Submitted by: @import:stackexchange-codereview··
0
Viewed 0 times
oldlogautomaticallyxmlfilescleaning

Problem

I have a piece of code that searches for XMLs in given folders, delete the ones older than X, zip the ones older than Y and delete zips older than X. It works perfectly but I have a terrible feeling that this is very inefficient code.

A little more information: It reads a config file for the times to delete/zip using Simple Config Parser, a library.

I am a temp at a networking company and this is one of the first pieces of "useful" code I've written.

```
import os, fnmatch, time, datetime, shutil, re
from lib.configobj import ConfigObj

if os.path.isfile('Cleaner_Script_Config.cfg')== True:
configreader = ConfigObj('Cleaner_Script_Config.cfg')
How_Old_To_Delete_In_Days = configreader.get('How_Old_To_Delete_In_Days')
How_Old_To_Zip_In_Days = configreader.get('How_Old_To_Zip_In_Days')
Folders_To_Clean = os.path.join(configreader.get('Folders_To_Clean'))
How_Old_To_Delete_In_Seconds = int(How_Old_To_Delete_In_Days)2460*60
How_Old_To_Zip_In_Seconds = int(How_Old_To_Zip_In_Days)2460*60
Folders_To_Clean_List= Folders_To_Clean.split(' ')
else:
How_Old_To_Delete_In_Days = int('90')
How_Old_To_Zip_In_Days = int('7')
How_Old_To_Delete_In_Seconds = int(How_Old_To_Delete_In_Days)2460*60
How_Old_To_Zip_In_Seconds = int(How_Old_To_Zip_In_Days)2460*60
Folders_To_Clean = '/home/test/betfair-ftpdata /home/test/betfair-ftpdata2'
Folders_To_Clean_List= Folders_To_Clean.split(' ')

def XML_Cleaner(extension, folders):
for folder in folders:
print('Running XML cleaner function')
Files_To_Clean = []
To_Delete_Time = (int(time.time()) - int(How_Old_To_Delete_In_Seconds))
To_Zip_Time = (int(time.time()) - int(How_Old_To_Zip_In_Seconds))
print ('All files older than: %s are to be deleted' % datetime.datetime.fromtimestamp(int(To_Delete_Time)).strftime('%Y-%m-%d %H:%M:%S'))
print ('All files older than: %s are to be ziped' % datetime.datetime.fromtimestamp(int(To_Zip_Time)).st

Solution

Style

PEP8

PEP8 is python style guide. It says how you should style python code.

-
Lines should be a maximum of 79 characters.

The exception to this is comments at 72.

-
Variables should be in snake_case. Not

Capitalized_Words_With_Underscores (ugly!)

The exception to this is classes with CamelCase.

And Constants with UPPER_SNAKE_CASE.

-
Imports should be on separate lines.

-
Operators should have one space either side of them. 1 + 1.

The exception to this is to show precedence. 2*2 + 1.

-
When comparing to singletons, True; None, use this is keyword.

-
When checking if something is true, don't compare to True.

-
Do not overwrite builtins, file, use a synonym, path, or add a trailing underscore, file_. The former is better.

You have a good style otherwise. It's nice seeing long variable names, as they are much better than a.

Code

String formatting

Currently you use both the + and % operators to add data to strings.
You should use str.format instead.

'File %s is to delete!' % file
# To
'File {} is to delete!'.format(file)

os.rename(
     str(madefolder) + '.zip',
     str(folder) + str(os.sep) + str(madefolder) + '.zip')
# To
os.rename(
     '{!s}.zip'.format(makefolder),
     '{!s}{!s}{!s}.zip'.format(folder, os.sep, makefolder))


Getting the config

You may want to make things have absolute paths.
If you call the file from a different directory, it may not work, as it will not be able to find the needed files.

This is as you may be in 'mydocuments', but the python file might be in 'programs'.
The program will work relatively from 'mydocuments', and not 'programs'.
And so you need to get a path for the settings.

file_path = os.path.dirname(__file__)


This with your os.path.isfile will mean that you will always open the correct settings.

if os.path.isfile(os.path.join(file_path, 'Cleaner_Script_Config.cfg')):
    # Do other stuff


In python it is very uncommon to do int('90'). Just do 90.

XML_cleaner

You call XML_cleaner twice, with the 'same' extension. This makes it so that you have to os.walk more than needed.
I would change it so that you have os.walk as a separate function. Then you can pass in extensions, and how to handle them.

def walk_through_folders(folders, handlers):
    for folder in folders:
        for root, dirs, files in os.walk(folder):
            for name in files:
                extension = os.path.splitext(name)[1].lower()
                if extension in handlers:
                    handlers[extension](os.path.join(root, name))

walk_through_folders(folders, {'.xml': XML_cleaner, '.zip': ZIP_cleaner})


The way this will work is rather than handling everything for later, you handle it now.
This has the pro of not building another list, which can be expensive later.
It also means that you can handle ZIP files too.
However it means that you don't build a list of paths.

Next we need to make it so there is a time that you will delete from.
In the global scope you can add this.

This will make it so there is only one time that it will delete from. And will report it once.

def datetime_strftime(time, format_):
    return datetime.datetime.fromtimestamp(time).strftime(format_)

TO_DELETE_TIME = time.time() - How_Old_To_Delete_In_Seconds
TO_ZIP_TIME = time.time() - How_Old_To_Zip_In_Seconds
print('All files older than: {} are to be deleted.'.format(
    datetime_strftime(to_delete_time, '%Y-%m-%d %H:%M:%S')
))
print('All files older than: {} are to be ziped.'.format(
    datetime_strftime(to_zip_time, '%Y-%m-%d %H:%M:%S')
))


As we are handling each file one at a time we will have to split the remaining of XML_cleaner into two functions.
One that cleans up the zip files, and another that will handle the single path passed to it.

This means that after we have completed walk_through_folders, we will need to call zip_files().

def XML_cleaner(path):
    info = os.stat(path)
    day_folder = datetime.datetime.fromtimestamp(info.st_mtime) \ 
                 .strftime('%Y-%m-%d')
    if not os.path.exists(day_folder):
        os.makedirs(day_folder)
    if info.st_mtime < TO_DELETE_TIME:
        print ('Deleting: {!r}'.format(path))
        os.remove(path)
    elif info.st_mtime < TO_ZIP_TIME:
        print ('Ziping: {!r}'.format(path))
        infolder = (day_folder +
                    os.sep +
                    os.path.basename(os.path.normpath(path)))
        os.rename(path, infolder)

def zip_folders(folders):
    reobj = re.compile('\d{4}-\d{2}-\d{2}')
    for folder in folders:
        for root, subs, files in os.walk('.'):
            for name in subs:
                if reobj.match(name):
                    shutil.make_archive(name, 'zip', name)
                    shutil.rmtree(name)
                    os.rename(
                        '{!s}.zip'.format(name),
                        '{!s}{!s}{!s}.zip'.format(folder, os.sep, name))


ZIP_cleaner

Code Snippets

'File %s is to delete!' % file
# To
'File {} is to delete!'.format(file)

os.rename(
     str(madefolder) + '.zip',
     str(folder) + str(os.sep) + str(madefolder) + '.zip')
# To
os.rename(
     '{!s}.zip'.format(makefolder),
     '{!s}{!s}{!s}.zip'.format(folder, os.sep, makefolder))
file_path = os.path.dirname(__file__)
if os.path.isfile(os.path.join(file_path, 'Cleaner_Script_Config.cfg')):
    # Do other stuff
def walk_through_folders(folders, handlers):
    for folder in folders:
        for root, dirs, files in os.walk(folder):
            for name in files:
                extension = os.path.splitext(name)[1].lower()
                if extension in handlers:
                    handlers[extension](os.path.join(root, name))

walk_through_folders(folders, {'.xml': XML_cleaner, '.zip': ZIP_cleaner})
def datetime_strftime(time, format_):
    return datetime.datetime.fromtimestamp(time).strftime(format_)

TO_DELETE_TIME = time.time() - How_Old_To_Delete_In_Seconds
TO_ZIP_TIME = time.time() - How_Old_To_Zip_In_Seconds
print('All files older than: {} are to be deleted.'.format(
    datetime_strftime(to_delete_time, '%Y-%m-%d %H:%M:%S')
))
print('All files older than: {} are to be ziped.'.format(
    datetime_strftime(to_zip_time, '%Y-%m-%d %H:%M:%S')
))

Context

StackExchange Code Review Q#96984, answer score: 4

Revisions (0)

No revisions yet.