patternpythonMinor
Cleaning old XML log files automatically
Viewed 0 times
oldlogautomaticallyxmlfilescleaning
Problem
I have a piece of code that searches for XMLs in given folders, delete the ones older than X, zip the ones older than Y and delete zips older than X. It works perfectly but I have a terrible feeling that this is very inefficient code.
A little more information: It reads a config file for the times to delete/zip using Simple Config Parser, a library.
I am a temp at a networking company and this is one of the first pieces of "useful" code I've written.
```
import os, fnmatch, time, datetime, shutil, re
from lib.configobj import ConfigObj
if os.path.isfile('Cleaner_Script_Config.cfg')== True:
configreader = ConfigObj('Cleaner_Script_Config.cfg')
How_Old_To_Delete_In_Days = configreader.get('How_Old_To_Delete_In_Days')
How_Old_To_Zip_In_Days = configreader.get('How_Old_To_Zip_In_Days')
Folders_To_Clean = os.path.join(configreader.get('Folders_To_Clean'))
How_Old_To_Delete_In_Seconds = int(How_Old_To_Delete_In_Days)2460*60
How_Old_To_Zip_In_Seconds = int(How_Old_To_Zip_In_Days)2460*60
Folders_To_Clean_List= Folders_To_Clean.split(' ')
else:
How_Old_To_Delete_In_Days = int('90')
How_Old_To_Zip_In_Days = int('7')
How_Old_To_Delete_In_Seconds = int(How_Old_To_Delete_In_Days)2460*60
How_Old_To_Zip_In_Seconds = int(How_Old_To_Zip_In_Days)2460*60
Folders_To_Clean = '/home/test/betfair-ftpdata /home/test/betfair-ftpdata2'
Folders_To_Clean_List= Folders_To_Clean.split(' ')
def XML_Cleaner(extension, folders):
for folder in folders:
print('Running XML cleaner function')
Files_To_Clean = []
To_Delete_Time = (int(time.time()) - int(How_Old_To_Delete_In_Seconds))
To_Zip_Time = (int(time.time()) - int(How_Old_To_Zip_In_Seconds))
print ('All files older than: %s are to be deleted' % datetime.datetime.fromtimestamp(int(To_Delete_Time)).strftime('%Y-%m-%d %H:%M:%S'))
print ('All files older than: %s are to be ziped' % datetime.datetime.fromtimestamp(int(To_Zip_Time)).st
A little more information: It reads a config file for the times to delete/zip using Simple Config Parser, a library.
I am a temp at a networking company and this is one of the first pieces of "useful" code I've written.
```
import os, fnmatch, time, datetime, shutil, re
from lib.configobj import ConfigObj
if os.path.isfile('Cleaner_Script_Config.cfg')== True:
configreader = ConfigObj('Cleaner_Script_Config.cfg')
How_Old_To_Delete_In_Days = configreader.get('How_Old_To_Delete_In_Days')
How_Old_To_Zip_In_Days = configreader.get('How_Old_To_Zip_In_Days')
Folders_To_Clean = os.path.join(configreader.get('Folders_To_Clean'))
How_Old_To_Delete_In_Seconds = int(How_Old_To_Delete_In_Days)2460*60
How_Old_To_Zip_In_Seconds = int(How_Old_To_Zip_In_Days)2460*60
Folders_To_Clean_List= Folders_To_Clean.split(' ')
else:
How_Old_To_Delete_In_Days = int('90')
How_Old_To_Zip_In_Days = int('7')
How_Old_To_Delete_In_Seconds = int(How_Old_To_Delete_In_Days)2460*60
How_Old_To_Zip_In_Seconds = int(How_Old_To_Zip_In_Days)2460*60
Folders_To_Clean = '/home/test/betfair-ftpdata /home/test/betfair-ftpdata2'
Folders_To_Clean_List= Folders_To_Clean.split(' ')
def XML_Cleaner(extension, folders):
for folder in folders:
print('Running XML cleaner function')
Files_To_Clean = []
To_Delete_Time = (int(time.time()) - int(How_Old_To_Delete_In_Seconds))
To_Zip_Time = (int(time.time()) - int(How_Old_To_Zip_In_Seconds))
print ('All files older than: %s are to be deleted' % datetime.datetime.fromtimestamp(int(To_Delete_Time)).strftime('%Y-%m-%d %H:%M:%S'))
print ('All files older than: %s are to be ziped' % datetime.datetime.fromtimestamp(int(To_Zip_Time)).st
Solution
Style
PEP8
PEP8 is python style guide. It says how you should style python code.
-
Lines should be a maximum of 79 characters.
The exception to this is comments at 72.
-
Variables should be in
The exception to this is classes with
And Constants with
-
Imports should be on separate lines.
-
Operators should have one space either side of them.
The exception to this is to show precedence.
-
When comparing to singletons,
-
When checking if something is true, don't compare to
-
Do not overwrite builtins,
You have a good style otherwise. It's nice seeing long variable names, as they are much better than
Code
String formatting
Currently you use both the
You should use
Getting the config
You may want to make things have absolute paths.
If you call the file from a different directory, it may not work, as it will not be able to find the needed files.
This is as you may be in 'mydocuments', but the python file might be in 'programs'.
The program will work relatively from 'mydocuments', and not 'programs'.
And so you need to get a path for the settings.
This with your
In python it is very uncommon to do
You call
I would change it so that you have
The way this will work is rather than handling everything for later, you handle it now.
This has the pro of not building another list, which can be expensive later.
It also means that you can handle ZIP files too.
However it means that you don't build a list of paths.
Next we need to make it so there is a time that you will delete from.
In the global scope you can add this.
This will make it so there is only one time that it will delete from. And will report it once.
As we are handling each file one at a time we will have to split the remaining of
One that cleans up the zip files, and another that will handle the single path passed to it.
This means that after we have completed
PEP8
PEP8 is python style guide. It says how you should style python code.
-
Lines should be a maximum of 79 characters.
The exception to this is comments at 72.
-
Variables should be in
snake_case. NotCapitalized_Words_With_Underscores (ugly!)The exception to this is classes with
CamelCase.And Constants with
UPPER_SNAKE_CASE.-
Imports should be on separate lines.
-
Operators should have one space either side of them.
1 + 1.The exception to this is to show precedence.
2*2 + 1.-
When comparing to singletons,
True; None, use this is keyword.-
When checking if something is true, don't compare to
True.-
Do not overwrite builtins,
file, use a synonym, path, or add a trailing underscore, file_. The former is better.You have a good style otherwise. It's nice seeing long variable names, as they are much better than
a.Code
String formatting
Currently you use both the
+ and % operators to add data to strings.You should use
str.format instead.'File %s is to delete!' % file
# To
'File {} is to delete!'.format(file)
os.rename(
str(madefolder) + '.zip',
str(folder) + str(os.sep) + str(madefolder) + '.zip')
# To
os.rename(
'{!s}.zip'.format(makefolder),
'{!s}{!s}{!s}.zip'.format(folder, os.sep, makefolder))Getting the config
You may want to make things have absolute paths.
If you call the file from a different directory, it may not work, as it will not be able to find the needed files.
This is as you may be in 'mydocuments', but the python file might be in 'programs'.
The program will work relatively from 'mydocuments', and not 'programs'.
And so you need to get a path for the settings.
file_path = os.path.dirname(__file__)This with your
os.path.isfile will mean that you will always open the correct settings.if os.path.isfile(os.path.join(file_path, 'Cleaner_Script_Config.cfg')):
# Do other stuffIn python it is very uncommon to do
int('90'). Just do 90.XML_cleanerYou call
XML_cleaner twice, with the 'same' extension. This makes it so that you have to os.walk more than needed.I would change it so that you have
os.walk as a separate function. Then you can pass in extensions, and how to handle them.def walk_through_folders(folders, handlers):
for folder in folders:
for root, dirs, files in os.walk(folder):
for name in files:
extension = os.path.splitext(name)[1].lower()
if extension in handlers:
handlers[extension](os.path.join(root, name))
walk_through_folders(folders, {'.xml': XML_cleaner, '.zip': ZIP_cleaner})The way this will work is rather than handling everything for later, you handle it now.
This has the pro of not building another list, which can be expensive later.
It also means that you can handle ZIP files too.
However it means that you don't build a list of paths.
Next we need to make it so there is a time that you will delete from.
In the global scope you can add this.
This will make it so there is only one time that it will delete from. And will report it once.
def datetime_strftime(time, format_):
return datetime.datetime.fromtimestamp(time).strftime(format_)
TO_DELETE_TIME = time.time() - How_Old_To_Delete_In_Seconds
TO_ZIP_TIME = time.time() - How_Old_To_Zip_In_Seconds
print('All files older than: {} are to be deleted.'.format(
datetime_strftime(to_delete_time, '%Y-%m-%d %H:%M:%S')
))
print('All files older than: {} are to be ziped.'.format(
datetime_strftime(to_zip_time, '%Y-%m-%d %H:%M:%S')
))As we are handling each file one at a time we will have to split the remaining of
XML_cleaner into two functions.One that cleans up the zip files, and another that will handle the single path passed to it.
This means that after we have completed
walk_through_folders, we will need to call zip_files().def XML_cleaner(path):
info = os.stat(path)
day_folder = datetime.datetime.fromtimestamp(info.st_mtime) \
.strftime('%Y-%m-%d')
if not os.path.exists(day_folder):
os.makedirs(day_folder)
if info.st_mtime < TO_DELETE_TIME:
print ('Deleting: {!r}'.format(path))
os.remove(path)
elif info.st_mtime < TO_ZIP_TIME:
print ('Ziping: {!r}'.format(path))
infolder = (day_folder +
os.sep +
os.path.basename(os.path.normpath(path)))
os.rename(path, infolder)
def zip_folders(folders):
reobj = re.compile('\d{4}-\d{2}-\d{2}')
for folder in folders:
for root, subs, files in os.walk('.'):
for name in subs:
if reobj.match(name):
shutil.make_archive(name, 'zip', name)
shutil.rmtree(name)
os.rename(
'{!s}.zip'.format(name),
'{!s}{!s}{!s}.zip'.format(folder, os.sep, name))ZIP_cleanerCode Snippets
'File %s is to delete!' % file
# To
'File {} is to delete!'.format(file)
os.rename(
str(madefolder) + '.zip',
str(folder) + str(os.sep) + str(madefolder) + '.zip')
# To
os.rename(
'{!s}.zip'.format(makefolder),
'{!s}{!s}{!s}.zip'.format(folder, os.sep, makefolder))file_path = os.path.dirname(__file__)if os.path.isfile(os.path.join(file_path, 'Cleaner_Script_Config.cfg')):
# Do other stuffdef walk_through_folders(folders, handlers):
for folder in folders:
for root, dirs, files in os.walk(folder):
for name in files:
extension = os.path.splitext(name)[1].lower()
if extension in handlers:
handlers[extension](os.path.join(root, name))
walk_through_folders(folders, {'.xml': XML_cleaner, '.zip': ZIP_cleaner})def datetime_strftime(time, format_):
return datetime.datetime.fromtimestamp(time).strftime(format_)
TO_DELETE_TIME = time.time() - How_Old_To_Delete_In_Seconds
TO_ZIP_TIME = time.time() - How_Old_To_Zip_In_Seconds
print('All files older than: {} are to be deleted.'.format(
datetime_strftime(to_delete_time, '%Y-%m-%d %H:%M:%S')
))
print('All files older than: {} are to be ziped.'.format(
datetime_strftime(to_zip_time, '%Y-%m-%d %H:%M:%S')
))Context
StackExchange Code Review Q#96984, answer score: 4
Revisions (0)
No revisions yet.