patternpythonMinor
Track changes inside a directory
Viewed 0 times
directorychangestrackinside
Problem
I have built a Python 2.7 script to track all file and subdir changes inside a nominated directory. It is used with directories that have multiple levels of subdirectories, hundreds of thousands of files, and hundreds of GB of file data. The filenames can have Unicode characters (encoded in UTF-8). By "changes" I mean additions/deletions of files and subdirs, or changes in filesizes (i.e., we are not concerned with the content of the files). The tracking is not continuous, we are just comparing to the last time we checked (typically checking twice a day). The script works fine to the best of my knowledge.
I would gladly receive feedback on any aspect of the script including best coding practices and design pattern usage, handling of unexpected cases, and performance.
I include the whole script here. It's 310 lines long and I am wondering whether this might be too long as a question body, but I could not find size guidelines on the site. I opted to include everything instead of code snippets since this seems to be the recommended practice here. I also recognise that the width of my lines does not offer the best viewing opportunity inside the code box (which seems to fit 93 char lines). I normally use 120-char vertical rulers in my code, and sometimes I allow lines to go past them. I am not sure if I should modify my code to offer a better viewing chance here. Let me know if reading the code here is too annoying, and I'll wrap-line it.
You can find the code, with more backstory, details, and other pieces of code that help run the tool as a background agent here: https://github.com/boulis/Track-Dir-Changes
```
import json, subprocess
from argparse import ArgumentParser
from os import walk
from os.path import join, getsize
from datetime import datetime
parser = ArgumentParser(description="Tracks any changes in a specified directory. Additions, deletions,\n \
changes of files and subdirs are tracked and recorded in a log file.\
I would gladly receive feedback on any aspect of the script including best coding practices and design pattern usage, handling of unexpected cases, and performance.
I include the whole script here. It's 310 lines long and I am wondering whether this might be too long as a question body, but I could not find size guidelines on the site. I opted to include everything instead of code snippets since this seems to be the recommended practice here. I also recognise that the width of my lines does not offer the best viewing opportunity inside the code box (which seems to fit 93 char lines). I normally use 120-char vertical rulers in my code, and sometimes I allow lines to go past them. I am not sure if I should modify my code to offer a better viewing chance here. Let me know if reading the code here is too annoying, and I'll wrap-line it.
You can find the code, with more backstory, details, and other pieces of code that help run the tool as a background agent here: https://github.com/boulis/Track-Dir-Changes
```
import json, subprocess
from argparse import ArgumentParser
from os import walk
from os.path import join, getsize
from datetime import datetime
parser = ArgumentParser(description="Tracks any changes in a specified directory. Additions, deletions,\n \
changes of files and subdirs are tracked and recorded in a log file.\
Solution
You've done a great job documenting your code.
Here are a few code style and code organization things I would work on:
-
the unicode type check can be done with
Here are a few code style and code organization things I would work on:
- read arguments either inside the
main_loop()function, or inside theif __name__ == '__main__':block of the code. This way, if your script would be imported, arguments would not be parsed. Also, consider having a separateparse_args()function that would be responsible for parsing arguments. This may lead to better modularity and testability
- docstrings should go right after the class, method or function definitions; should be enclosed into triple double-quotes, start with a capital letter and end with a dot (PEP8 reference)
- naming; follow the
lower_case_with_underscoresnaming recommendations
- properly organize your imports
- remove extra trailing semi-colons
- the
humanReadableSize()method should not be under theTrackerclass - it feels like this is more of a helper/utility type of a function - consider extracting it into a separate "libs"/"utils" module
- you may remove redundant parenthesis replacing
return (total_size, total_num)withreturn total_size, total_numandreturn (0, 0)withreturn 0, 0
-
the unicode type check can be done with
isinstance():if not isinstance(pathname, unicode):- the emptiness check can be simplified to
if not self.previous_state:
Code Snippets
if not isinstance(pathname, unicode):Context
StackExchange Code Review Q#160809, answer score: 4
Revisions (0)
No revisions yet.