HiveBrain v1.2.0
Get Started
← Back to all entries
patternpythonMinor

Track changes inside a directory

Submitted by: @import:stackexchange-codereview··
0
Viewed 0 times
directorychangestrackinside

Problem

I have built a Python 2.7 script to track all file and subdir changes inside a nominated directory. It is used with directories that have multiple levels of subdirectories, hundreds of thousands of files, and hundreds of GB of file data. The filenames can have Unicode characters (encoded in UTF-8). By "changes" I mean additions/deletions of files and subdirs, or changes in filesizes (i.e., we are not concerned with the content of the files). The tracking is not continuous, we are just comparing to the last time we checked (typically checking twice a day). The script works fine to the best of my knowledge.

I would gladly receive feedback on any aspect of the script including best coding practices and design pattern usage, handling of unexpected cases, and performance.

I include the whole script here. It's 310 lines long and I am wondering whether this might be too long as a question body, but I could not find size guidelines on the site. I opted to include everything instead of code snippets since this seems to be the recommended practice here. I also recognise that the width of my lines does not offer the best viewing opportunity inside the code box (which seems to fit 93 char lines). I normally use 120-char vertical rulers in my code, and sometimes I allow lines to go past them. I am not sure if I should modify my code to offer a better viewing chance here. Let me know if reading the code here is too annoying, and I'll wrap-line it.

You can find the code, with more backstory, details, and other pieces of code that help run the tool as a background agent here: https://github.com/boulis/Track-Dir-Changes

```
import json, subprocess
from argparse import ArgumentParser
from os import walk
from os.path import join, getsize
from datetime import datetime

parser = ArgumentParser(description="Tracks any changes in a specified directory. Additions, deletions,\n \
changes of files and subdirs are tracked and recorded in a log file.\

Solution

You've done a great job documenting your code.

Here are a few code style and code organization things I would work on:

  • read arguments either inside the main_loop() function, or inside the if __name__ == '__main__': block of the code. This way, if your script would be imported, arguments would not be parsed. Also, consider having a separate parse_args() function that would be responsible for parsing arguments. This may lead to better modularity and testability



  • docstrings should go right after the class, method or function definitions; should be enclosed into triple double-quotes, start with a capital letter and end with a dot (PEP8 reference)



  • naming; follow the lower_case_with_underscores naming recommendations



  • properly organize your imports



  • remove extra trailing semi-colons



  • the humanReadableSize() method should not be under the Tracker class - it feels like this is more of a helper/utility type of a function - consider extracting it into a separate "libs"/"utils" module



  • you may remove redundant parenthesis replacing return (total_size, total_num) with return total_size, total_num and return (0, 0) with return 0, 0



-
the unicode type check can be done with isinstance():

if not isinstance(pathname, unicode):


  • the emptiness check can be simplified to if not self.previous_state:

Code Snippets

if not isinstance(pathname, unicode):

Context

StackExchange Code Review Q#160809, answer score: 4

Revisions (0)

No revisions yet.