HiveBrain v1.2.0
Get Started
← Back to all entries
patternpythonMinor

Looking through a directory tree for certain folders

Submitted by: @import:stackexchange-codereview··
0
Viewed 0 times
directoryfolderslookingforthroughcertaintree

Problem

I've written a script that recursively goes through a directory tree and looks for folders containing folders with NOT INVOICED in the folder name. If it finds one it prints the name of the containing folder and the names of any folders inside it with NOT INVOICED in the folder name.

The output is:

FOLDER 1
    NOT INVOICED FOLDER 1
    NOT INVOICED FOLDER 2
FOLDER 3
    NOT INVOICED FOLDER 1


The idea is that FOLDER 1 contains a few other folders that don't have NOT INVOICED in their name, FOLDER 2 doesn't contain any folders with NOT INVOICED in their name, and so on.

It works, but the only problem is this is what it looks like:

import os
import re

rootDir = '.'

with open('names.txt', 'w') as f:
        for dirName, subdirList,fileList in os.walk(rootDir):
                for fname in subdirList:
                        if re.match("NOT INVOICED", fname):
                                f.write(dirName + '\n')
                                for i in subdirList:
                                        if re.match("NOT INVOICED", i):
                                                f.write('\t%s\n' % i)
                                        else:
                                                pass
                        else:
                                pass


There's got to be a way to write this without six indentations.

Solution

It looks like your script's output will look more like this:

FOLDER 1
    NOT INVOICED FOLDER 1
    NOT INVOICED FOLDER 2
FOLDER 1
    NOT INVOICED FOLDER 1
    NOT INVOICED FOLDER 2
FOLDER 3
    NOT INVOICED FOLDER 1


It happens because for each subdirectory which contains NOT INVOICED you'll print containing directory name and all subdirectories. It can be easily fixed by adding an extra break in the end of the first if.

Now to the question. First of all, PEP 8 recommends using 4 spaces per one ident, neither tabs, nor 8 spaces. Also, you are free to omit the else branch in your ifs - no need in pass and extra lines.

I would separate checking of whether the directory should be printed (with some subdirectories) and the actual printing - that would kill inner for loop. Moreover, it will also make it impossible to output a directory twice:

import os
import re

rootDir = '.'

with open('names.txt', 'w') as f:
    for dirName, subdirList,fileList in os.walk(rootDir):
        shouldPrint = False
        for fname in subdirList:
            if re.match("NOT INVOICED", fname):
                shouldPrint = True
        if shouldPrint:
            f.write(dirName + '\n')
            for i in subdirList:
                if re.match("NOT INVOICED", i):
                    f.write('\t%s\n' % i)


Next things that I'd do are:

  • Calculate not invoiced subdirectories just once with list-comprehension. That will remove redundant for+if pair and also lower the risk of misspelling one of regexes.



  • Replace fileList variable with wildcard as it's unused



  • Run pep8 tool (can be installed with pip install pep8) on the code to make sure it corresponds with PEP 8. It'll report some whitespace issues in the very first for loop.



  • Make sure that only only style of quotes is used across the code - ' or ".



And now we have:

import os
import re

rootDir = '.'

with open('names.txt', 'w') as f:
    for dirName, subdirList, _ in os.walk(rootDir):
        notInvoiced = [fname for fname
                       in subdirList
                       if re.match('NOT INVOICED', fname)]
        if notInvoiced:
            f.write(dirName + '\n')
            for i in notInvoiced:
                f.write('\t%s\n' % i)


Also, if you're Python 3 user, I'd recommend using str.format() instead of % operator - the latter is kind of deprecated.

Code Snippets

FOLDER 1
    NOT INVOICED FOLDER 1
    NOT INVOICED FOLDER 2
FOLDER 1
    NOT INVOICED FOLDER 1
    NOT INVOICED FOLDER 2
FOLDER 3
    NOT INVOICED FOLDER 1
import os
import re

rootDir = '.'

with open('names.txt', 'w') as f:
    for dirName, subdirList,fileList in os.walk(rootDir):
        shouldPrint = False
        for fname in subdirList:
            if re.match("NOT INVOICED", fname):
                shouldPrint = True
        if shouldPrint:
            f.write(dirName + '\n')
            for i in subdirList:
                if re.match("NOT INVOICED", i):
                    f.write('\t%s\n' % i)
import os
import re

rootDir = '.'

with open('names.txt', 'w') as f:
    for dirName, subdirList, _ in os.walk(rootDir):
        notInvoiced = [fname for fname
                       in subdirList
                       if re.match('NOT INVOICED', fname)]
        if notInvoiced:
            f.write(dirName + '\n')
            for i in notInvoiced:
                f.write('\t%s\n' % i)

Context

StackExchange Code Review Q#128365, answer score: 4

Revisions (0)

No revisions yet.