Copying jpg files between two folders

Submitted by: @import:stackexchange-codereview

Problem


I have two directories, input and output.

- input is a flat directory containing, among other files, some .jpg files.
- output has nested subdirectories and contains .jpg files with the same names as those in input.
- Some names in input may be missing from output.
- Names in output can be duplicated in different subdirectories.

An example structure:

$ tree input/
input/
├── a
├── b
├── c
├── d
├── e.jpg
├── f.jpg
├── h
├── i
├── j.jpg
└── z.jpg

0 directories, 10 files
$ tree output
output
├── 1
│   └── 2
│       └── 3
│           └── e.jpg
├── A
│   └── B
│       └── f.jpg
├── O
│   └── j.jpg
└── X
    └── Y
        └── Z
            └── j.jpg

9 directories, 4 files


The task is to overwrite all .jpg files in the output directory with those from input, matched by file name.

Python version:

#!/usr/bin/env python
# -*- coding: utf-8 -*-

import collections
import os
import shutil
import sys

def get_paths(root):
    paths = collections.defaultdict(list)
    for path, subdirs, files in os.walk(root):
        for f in files:
            if os.path.splitext(f)[1] == '.jpg':
                paths[os.path.basename(f)].append(os.path.join(path, f))
    return paths

def main():
    if len(sys.argv) != 3:
        msg = 'python {} path/to/input/dir path/to/output/dir'
        print(msg.format(sys.argv[0]))
        sys.exit(1)

    input_dir, output_dir = sys.argv[1], sys.argv[2]
    input_paths, output_paths = get_paths(input_dir), get_paths(output_dir)

    for filename, input_path in input_paths.items():
        for output_path in output_paths.get(filename, []):
            shutil.copy(input_path[0], output_path)

if __name__ == '__main__':
    main()


Usage:

$ cat input/e.jpg input/f.jpg input/j.jpg 
e
f
j
$ cat output/1/2/3/e.jpg output/A/B/f.jpg output/X/Y/Z/j.jpg output/O/j.jpg 
X
X
X
X
$ python test.py input/ output/
$ cat output/1/2/3/e.jpg output/A/B/f.jpg output/X/Y/Z/j.jpg output/O/j.jpg 
e
f
j
j


C++14 versi

Solution

I can only comment on the Python half. Broadly speaking, I'd say this is excellent already. Your code is clean, readable, and uses the standard library well. Your algorithm is reasonable and will scale pretty well. Treat everything below as minor nits.

if os.path.splitext(f)[1] == '.jpg':
    paths[os.path.basename(f)].append(os.path.join(path, f))


os.path.basename(f) is always just f here, because os.walk yields bare file names rather than paths. As for the extension, I've seen ".jpeg", ".JPG", etc., and the current check matches only a lowercase ".jpg". You may want to take a second look at the whole program with case sensitivity in mind.
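
A minimal sketch of a case-insensitive extension check, assuming that both ".jpg" and ".jpeg" should count as JPEG files. The helper name is_jpeg and the JPEG_EXTENSIONS set are hypothetical and not part of the original script:

```python
import os

# Hypothetical set of accepted extensions; adjust it to the files you actually have.
JPEG_EXTENSIONS = {'.jpg', '.jpeg'}

def is_jpeg(filename):
    """Return True if filename has a JPEG extension, ignoring case."""
    _, ext = os.path.splitext(filename)
    return ext.lower() in JPEG_EXTENSIONS
```

This only normalizes the extension. Whether "E.JPG" in input should overwrite "e.jpg" in output is a separate decision about how the dictionary keys are built.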

To make this more readable, I might introduce intermediate variables, but that's a matter of taste. _ is the conventional Python name for "we don't care about this value and will never use it".

for dirpath, subdirs, files in os.walk(root):
    for f in files:
        _, ext = os.path.splitext(f)
        if ext == '.jpg':
            file_path = os.path.join(dirpath, f)
            paths[f].append(file_path)


I would make

msg = 'python {} path/to/input/dir path/to/output/dir'


slightly more descriptive:

USAGE = '{program_name} path/to/input/dir path/to/output/dir'
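
Note that a named placeholder has to be filled with a keyword argument. A small sketch, where usage_message is a hypothetical helper name:

```python
USAGE = '{program_name} path/to/input/dir path/to/output/dir'

def usage_message(argv0):
    """Build the usage line from the name the script was invoked with."""
    # str.format needs program_name=... here; a positional argument would raise KeyError.
    return USAGE.format(program_name=argv0)
```

In main() this would be called as usage_message(sys.argv[0]).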


Since the script starts with a shebang line, you can chmod +x it and run it without 'python' as the first argument, on Mac and Linux at least (not sure about Windows).

input_dir, output_dir = sys.argv[1], sys.argv[2]


might be more clearly written, as a reminder that the list is exhaustive, as:

_, input_dir, output_dir = sys.argv
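
Unpacking raises ValueError when the list has any other length, so the length check still has to come first. A sketch of how the two fit together, with parse_args as a hypothetical helper name:

```python
def parse_args(argv):
    """Return (input_dir, output_dir), or None if argv has the wrong length."""
    if len(argv) != 3:
        return None
    # Safe to unpack: argv is exactly [program_name, input_dir, output_dir].
    _, input_dir, output_dir = argv
    return input_dir, output_dir
```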


Because you're using defaultdict,

for output_path in output_paths.get(filename, []):


can be just

for output_path in output_paths[filename]:
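
One difference worth knowing about: indexing a defaultdict with a missing key inserts that key, while .get() leaves the dictionary unchanged. That is harmless in this script, but a short illustration with made-up paths shows the side effect:

```python
import collections

output_paths = collections.defaultdict(list)
output_paths['j.jpg'].append('output/O/j.jpg')

# Indexing a missing key returns a fresh empty list, so the loop body never runs.
for output_path in output_paths['z.jpg']:
    raise AssertionError('not reached')

# Side effect: the lookup above added 'z.jpg' with an empty list as its value.
missing_key_was_added = 'z.jpg' in output_paths

# .get() returns the fallback without touching the dictionary.
plain_lookup = output_paths.get('h.jpg', [])
```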


And finally, I'll point out that you're collecting all files of a given name in the input path, but only using the first. That seems fine; I think I prefer readability over efficiency. But perhaps you should check that input files sharing a name actually have identical content?
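
If you do want that check, the standard library's filecmp module can compare file contents. A minimal sketch, where all_identical is a hypothetical helper that would be called on each list in input_paths before copying:

```python
import filecmp

def all_identical(paths):
    """Return True if every file in paths has the same content as the first one."""
    if not paths:
        return True
    first = paths[0]
    # shallow=False forces a content comparison instead of trusting os.stat() signatures.
    return all(filecmp.cmp(first, other, shallow=False) for other in paths[1:])
```

What to do when the check fails (skip the name, warn, or abort) is a policy decision the original question does not specify.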

Context

StackExchange Code Review Q#153080, answer score: 4
