patternpythonMinor
Copying jpg files between two folders
Problem
- I have two directories, input and output.
- input is a flat directory containing, among others, some .jpg files.
- output has nested subdirectories and contains .jpg files with the same names as those in input.
- There can be some names in input missing in output.
- Names in output can be duplicated in different subdirectories.

An example structure:
$ tree input/
input/
├── a
├── b
├── c
├── d
├── e.jpg
├── f.jpg
├── h
├── i
├── j.jpg
└── z.jpg
0 directories, 10 files
$ tree output
output
├── 1
│   └── 2
│       └── 3
│           └── e.jpg
├── A
│   └── B
│       └── f.jpg
├── O
│   └── j.jpg
└── X
    └── Y
        └── Z
            └── j.jpg
9 directories, 4 files

The task is to overwrite all .jpg files in the output directory with those from input, based on their names.

Python version:
#!/usr/bin/env python
# -*- coding: utf-8 -*-
import collections
import os
import shutil
import sys


def get_paths(root):
    paths = collections.defaultdict(list)
    for path, subdirs, files in os.walk(root):
        for f in files:
            if os.path.splitext(f)[1] == '.jpg':
                paths[os.path.basename(f)].append(os.path.join(path, f))
    return paths


def main():
    if len(sys.argv) != 3:
        msg = 'python {} path/to/input/dir path/to/output/dir'
        print(msg.format(sys.argv[0]))
        sys.exit(1)
    input_dir, output_dir = sys.argv[1], sys.argv[2]
    input_paths, output_paths = get_paths(input_dir), get_paths(output_dir)
    for filename, input_path in input_paths.items():
        for output_path in output_paths.get(filename, []):
            shutil.copy(input_path[0], output_path)


if __name__ == '__main__':
    main()

Usage:
$ cat input/e.jpg input/f.jpg input/j.jpg
e
f
j
$ cat output/1/2/3/e.jpg output/A/B/f.jpg output/X/Y/Z/j.jpg output/O/j.jpg
X
X
X
X
$ python test.py input/ output/
$ cat output/1/2/3/e.jpg output/A/B/f.jpg output/X/Y/Z/j.jpg output/O/j.jpg
e
f
j
j

C++14 version:
Solution
I can only comment on the Python half. Broadly speaking, I'd say this is excellent already. Your code is clean, readable and uses standard libraries well. Your algorithm is reasonable, and will scale pretty well. Consider all of the below tiny nits.
if os.path.splitext(f)[1] == '.jpg':
    paths[os.path.basename(f)].append(os.path.join(path, f))

os.path.basename(f) is always just f here, because you're iterating over a flat file list. For the extension, I've seen ".jpeg", ".JPG", etc. You may want to take a second look at this whole program thinking carefully about case-sensitivity.

To make this more readable, I might rewrite this with intermediate variable names, but it's a matter of taste. _ is a traditional Python name for "we don't care about this variable and will never use it".

for dir, subdirs, files in os.walk(root):
    for f in files:
        _, ext = os.path.splitext(f)
        if ext == '.jpg':
            file_path = os.path.join(dir, f)
            paths[f].append(file_path)

I would make

msg = 'python {} path/to/input/dir path/to/output/dir'

slightly more descriptive:

USAGE = '{program_name} path/to/input/dir path/to/output/dir'

You can chmod +x this program and run it without 'python' as the first argument, on Mac and Linux at least (not sure about Windows).

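As a rough sketch of the usage-message suggestion, the named placeholder could be filled in from the program's own argv. The helper name usage_text is hypothetical, and trimming the program name with os.path.basename is an assumption, not something the original script does.

```python
import os

USAGE = '{program_name} path/to/input/dir path/to/output/dir'


def usage_text(argv):
    """Build the usage line from an argv-style list (hypothetical helper)."""
    # Assumption: show only the script's file name rather than its full path.
    program_name = os.path.basename(argv[0])
    return USAGE.format(program_name=program_name)
```

Printing usage_text(sys.argv) in place of the msg.format(...) call would leave the rest of main() unchanged.
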
input_dir, output_dir = sys.argv[1], sys.argv[2]

might be more clearly written, as a reminder that it's an exhaustive list, as:

_, input_dir, output_dir = sys.argv

Because you're using defaultdict,

for output_path in output_paths.get(filename, []):

can be just

for output_path in output_paths[filename]:

(Indexing a defaultdict inserts an empty list for a missing key, which is harmless here.)

And finally, I'll point out that you're collecting all files of a given name in the input path, but only using the first. That seems fine--I think I prefer readability over efficiency. But perhaps you should be checking that the content is actually identical for the input files?

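Two of the points above, case-sensitivity and checking that same-named input files really are identical, could be sketched roughly as follows. This is only one possible approach: the JPEG_EXTENSIONS set, the lower-cased dictionary keys and the all_identical helper are assumptions made for illustration, not part of the original script.

```python
import collections
import filecmp
import os

# Assumption: which extensions count as JPEG is a choice; adjust as needed.
JPEG_EXTENSIONS = {'.jpg', '.jpeg'}


def get_paths(root):
    """Map each lower-cased JPEG file name under root to all of its paths."""
    paths = collections.defaultdict(list)
    for dirpath, _, files in os.walk(root):
        for name in files:
            ext = os.path.splitext(name)[1]
            if ext.lower() in JPEG_EXTENSIONS:
                # Lower-casing the key makes 'E.JPG' and 'e.jpg' match.
                paths[name.lower()].append(os.path.join(dirpath, name))
    return paths


def all_identical(file_paths):
    """Return True if every file has the same content as the first one."""
    first = file_paths[0]
    # shallow=False compares file contents, not just os.stat() signatures.
    return all(filecmp.cmp(first, other, shallow=False)
               for other in file_paths[1:])
```

With keys folded to lower case, several input files can end up under one name, so calling all_identical(input_path) before copying input_path[0] would flag the ambiguous cases instead of silently picking the first.
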
Code Snippets
if os.path.splitext(f)[1] == '.jpg':
    paths[os.path.basename(f)].append(os.path.join(path, f))

for dir, subdirs, files in os.walk(root):
    for f in files:
        _, ext = os.path.splitext(f)
        if ext == '.jpg':
            file_path = os.path.join(dir, f)
            paths[f].append(file_path)

msg = 'python {} path/to/input/dir path/to/output/dir'

USAGE = '{program_name} path/to/input/dir path/to/output/dir'

input_dir, output_dir = sys.argv[1], sys.argv[2]

Context
StackExchange Code Review Q#153080, answer score: 4