Let's speed that file sentence searching program
Problem
Intro:
I've written a small Python program that looks for a given sentence in all the subdirectories of a given path.
I'm looking for improvements regarding the speed of my script.
Code:
```
from os import walk
from os.path import join


def get_magik_files(base_path):
    """
    Yields each path from all the base_path subdirectories
    :param base_path: this is the base path from where we'll start looking after .magik files
    :return: yield full path of a .magik file
    """
    for dirpath, _, filenames in walk(base_path):
        for filename in [f for f in filenames if f.endswith(".magik")]:
            yield join(dirpath, filename)


def search_sentence_in_file(base_path, sentence):
    """
    Prints each file path, line and line content where sentence was found
    :param base_path: this is the base path from where we'll start looking after .magik files
    :param sentence: the sentence we're looking up for
    :return: print the file path, line number and line content where sentence was found
    """
    for each_magik_file in get_magik_files(base_path):
        with open(each_magik_file) as magik_file:
            for line_number, line in enumerate(magik_file):
                if sentence in line:
                    print('[# FILE PATH #] {} ...\n'
                          '[# LINE NUMBER #] At line {}\n'
                          '[# LINE CONTENT #] Content: {}'.format(each_magik_file, line_number, line.strip()))
                    print('---------------------------------------------------------------------------------')


def main():
    basepath = r'some_path'
    sentence_to_search = 'some sentence'
    search_sentence_in_file(basepath, sentence_to_search)


if __name__ == '__main__':
    main()
```
Miscellaneous:
As you may have already figured out, the reason my program is so slow lies in `search_sentence_in_file(base_path, sentence)`, where I need to open each file, read it line by line and look for a specific sentence.
Solution
Yay, PEP 8
PEP 8 recommends limiting docstring lines to 72 characters and code lines to 79. The rest seems fine.
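As an illustration of those limits, here is how the docstring of `get_magik_files` from the question might be wrapped. This is only a sketch: the wording and the function body are unchanged, the text is just reflowed so that no docstring line exceeds 72 characters, and the continuation indent is one possible choice among several.

```python
from os import walk
from os.path import join


def get_magik_files(base_path):
    """
    Yields each path from all the base_path subdirectories

    :param base_path: this is the base path from where we'll
                      start looking after .magik files
    :return: yield full path of a .magik file
    """
    # Body kept exactly as in the question; only the docstring moved.
    for dirpath, _, filenames in walk(base_path):
        for filename in [f for f in filenames if f.endswith(".magik")]:
            yield join(dirpath, filename)
```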
Separation of concerns
`search_sentence_in_file` should search and return its results, not print them; printing is the duty of the caller. I feel it is also wrongly named, as it searches for a sentence in several files, so at least add the missing `s` at the end of the name. And to make it even more reusable, why not pass it an iterable of file paths (like the `get_magik_files` generator)?
Genericity
Besides `search_sentence_in_file` accepting an iterable, you could make `get_magik_files` more generic by passing the required extension as a parameter. This will let you extend your script to search in various kinds of files.
First rewrite
```
from os import walk
from os.path import join, splitext


def get_files(base_path, extension=None):
    """
    Yields each path from all the base_path subdirectories
    :param base_path: this is the base path from where the
                      function start looking for relevant files
    :param extension: filter files using provided extension.
                      If None, no filter is applied.
    :return: yield full path of a requested file
    """
    if extension is None:
        def filter_files(filenames):
            yield from filenames
    else:
        def filter_files(filenames):
            for filename in filenames:
                # splitext() returns the extension with its leading dot
                if splitext(filename)[1] == extension:
                    yield filename

    for dirpath, _, filenames in walk(base_path):
        for filename in filter_files(filenames):
            yield join(dirpath, filename)


def search_sentence_in_files(files, sentence):
    """
    Yield each file path, line and line content where
    sentence was found.
    :param files: iterable of files to search the sentence into
    :param sentence: the sentence we're looking up for
    :return: yield the file path, line number and line
             content where sentence was found
    """
    for filepath in files:
        with open(filepath) as fp:
            for line_number, line in enumerate(fp):
                if sentence in line:
                    yield filepath, line_number, line.strip()


def main():
    basepath = r'some_path'
    sentence_to_search = 'some sentence'
    # The leading dot is needed to match what splitext() returns
    files = get_files(basepath, '.magik')
    results = search_sentence_in_files(files, sentence_to_search)
    for filepath, line, content in results:
        print('[# FILE PATH #]', filepath, '...')
        print('[# LINE NUMBER #] At line', line)
        print('[# LINE CONTENT #] Content:', content)
        print('-'*80)


if __name__ == '__main__':
    main()
```
Reusability
Your script makes it hard to reuse for other purposes: different sentences, different kinds of files. Better to add a CLI using `argparse`: provide sensible defaults for your current usage, but allow for customization at will.
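To make the "defaults plus overrides" idea concrete, here is a minimal sketch of the argument parsing on its own. `build_parser` is a hypothetical helper, introduced only so the parser can be exercised with explicit argument lists; the `'.magik'` default (with a leading dot) is an assumption, chosen because `os.path.splitext` reports extensions that way.

```python
import argparse


def build_parser():
    """Build a parser whose defaults suit the current usage."""
    parser = argparse.ArgumentParser(description='Search text in files')
    parser.add_argument('sentence')
    parser.add_argument('-p', '--basepath', default='some folder',
                        help='folder in which files will be examined')
    # Assumption: the default carries a leading dot, like splitext() output
    parser.add_argument('-e', '--extension', default='.magik',
                        help='extension of files to examine')
    return parser


parser = build_parser()

# Only the sentence is given: both options fall back to their defaults.
defaults = parser.parse_args(['some sentence'])
print(defaults.sentence, defaults.basepath, defaults.extension)

# Every option overridden, as a different use case would do.
custom = parser.parse_args(['TODO', '-p', 'src', '-e', '.py'])
print(custom.sentence, custom.basepath, custom.extension)
```

Passing a list to `parse_args` stands in for the command line; called with no arguments, as in the full script, it reads `sys.argv` instead.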
```
from os import walk
from os.path import join, splitext
import argparse


def get_files(base_path, extension=None):
    """
    Yields each path from all the base_path subdirectories
    :param base_path: this is the base path from where the
                      function start looking for relevant files
    :param extension: filter files using provided extension.
                      If None, no filter is applied.
    :return: yield full path of a requested file
    """
    if extension is None:
        def filter_files(filenames):
            yield from filenames
    else:
        def filter_files(filenames):
            for filename in filenames:
                # splitext() returns the extension with its leading dot
                if splitext(filename)[1] == extension:
                    yield filename

    for dirpath, _, filenames in walk(base_path):
        for filename in filter_files(filenames):
            yield join(dirpath, filename)


def search_sentence_in_files(files, sentence):
    """
    Yield each file path, line and line content where
    sentence was found.
    :param files: iterable of files to search the sentence into
    :param sentence: the sentence we're looking up for
    :return: yield the file path, line number and line
             content where sentence was found
    """
    for filepath in files:
        with open(filepath) as fp:
            for line_number, line in enumerate(fp):
                if sentence in line:
                    yield filepath, line_number, line.strip()


def main(files, sentence):
    results = search_sentence_in_files(files, sentence)
    for filepath, line, content in results:
        print('[# FILE PATH #]', filepath, '...')
        print('[# LINE NUMBER #] At line', line)
        print('[# LINE CONTENT #] Content:', content)
        print('-'*80)


if __name__ == '__main__':
    parser = argparse.ArgumentParser(description='Search text in files')
    parser.add_argument('sentence')
    parser.add_argument('-p', '--basepath',
                        help='folder in which files will be examined',
                        default=r'some folder')
    # The leading dot is needed to match what splitext() returns
    parser.add_argument('-e', '--extension',
                        help='extension of files to examine',
                        default='.magik')
    args = parser.parse_args()
    files = get_files(args.basepath, args.extension)
    main(files, args.sentence)
```
Context
StackExchange Code Review Q#150571, answer score: 12