patternpythonMinor
'Learn projects the hard way': logfind project
Viewed 0 times
projectsthehardwayprojectlearnlogfind
Problem
I'm a new programmer, who has just finished his first small project.
It's a sort of basic imitation of the
The first project that we’ll try is called logfind. It’s nothing more than a simple version of grep so you can learn a few simple basic things in Python:
When I run the script, the first thing it will look for is
I have a few questions about my program:
Source code:
```
# logfind.py
# Project explanation: http://projectsthehardway.com/
# A basic implementation of the linux grep command. The program will search for the given strings in files that are listed in a logfind.txt file.
# By default the program will search for all the given strings in a file. If -o option is enabled, the program will return the file if one of the strings is present.
# The results will be written to results.txt located in the current working directory.
import argparse
import os
import re
def cl_handler():
"""
Handles the command line input, with strings as the words to search for in files, and -o as optional argument
"""
parser = argparse.ArgumentParser(description="find strings inside files")
parser.add_argument("strings", nargs = "*", help="The files will be searched according to this w
It's a sort of basic imitation of the
grep command from Linux. I'm learning from projects the hard way. Here is the description of the project:The first project that we’ll try is called logfind. It’s nothing more than a simple version of grep so you can learn a few simple basic things in Python:
- Opening files based on regular expressions in a file.
- Scanning those files for strings from command line arguments.
- Create a complete installable Python project.
When I run the script, the first thing it will look for is
logfind.txt file on the user's computer, then it will scan this file according to a regular expression to search for filenames of the example.extension type. The files specified in the logfind.txt file will be searched for the strings that the user has specified on the command line.I have a few questions about my program:
- Could someone critique my code on style and readability?
- Could you also check the overall design of the code and evaluate it?
- Are there functions that's syntax can be shortened?
Source code:
```
# logfind.py
# Project explanation: http://projectsthehardway.com/
# A basic implementation of the linux grep command. The program will search for the given strings in files that are listed in a logfind.txt file.
# By default the program will search for all the given strings in a file. If -o option is enabled, the program will return the file if one of the strings is present.
# The results will be written to results.txt located in the current working directory.
import argparse
import os
import re
def cl_handler():
"""
Handles the command line input, with strings as the words to search for in files, and -o as optional argument
"""
parser = argparse.ArgumentParser(description="find strings inside files")
parser.add_argument("strings", nargs = "*", help="The files will be searched according to this w
Solution
Generators
Sometimes you make a list a piece at a time, to write a generator function you just
Must be avoided as it wastes memory and is boilerplate.
More concretely, you have:
You should write:
Ternary when sensible
Ternaries should not be overused, but can be a valuable tool to reduce code-bloat, see:
The action
But the whole top part of the function
Temporary overuse
You really like temporary variables, I would make less use of them, especially if they got confusing names:
Just write:
Another example:
This follows the:
Anti-pattern, just write:
A case analysis:
A first refactor was just:
After:
A FP purist would avoid all temporary variables and write it like:
The second and third versions are more or less readable to different people, choosing between them is subjective, but surely any of them is better than your version (3-7 lines vs 20 of yours and are simpler then it).
Sometimes you make a list a piece at a time, to write a generator function you just
yield the result each iteration and not build a list, the pattern:result = []
for i in a_thing:
result.append(do_thing(i))
return resultMust be avoided as it wastes memory and is boilerplate.
More concretely, you have:
def scan_logfind(logfind_dir):
"""
Opens the logfind file and scans it for filenames according to a regular expression (filename.extension).
Returns a list with the filenames
"""
files = [] # <- 1
with open(logfind_dir, "r") as logfind:
regex = re.compile(r"^[\w,\s-]+\.[A-Za-z]+$")
for word in logfind.read().split():
file = regex.match(word)
if file:
files.append(word) # <- 2
return files # <- 3You should write:
def scan_logfind(logfind_dir):
"""
Opens the logfind file and scans it for filenames according to a regular expression (filename.extension).
Returns a generator with the filenames
"""
## Deleted -- files = [] # <- 1
with open(logfind_dir, "r") as logfind:
regex = re.compile(r"^[\w,\s-]+\.[A-Za-z]+$")
for word in logfind.read().split():
file = regex.match(word)
if file:
yield word # Gives this element out of the function.
## Deleted -- return files # <- 3Ternary when sensible
Ternaries should not be overused, but can be a valuable tool to reduce code-bloat, see:
if string in logfile:
results.append("True")
else:
results.append("False")The action
append is the same, only the value changes, a perfect job for ternaries:results.append("True" if string in logfile else "False")But the whole top part of the function
search_strings should be a generator expression as noted above.Temporary overuse
You really like temporary variables, I would make less use of them, especially if they got confusing names:
file = regex.match(word) # This is not a filename nor a filecontentJust write:
if regex.match(word):
yield wordAnother example:
file_directory = os.path.join(root, f)
return file_directoryThis follows the:
result = computation(thing)
return resultAnti-pattern, just write:
return os.path.join(root, f)A case analysis:
search_stringssearch_strings was the worst of your functions considering all the wheels that you reinvented there. Python has a ton of built-ins, please use them.A first refactor was just:
- Reducing nesting by closing the file early.
- Using generator expressions.
- Using the built-ins
allandany
def search_strings(file_dir, strings, or_option=False):
"""
Searches the file for the specified files. Returns boolean true if all strings are found in the file
If the or_option is enabled the function will return boolean true if one string is found in the file.
"""
with open(file_dir, "r") as f:
logfile = f.read().lower()
results = ("True" if string in logfile else "False" for string in strings)
if or_option:
return any(result == "True" for result in results)
else:
return all(result == "True" for result in results)After:
- Realizing you are using
"True"instead ofTruefor misterious reasons.
- Using a ternary for more FP.
def search_strings(file_dir, strings, or_option=False):
"""
Searches the file for the specified files. Returns boolean true if all strings are found in the file
If the or_option is enabled the function will return boolean true if one string is found in the file.
"""
with open(file_dir, "r") as f:
logfile = f.read().lower()
any_or_all = (any if or_option else all)
return any_or_all(string in logfile for string in strings)A FP purist would avoid all temporary variables and write it like:
def search_strings(file_dir, strings, or_option=False):
"""
Searches the file for the specified files. Returns boolean true if all strings are found in the file
If the or_option is enabled the function will return boolean true if one string is found in the file.
"""
with open(file_dir, "r") as f:
return (any if or_option else all)(string in f.read().lower() for string in strings)The second and third versions are more or less readable to different people, choosing between them is subjective, but surely any of them is better than your version (3-7 lines vs 20 of yours and are simpler then it).
Code Snippets
result = []
for i in a_thing:
result.append(do_thing(i))
return resultdef scan_logfind(logfind_dir):
"""
Opens the logfind file and scans it for filenames according to a regular expression (filename.extension).
Returns a list with the filenames
"""
files = [] # <- 1
with open(logfind_dir, "r") as logfind:
regex = re.compile(r"^[\w,\s-]+\.[A-Za-z]+$")
for word in logfind.read().split():
file = regex.match(word)
if file:
files.append(word) # <- 2
return files # <- 3def scan_logfind(logfind_dir):
"""
Opens the logfind file and scans it for filenames according to a regular expression (filename.extension).
Returns a generator with the filenames
"""
## Deleted -- files = [] # <- 1
with open(logfind_dir, "r") as logfind:
regex = re.compile(r"^[\w,\s-]+\.[A-Za-z]+$")
for word in logfind.read().split():
file = regex.match(word)
if file:
yield word # Gives this element out of the function.
## Deleted -- return files # <- 3if string in logfile:
results.append("True")
else:
results.append("False")results.append("True" if string in logfile else "False")Context
StackExchange Code Review Q#109815, answer score: 4
Revisions (0)
No revisions yet.