HiveBrain v1.2.0
Get Started
← Back to all entries
patternpythonMinor

'Learn projects the hard way': logfind project

Submitted by: @import:stackexchange-codereview··
0
Viewed 0 times
projectsthehardwayprojectlearnlogfind

Problem

I'm a new programmer, who has just finished his first small project.

It's a sort of basic imitation of the grep command from Linux. I'm learning from projects the hard way. Here is the description of the project:


The first project that we’ll try is called logfind. It’s nothing more than a simple version of grep so you can learn a few simple basic things in Python:



  • Opening files based on regular expressions in a file.



  • Scanning those files for strings from command line arguments.



  • Create a complete installable Python project.




When I run the script, the first thing it will look for islogfind.txt file on the user's computer, then it will scan this file according to a regular expression to search for filenames of the example.extension type. The files specified in the logfind.txt file will be searched for the strings that the user has specified on the command line.

I have a few questions about my program:

  • Could someone critique my code on style and readability?



  • Could you also check the overall design of the code and evaluate it?



  • Are there functions that's syntax can be shortened?



Source code:

```
# logfind.py
# Project explanation: http://projectsthehardway.com/
# A basic implementation of the linux grep command. The program will search for the given strings in files that are listed in a logfind.txt file.
# By default the program will search for all the given strings in a file. If -o option is enabled, the program will return the file if one of the strings is present.
# The results will be written to results.txt located in the current working directory.

import argparse
import os
import re

def cl_handler():
"""
Handles the command line input, with strings as the words to search for in files, and -o as optional argument
"""

parser = argparse.ArgumentParser(description="find strings inside files")
parser.add_argument("strings", nargs = "*", help="The files will be searched according to this w

Solution

Generators

Sometimes you make a list a piece at a time, to write a generator function you just yield the result each iteration and not build a list, the pattern:

result = []
for i in a_thing:
    result.append(do_thing(i))
return result


Must be avoided as it wastes memory and is boilerplate.

More concretely, you have:

def scan_logfind(logfind_dir):
    """
    Opens the logfind file and scans it for filenames according to a regular expression (filename.extension). 
    Returns a list with the filenames
    """
    files = [] # <- 1
    with open(logfind_dir, "r") as logfind:
        regex = re.compile(r"^[\w,\s-]+\.[A-Za-z]+$")  
        for word in logfind.read().split():            
            file = regex.match(word)
            if file:
                files.append(word) # <- 2
    return files # <- 3


You should write:

def scan_logfind(logfind_dir):
    """
    Opens the logfind file and scans it for filenames according to a regular expression (filename.extension). 
    Returns a generator with the filenames
    """
    ## Deleted -- files = [] # <- 1
    with open(logfind_dir, "r") as logfind:
        regex = re.compile(r"^[\w,\s-]+\.[A-Za-z]+$")  
        for word in logfind.read().split():            
            file = regex.match(word)
            if file:
                yield word # Gives this element out of the function.
    ## Deleted -- return files # <- 3


Ternary when sensible

Ternaries should not be overused, but can be a valuable tool to reduce code-bloat, see:

if string in logfile:
            results.append("True")
        else:
            results.append("False")


The action append is the same, only the value changes, a perfect job for ternaries:

results.append("True" if string in logfile else "False")


But the whole top part of the function search_strings should be a generator expression as noted above.

Temporary overuse

You really like temporary variables, I would make less use of them, especially if they got confusing names:

file = regex.match(word) # This is not a filename nor a filecontent


Just write:

if regex.match(word):
    yield word


Another example:

file_directory = os.path.join(root, f)
       return file_directory


This follows the:

result = computation(thing)
return result


Anti-pattern, just write:

return os.path.join(root, f)


A case analysis: search_strings

search_strings was the worst of your functions considering all the wheels that you reinvented there. Python has a ton of built-ins, please use them.

A first refactor was just:

  • Reducing nesting by closing the file early.



  • Using generator expressions.



  • Using the built-ins all and any



def search_strings(file_dir, strings, or_option=False):
    """
    Searches the file for the specified files. Returns boolean true if all strings are found in the file
    If the or_option is enabled the function will return boolean true if one string is found in the file.
    """
    with open(file_dir, "r") as f:
        logfile = f.read().lower()

    results = ("True" if string in logfile else "False" for string in strings)

    if or_option:
        return any(result == "True" for result in results)
    else:
        return all(result == "True" for result in results)


After:

  • Realizing you are using "True" instead of True for misterious reasons.



  • Using a ternary for more FP.



def search_strings(file_dir, strings, or_option=False):
    """
    Searches the file for the specified files. Returns boolean true if all strings are found in the file
    If the or_option is enabled the function will return boolean true if one string is found in the file.
    """
    with open(file_dir, "r") as f:
        logfile = f.read().lower()

    any_or_all = (any if or_option else all)
    return any_or_all(string in logfile for string in strings)


A FP purist would avoid all temporary variables and write it like:

def search_strings(file_dir, strings, or_option=False):
    """
    Searches the file for the specified files. Returns boolean true if all strings are found in the file
    If the or_option is enabled the function will return boolean true if one string is found in the file.
    """
    with open(file_dir, "r") as f:
        return (any if or_option else all)(string in f.read().lower() for string in strings)


The second and third versions are more or less readable to different people, choosing between them is subjective, but surely any of them is better than your version (3-7 lines vs 20 of yours and are simpler then it).

Code Snippets

result = []
for i in a_thing:
    result.append(do_thing(i))
return result
def scan_logfind(logfind_dir):
    """
    Opens the logfind file and scans it for filenames according to a regular expression (filename.extension). 
    Returns a list with the filenames
    """
    files = [] # <- 1
    with open(logfind_dir, "r") as logfind:
        regex = re.compile(r"^[\w,\s-]+\.[A-Za-z]+$")  
        for word in logfind.read().split():            
            file = regex.match(word)
            if file:
                files.append(word) # <- 2
    return files # <- 3
def scan_logfind(logfind_dir):
    """
    Opens the logfind file and scans it for filenames according to a regular expression (filename.extension). 
    Returns a generator with the filenames
    """
    ## Deleted -- files = [] # <- 1
    with open(logfind_dir, "r") as logfind:
        regex = re.compile(r"^[\w,\s-]+\.[A-Za-z]+$")  
        for word in logfind.read().split():            
            file = regex.match(word)
            if file:
                yield word # Gives this element out of the function.
    ## Deleted -- return files # <- 3
if string in logfile:
            results.append("True")
        else:
            results.append("False")
results.append("True" if string in logfile else "False")

Context

StackExchange Code Review Q#109815, answer score: 4

Revisions (0)

No revisions yet.