Find files with content matching regex
Problem
Today I wanted to find a program that I wrote a while ago. I knew that it contained a certain regex, but I couldn't for the life of me remember the file name I saved it under. I knew I could use Windows search, but it takes more time than it would for me to write a Python program to do the same.
The main two things I use are os.walk and re: the former traverses the entire directory tree, and the latter matches the data. I also use codecs so that I can read files with special characters, and argparse to get the input from the end user.

Some files, such as PNGs or other raw data files, still error when read with codecs, so I skip these.

I kept the arguments simple: you pass a regex and a path. You can also pass any of the regex flags. So the following will search for 'metaclass', ignoring case, in the files below 'D:\data'.

    python search.py "metaclass" "D:\data" -i

The code is fairly small and mostly just adds information to the parser. It also runs in both Python 2 and Python 3.

    import re
    import codecs
    import argparse
    import operator
    from os import walk
    from os.path import join

    # Add reduce to global scope for Python3
    try:
        from functools import reduce
    except ImportError:
        pass

    # Descriptions are the same as Python's re descriptions
    # https://docs.python.org/2.7/library/re.html#module-contents
    # https://docs.python.org/3.5/library/re.html#module-contents
    parser = argparse.ArgumentParser(description='Search file contense.')
    parser.add_argument('regex', help='regex to search for')
    parser.add_argument('path', help='path to root of recursive search')
    parser.add_argument('-a', '--ascii', action="store_true",
                        help='(Python3 only) Make \w, \W, \b, \B, \d, '
                        '\D, \s and \S` perform ASCII-only matching '
                        'instead of full Unicode matching. This is only '
                        'meaningful for Unicode patterns, and is ignored for '

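
The approach described above (walk the tree with os.walk, read each file through codecs, skip files that fail to decode, and OR the chosen re flags together) can be sketched roughly as follows. This is not the original search.py: find_matching_files, combine_flags and the utf-8 default are names and choices assumed purely for illustration.

```python
import codecs
import operator
import re
from functools import reduce
from os import walk
from os.path import join


def combine_flags(names):
    """OR together re flags given by name, e.g. ['IGNORECASE', 'DOTALL'].

    Hypothetical helper: the original flag handling is not shown.
    """
    return reduce(operator.or_, (getattr(re, name) for name in names), 0)


def find_matching_files(path, pattern, flags=0, encoding='utf-8'):
    """Yield the path of every file below `path` whose text matches `pattern`.

    Hypothetical helper; the utf-8 default is an assumption.
    """
    regex = re.compile(pattern, flags)
    for root, _dirs, files in walk(path):
        for name in files:
            full_path = join(root, name)
            try:
                with codecs.open(full_path, 'r', encoding=encoding) as handle:
                    text = handle.read()
            except (UnicodeDecodeError, IOError, OSError):
                # Raw data such as PNGs usually fails to decode; skip it.
                continue
            if regex.search(text):
                yield full_path
```

For instance, list(find_matching_files(r'D:\data', 'metaclass', combine_flags(['IGNORECASE']))) would roughly mirror the -i call shown earlier.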
Solution
Your try block for importing reduce is unnecessary. In Python 2, reduce is already in the functools module (as well as in __builtin__), so a plain from functools import reduce works on both versions.

You have a typo in your description. It should be 'contents', not 'contense'.

Since ASCII is a Python 3-only flag, you might want to account for that in get_args(). It really isn't very complicated. Just add:

    if args['ascii']:
        try:
            re.ASCII
        except AttributeError:
            parser.error("--ascii is compatible with Python 3 only")

I think get_args() is fine in how much it does. A regex of th(kl is invalid, and invalid arguments should be caught in the function that gets the arguments. I would, however, add a function that determines if a given regex is found in a file. That way get_files() could look like this:

    import os

    def get_files(path, regex):
        return (name
                for root, dirs, files in os.walk(path)
                for name in files
                # file_matches() needs the full path, not the bare name
                if file_matches(os.path.join(root, name), regex)
                )

From "How do I re.search or re.match on a whole file without reading it all into memory?", you can use mmap.mmap to save on memory usage. Note that Python 3 requires a bytes regex when using that function.
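
A file_matches() helper along those lines, built on mmap.mmap, might look like this. It is only a sketch: the name comes from the get_files() snippet above, but the body, the bytes pattern and the decision to treat empty or unreadable files as non-matching are all assumptions.

```python
import mmap
import re


def file_matches(path, regex):
    """Return True if the compiled bytes `regex` matches anywhere in the file.

    Hypothetical helper; `regex` is assumed to be compiled from a bytes pattern.
    """
    try:
        with open(path, 'rb') as handle:
            # Length 0 maps the whole file; an empty file raises ValueError.
            mapped = mmap.mmap(handle.fileno(), 0, access=mmap.ACCESS_READ)
            try:
                return regex.search(mapped) is not None
            finally:
                mapped.close()
    except (ValueError, IOError, OSError):
        # Assumption: empty or unreadable files count as "no match".
        return False
```

Because the pattern must be bytes in Python 3, the regex would be compiled as, say, re.compile(b'metaclass', re.IGNORECASE), and the path passed in should be the full path, e.g. os.path.join(root, name).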
Context
StackExchange Code Review Q#139423, answer score: 6