Python script to delete sections of text
Problem
I've been writing a Python script to help me clean up some files that contain outdated code, while also serving as a way to learn Python. These files have sections of code surrounded by tags on their own lines, and I need the script to remove both the tags and everything between them. While this largely applies to XML documents, I am also using this script for several other file types, so XML-specific solutions aren't suitable.
I have a working script already but it doesn't look like the most elegant or efficient solution by far, so I would like to see how I can improve it.
The script is called as python cleanup.py startDeleteTag stopDeleteTag, where the last two arguments mark where deletion should start and stop, respectively.

    import os, sys

    def clearFile(path, beginDelete, stopDelete):
        path = os.path.join(path, "fileName.xml")
        input = open(path, "r")
        lines = input.readlines()
        input.close()
        currentFlag = False
        nextFlag = False
        output = open(path, "w")
        for line in lines:
            if nextFlag == True:
                nextFlag = False
                deleteFlag = False
            if beginDelete in line:
                deleteFlag = True
            elif stopDelete in line:
                nextFlag = True
            if deleteFlag == False:
                output.write(line)

    thisDirectory = os.getcwd()
    for start, dirs, files in os.walk(thisDirectory):
        for f in files:
            if f == "fileName.xml":
                clearFile(thisDirectory, sys.argv[1], sys.argv[2])

This is the part of the script that checks each line for the start of the section to be deleted (beginDelete to stopDelete). Because this would leave the end tag in the file, I have to use a couple of booleans to specify when the current line needs to be removed but the next line shouldn't be.
As an extra, is there a way to improve this to also check for tags that may be on the same line as the body of the code? This isn't currently an issue, but I'm curious how I would do it if I needed to in future.
As requested, an example file input would loo
Solution
Program flow
You have a lot of booleans being thrown around, and you could clear out a lot of them. if nextFlag == True is the same as if nextFlag, and the latter looks neater, so stick with it.
The main flow is in a bit of a mess. You should have only one flag, which denotes whether lines should be skipped. Your two flags confuse the process. Think about it this way: you need a flag to indicate that a part should be skipped. You don't need two values indicating the same thing. So how does that work?
You start by looking for beginDelete. First check whether deleteFlag is True or False. If it's True, it should stay that way until stopDelete is found, and the line should be skipped. So if deleteFlag isn't True, set it to the result of beginDelete in line. You can assign expressions to variables, even boolean expressions, so there's no need for another if check; deleteFlag can be set to this result directly. This means the flag is now True if we've found beginDelete, and False otherwise.

    for line in lines:
        if not deleteFlag:
            deleteFlag = beginDelete in line

Now, we also need to find stopDelete to turn the flag False when it's found. This only needs to run if deleteFlag is True, so it should be an else condition:

    for line in lines:
        if not deleteFlag:
            deleteFlag = beginDelete in line
        else:
            deleteFlag = stopDelete not in line
            continue

Now you can see that deleteFlag is being set based on whether or not stopDelete has been found yet. We're using not so that deleteFlag remains True until stopDelete is found. Once that happens, deleteFlag is once again False. However, since you don't want to write the line where you've just found stopDelete, you should use continue to skip to the next iteration of the loop. continue is a keyword that tells a loop to immediately go to the next iteration, ignoring any remaining code in the loop's block.
Now, writing to the file is easy. You just want to write if not deleteFlag, i.e. if the flag isn't True.

    if not deleteFlag:
        output.write(line)

Style Notes
Stick to 4 space indentation. It's the Python standard and far more readable for people. Speaking of standards, Python naming uses snake_case, so beginDelete should be begin_delete etc.
Your names could be better too. You're not really deleting anything. You're 'skipping' or 'ignoring'. You don't really need to include flag in the name; using ignore would be pretty clear. Also avoid using input, as that shadows the builtin Python function of the same name. Instead use input_file or in_file. With such a brief usage, you can also use f.
Speaking of files, you should use with when opening files. The with statement treats the file as a context manager, which makes file manipulation safer. When using open on its own you can easily run into trouble if you don't close the file. Sometimes that can happen due to errors, or if you forget to, which it looks like you actually did with output. Using with means that a file is always automatically closed, even in the case of errors or exceptions, so it's a more reliable method. This is how you'd use it for lines.

    with open(path, "r") as f:
        lines = f.readlines()

The indentation signals how long to leave the file open, and it's closed once you move out of that block. You should do the same for output. Also, open defaults to "r" mode, so you don't need to specify the argument there.
Also, in your final os.walk loop you assign start and dirs but don't seem to use them. If you don't need those values, consider using the names _ and __. That's a Python convention for marking variables as unused throwaways, so people reading your code know they don't matter.
Code Snippets
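Taken together, the single-flag loop, the with blocks, and the naming advice from the answer could be combined into something like the sketch below. The names strip_sections, clean_tree, begin_tag, end_tag, and target_name are illustrative and do not appear in the original script. The sketch also assumes that each tag sits on its own line and that the files are small enough to read into memory.

```python
import os


def strip_sections(path, begin_tag, end_tag):
    """Rewrite the file at path, dropping every block from begin_tag to end_tag."""
    with open(path) as f:
        lines = f.readlines()

    ignore = False  # True while we are inside a block that should be dropped
    with open(path, "w") as f:
        for line in lines:
            if not ignore:
                # Start ignoring on the line that holds the opening tag.
                ignore = begin_tag in line
            else:
                # Stay in ignore mode until the closing tag shows up,
                # and drop the closing-tag line itself.
                ignore = end_tag not in line
                continue
            if not ignore:
                f.write(line)


def clean_tree(top, begin_tag, end_tag, target_name="fileName.xml"):
    """Apply strip_sections to every file called target_name below top.

    Assumption: matching files in subdirectories should be cleaned too,
    so the path is built from the directory os.walk is visiting.
    """
    for root, _, files in os.walk(top):
        if target_name in files:
            strip_sections(os.path.join(root, target_name), begin_tag, end_tag)
```

From the command line this could be wired up as clean_tree(os.getcwd(), sys.argv[1], sys.argv[2]) after importing sys. One deliberate difference from the script in the question: the path is built from the directory that os.walk is currently visiting rather than always from the starting directory.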
    for line in lines:
        if not deleteFlag:
            deleteFlag = beginDelete in line

    for line in lines:
        if not deleteFlag:
            deleteFlag = beginDelete in line
        else:
            deleteFlag = stopDelete not in line
            continue

    if not deleteFlag:
        output.write(line)

    with open(path, "r") as f:
        lines = f.readlines()

Context
StackExchange Code Review Q#109770, answer score: 5