
Python script to delete sections of text

Submitted by: @import:stackexchange-codereview

Problem

I've been writing a Python script to help me clean up some files that contain outdated code, while also serving as a way to learn Python. These files have sections of code surrounded by tags on their own lines, and I need the script to remove both the tags and everything between them. While this largely applies to XML documents, I am also using this script for several other file types, so XML-specific solutions aren't suitable.

I have a working script already but it doesn't look like the most elegant or efficient solution by far, so I would like to see how I can improve it.

The script is called as python cleanup.py startDeleteTag stopDeleteTag, where the last two arguments are the tags at which deletion should start and stop, respectively.

import os, sys

def clearFile(path, beginDelete, stopDelete):

  path = os.path.join(path, "fileName.xml")
  input = open(path, "r")
  lines = input.readlines()
  input.close()

  deleteFlag = False
  nextFlag = False

  output = open(path, "w")

  for line in lines:
    if nextFlag == True:
      nextFlag = False
      deleteFlag = False

    if beginDelete in line:
      deleteFlag = True
    elif stopDelete in line:
      nextFlag = True

    if deleteFlag == False:
      output.write(line)

thisDirectory = os.getcwd()
for start, dirs, files in os.walk(thisDirectory):
  for f in files:
    if f == "fileName.xml":
      clearFile(thisDirectory, sys.argv[1], sys.argv[2])


This is the part of the script that checks each line for the start of the section to be deleted (beginDelete to stopDelete). Because this would leave the end tag in the file, I have to use a couple of booleans to specify when the current line needs to be removed but the next line shouldn't be. As an extra, is there a way to improve this to also check for tags that may be on the same line as the body of the code? This isn't currently an issue, but I'm curious how I would do it if I needed to in future.

As requested, an example file input would loo

Solution

Program flow

You have a lot of booleans being thrown around, and you could clear out most of them. Since nextFlag is always a boolean, if nextFlag == True is the same as if nextFlag, and the latter looks neater, so stick with it.

The main flow is in a bit of a mess. You should have only one flag, which denotes that lines should be skipped. Your two flags confuse the process. Think about it this way: you need a flag to indicate that a part should be skipped, and you don't need two values indicating the same thing. So how does that work?

You start by looking for beginDelete. First check whether deleteFlag is True or False. If it's True, it should stay that way until stopDelete is found, and the line should be skipped. If deleteFlag isn't True, set it to the result of beginDelete in line. You can assign any expression to a variable, including a boolean expression, so there's no need for another if check: deleteFlag is simply set to this result. The flag is now True if beginDelete was found and False otherwise.

for line in lines:
    if not deleteFlag:
        deleteFlag = beginDelete in line


Now, we also need to find stopDelete to turn the flag False when it's found. This only needs to run if deleteFlag is True, so it should be an else condition:

for line in lines:
    if not deleteFlag:
        deleteFlag = beginDelete in line
    else:
        deleteFlag = stopDelete not in line
        continue


Now you can see that deleteFlag is being set based on whether or not stopDelete has been found yet. We're using not so that deleteFlag remains True until stopDelete is found. Once that happens, deleteFlag is once again False. However, since you don't want to write the line where you've just found stopDelete, you should use continue to skip to the next iteration of the loop. continue is a keyword that tells a loop to immediately go to the next iteration, ignoring any remaining code in the loop's block.

Now, writing to the file is easy. You just want to write if not deleteFlag, i.e. if the flag isn't True.

    if not deleteFlag:
        output.write(line)
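
Put together, the three snippets above form one loop. The following is a minimal sketch rather than the original script: the function name keep_lines and the sample tags are assumptions made for illustration, and the surviving lines are collected in a list instead of being written to output so the example runs on its own.

```python
def keep_lines(lines, beginDelete, stopDelete):
    """Return the lines that fall outside beginDelete..stopDelete sections."""
    kept = []
    deleteFlag = False
    for line in lines:
        if not deleteFlag:
            # Outside a section: the flag turns on when the start tag appears.
            deleteFlag = beginDelete in line
        else:
            # Inside a section: the flag stays on until the stop tag appears.
            deleteFlag = stopDelete not in line
            # Skip this line either way, including the stop tag line itself.
            continue
        if not deleteFlag:
            kept.append(line)
    return kept


# Hypothetical sample input with one tagged section.
sample = ["keep 1\n", "<old>\n", "drop\n", "</old>\n", "keep 2\n"]
print(keep_lines(sample, "<old>", "</old>"))
```

Note that a section whose stop tag never appears swallows everything up to the end of the input, which matches the behaviour of the loop as described.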


Style Notes

Stick to 4-space indentation. It's the Python standard (PEP 8) and far more readable for people. Speaking of standards, Python naming uses snake_case for variables and functions, so beginDelete should be begin_delete, and so on.

Your names could be better too. You're not really deleting anything; you're 'skipping' or 'ignoring'. You don't need to include flag in the name, since ignore would be pretty clear. Also avoid using input as a name, as that shadows the built-in Python function of the same name. Use input_file or in_file instead. With such a brief usage, you can also use f.

Speaking of files, you should use with when opening files. It uses the file as a context manager, which makes file manipulation safer. When using open on its own, you can easily run into trouble if you don't close the file. That can happen because of an error, or because you forget to, which it looks like you actually did with output. Using with means that a file is always automatically closed, even in the case of errors or exceptions, so it's a more reliable method. This is how you'd use it for lines.

with open(path, "r") as f:
    lines = f.readlines()


The indentation signals how long to leave the file open, and it's closed once you move out of that block. You should do the same for output. Also, open defaults to "r" mode, so you don't need to specify that argument when reading; output still needs "w".
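
As a rough illustration of using with for both the read and the write, here is a sketch under a few assumptions: the function takes the full file path directly (the original joins a directory with "fileName.xml"), the names clear_file, in_file, out_file and ignore follow the naming suggestions above, and the temporary demo file and its tags are made up.

```python
import os
import tempfile


def clear_file(path, begin_delete, stop_delete):
    """Rewrite the file at path without the tagged sections."""
    # "r" is the default mode, so it is left out here.
    with open(path) as in_file:
        lines = in_file.readlines()

    ignore = False
    # Writing needs an explicit "w"; the file is closed when the block ends.
    with open(path, "w") as out_file:
        for line in lines:
            if not ignore:
                ignore = begin_delete in line
            else:
                ignore = stop_delete not in line
                continue
            if not ignore:
                out_file.write(line)


# Small demonstration on a temporary file (hypothetical content).
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "demo.txt")
    with open(demo_path, "w") as f:
        f.write("keep 1\n<old>\ndrop\n</old>\nkeep 2\n")
    clear_file(demo_path, "<old>", "</old>")
    with open(demo_path) as f:
        print(f.read())
```

Both files are closed as soon as their with block ends, even if an exception is raised inside it, so no explicit close() call is needed.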

Also, in your final os.walk loop you assign start and dirs but don't seem to use them. If you don't need those values, consider using the names _ and __. That's a Python convention for marking variables as unused throwaways, so people reading your code know they don't matter.
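
A small sketch of the throwaway-name convention with os.walk follows. The function count_named_files and the temporary directory layout are assumptions made for this example and are not part of the original script.

```python
import os
import tempfile


def count_named_files(root, target_name):
    """Count files called target_name anywhere under root."""
    count = 0
    # Only the file names are used, so the other two values get throwaway names.
    for _, __, files in os.walk(root):
        count += files.count(target_name)
    return count


# Hypothetical layout: one matching file at the top level, one nested.
with tempfile.TemporaryDirectory() as tmp:
    os.makedirs(os.path.join(tmp, "a", "b"))
    for folder in (tmp, os.path.join(tmp, "a", "b")):
        with open(os.path.join(folder, "fileName.xml"), "w") as f:
            f.write("")
    print(count_named_files(tmp, "fileName.xml"))  # prints 2
```

If the loop has to build the full path of each match, the first value (the directory currently being walked) would need to keep a real name such as start, since a throwaway name tells readers the value is never read.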

Context

StackExchange Code Review Q#109770, answer score: 5
