
Lua comment remover

Submitted by: @import:stackexchange-codereview

Problem

I had to remove all kinds of comments from a Lua file. I tried to find a usable Python script for this online, but Google did not help.

So I wrote one. This script recognizes all types of comments, such as single-line and multi-line comments.

I would welcome your opinion.

# written in Python 3.2

import codecs
import re

inputFilePath = 'testfile.lua'
inputLuaFile = codecs.open( inputFilePath, 'r', encoding = 'utf-8-sig' )
inputLuaFileDataList = inputLuaFile.read().split( "\r\n" )
inputLuaFile.close()

outputFilePath = 'testfile_out.lua'
outputLuaFile = codecs.open( outputFilePath, 'w', encoding = 'utf-8' )
outputLuaFile.write( codecs.BOM_UTF8.decode( "utf-8" ) )

def create_compile( patterns ):
    compStr = '|'.join( '(?P<%s>%s)' % pair for pair in patterns )
    regexp = re.compile( compStr )

    return regexp

comRegexpPatt = [( "oneLineS", r"--[^\[\]]*?$" ),
                 ( "oneLine", r"--(?!(-|\[|\]))[^\[\]]*?$" ),
                 ( "oneLineBlock", r"(?<!-)(--\[\[.*?\]\])" ),
                 ( "blockS", r"(?<!-)--(?=(\[\[)).*?$" ),
                 ( "blockE", r".*?\]\]" ),
                 ( "offBlockS", r"---+\[\[.*?$" ),
                 ( "offBlockE", r".*?--\]\]" ),
                 ]

comRegexp = create_compile( comRegexpPatt )

comBlockState = False

for i in inputLuaFileDataList:
    res = comRegexp.search( i )
    if res:
        typ = res.lastgroup
        if comBlockState:
            if typ == "blockE":
                comBlockState = False
                i = res.re.sub( "", i )
            else:
                i = ""
        else:
            if typ == "blockS":
                comBlockState = True
                i = res.re.sub( "", i )
            else:
                comBlockState = False
                i = res.re.sub( "", i )
    elif comBlockState:
        i = ""

    if not i == "":
        outputLuaFile.write( "{}\n".format( i ) )

Solution

Firstly, I think you have some subtle bugs.

  • What if -- appears inside a string? In that case it should not be treated as a comment.

  • What if someone ends a block comment and starts another one on the same line, as in --]] --[[ ?



  • You split on '\r\n'. If you run this on a Linux system, your files probably won't have those line separators.
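
The first and third points are easy to reproduce. The snippet below applies the "oneLineS" alternative from the question's pattern list to a line whose only -- sits inside a string literal, then compares split("\r\n") with str.splitlines() on text that uses Unix line endings. It is only a demonstration of the failure modes, not a fix.

```python
import re

# The "oneLineS" alternative from the question's pattern list.
one_line_comment = re.compile(r"--[^\[\]]*?$")

# The -- inside the string literal is taken as the start of a comment,
# so the closing quote and parenthesis are removed as well.
print(repr(one_line_comment.sub("", 'print("a -- b")')))  # → 'print("a '

# Text with Unix line endings is not split at all by split("\r\n") ...
unix_text = "x = 1\ny = 2\n"
print(unix_text.split("\r\n"))  # → ['x = 1\ny = 2\n']
# ... while splitlines() recognizes \n, \r\n and \r.
print(unix_text.splitlines())  # → ['x = 1', 'y = 2']
```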



Secondly, some of your variable names could use help.

  • The Python style guide (PEP 8) recommends underscore_style, not camelCase, for local variable names.

  • You use some abbreviations in your names, e.g. res or comRegexpPatt. I don't think that's a good idea.



  • You have a loop variable i whose name gives very little hint of what it holds (one line of the file).
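
Applying the naming points to the file handling from the question might look like the sketch below. The function names read_lua_lines and write_lua_lines are hypothetical; the encodings are the ones the question already uses. Using with blocks also closes both files automatically, whereas the question never closes its output file, and splitlines() avoids depending on the '\r\n' separator.

```python
def read_lua_lines(input_file_path):
    # 'utf-8-sig' drops a leading BOM if one is present.
    # splitlines() handles \n, \r\n and \r line endings alike.
    with open(input_file_path, encoding='utf-8-sig') as lua_file:
        return lua_file.read().splitlines()


def write_lua_lines(output_file_path, lines):
    # 'utf-8-sig' writes a BOM first, like the explicit BOM write in the question.
    with open(output_file_path, 'w', encoding='utf-8-sig') as lua_file:
        for line in lines:
            if line:  # skip lines left empty, as the question's loop does
                lua_file.write(line + '\n')


# Usage sketch (file names taken from the question):
# lines = read_lua_lines('testfile.lua')
# write_lua_lines('testfile_out.lua', lines)
```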



Your regular expressions look convoluted. I think this is a symptom of the fact that the problem is not best solved by a regular expression. This will be even more so if you fix the string problem.

Here is how I'd solve this problem: I'd write a class CodeText that holds the code in question, and then write code like this:

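def handle_code(code_text):
    # walk the source, dispatching on comments and strings
    while code_text.text_remaining:
        if code_text.matches('--'):
            handle_comment(code_text)
        elif code_text.matches('"'):
            handle_string(code_text)
        else:
            code_text.accept()    # ordinary code is kept

def handle_string(code_text):
    # keep everything up to the closing quote, including any --
    while code_text.text_remaining and not code_text.matches('"'):
        code_text.accept()

def handle_comment(code_text):
    if code_text.matches('[['):
        handle_block_comment(code_text)
    else:
        # single-line comment: drop everything up to the end of the line
        while code_text.text_remaining and not code_text.matches('\n'):
            code_text.reject()

def handle_block_comment(code_text):
    # a block comment ends at ]], whether or not it is written as --]]
    while code_text.text_remaining and not code_text.matches(']]'):
        code_text.reject()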
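
The sketch leaves CodeText undefined. One possible concrete version follows; the CodeText class, its peek-only matches(), the count parameter on accept()/reject() and the strip_comments() helper are all assumptions for illustration, not part of the original answer. It also keeps single-quoted strings and [[ ... ]] long strings intact, since -- can appear inside those too, but it does not handle leveled long brackets such as [==[ ... ]==].

```python
class CodeText:
    """Hypothetical cursor over Lua source that collects the text to keep."""

    def __init__(self, text):
        self.text = text
        self.pos = 0
        self.kept = []

    @property
    def text_remaining(self):
        return self.pos < len(self.text)

    def matches(self, token):
        # Peek only: True if the unread text starts with token; nothing is consumed.
        return self.text.startswith(token, self.pos)

    def accept(self, count=1):
        # Copy the next count characters to the output.
        self.kept.append(self.text[self.pos:self.pos + count])
        self.pos += count

    def reject(self, count=1):
        # Skip the next count characters without copying them.
        self.pos += count

    def result(self):
        return "".join(self.kept)


def handle_code(code_text):
    while code_text.text_remaining:
        if code_text.matches('--'):
            handle_comment(code_text)
        elif code_text.matches('"'):
            handle_string(code_text, '"')
        elif code_text.matches("'"):
            handle_string(code_text, "'")
        elif code_text.matches('[['):
            handle_long_string(code_text)
        else:
            code_text.accept()


def handle_string(code_text, quote):
    code_text.accept()  # opening quote
    while code_text.text_remaining:
        if code_text.matches('\\'):
            code_text.accept(2)  # keep escapes such as \" intact
        elif code_text.matches(quote):
            code_text.accept()  # closing quote
            return
        else:
            code_text.accept()


def handle_long_string(code_text):
    # [[ ... ]] string literal: kept verbatim, including any -- inside it
    code_text.accept(2)
    while code_text.text_remaining and not code_text.matches(']]'):
        code_text.accept()
    code_text.accept(2)


def handle_comment(code_text):
    code_text.reject(2)  # drop '--'
    if code_text.matches('[['):
        handle_block_comment(code_text)
    else:
        # Short comment: drop the rest of the line but keep the newline.
        while code_text.text_remaining and not code_text.matches('\n'):
            code_text.reject()


def handle_block_comment(code_text):
    code_text.reject(2)  # drop '[['
    while code_text.text_remaining and not code_text.matches(']]'):
        code_text.reject()
    code_text.reject(2)  # drop ']]' (the --]] convention ends here too)


def strip_comments(source):
    code_text = CodeText(source)
    handle_code(code_text)
    return code_text.result()


source = 'local s = "a -- b" -- trailing\n--[[ block\ncomment ]]print(s)\n'
print(repr(strip_comments(source)))  # → 'local s = "a -- b" \nprint(s)\n'
```

Because matches() only peeks in this version, each handler explicitly accepts or rejects the delimiters it recognizes, so it is unambiguous whether quotes and brackets end up in the output. Stripping comments this way can leave trailing spaces and blank lines behind; a second pass could tidy those if needed. Note that ---[[ is correctly treated as a single-line comment, which is exactly the "commented-out block" trick the question's offBlockS pattern tries to detect.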

Context

StackExchange Code Review Q#1601, answer score: 5
