patternpythonMinor
Lua comment remover
commentremoverlua
Problem
I had to remove all kinds of comments from a Lua file. I tried to find a usable Python script to do this on the net, but Google did not help.
Therefore, I made one. This script recognizes all types of comments, both single-line and multi-line.
I would welcome your opinion.
# written in Python 3.2

import codecs
import re

inputFilePath = 'testfile.lua'
inputLuaFile = codecs.open( inputFilePath, 'r', encoding = 'utf-8-sig' )
inputLuaFileDataList = inputLuaFile.read().split( "\r\n" )
inputLuaFile.close()

outputFilePath = 'testfile_out.lua'
outputLuaFile = codecs.open( outputFilePath, 'w', encoding = 'utf-8' )
outputLuaFile.write( codecs.BOM_UTF8.decode( "utf-8" ) )

def create_compile( patterns ):
    compStr = '|'.join( '(?P<%s>%s)' % pair for pair in patterns )
    regexp = re.compile( compStr )
    return regexp

comRegexpPatt = [( "oneLineS", r"--[^\[\]]*?$" ),
                 ( "oneLine", r"--(?!(-|\[|\]))[^\[\]]*?$" ),
                 ( "oneLineBlock", r"(?<!-)(--\[\[.*?\]\])" ),
                 ( "blockS", r"(?<!-)--(?=(\[\[)).*?$" ),
                 ( "blockE", r".*?\]\]" ),
                 ( "offBlockS", r"---+\[\[.*?$" ),
                 ( "offBlockE", r".*?--\]\]" ),
                 ]

comRegexp = create_compile( comRegexpPatt )

comBlockState = False

for i in inputLuaFileDataList:
    res = comRegexp.search( i )
    if res:
        typ = res.lastgroup
        if comBlockState:
            if typ == "blockE":
                comBlockState = False
                i = res.re.sub( "", i )
            else:
                i = ""
        else:
            if typ == "blockS":
                comBlockState = True
                i = res.re.sub( "", i )
            else:
                comBlockState = False
                i = res.re.sub( "", i )
    elif comBlockState:
        i = ""

    if not i == "":
        outputLuaFile.write( "{}\n".format( i ) )
Solution
Firstly, I think you have some subtle bugs.
- What if -- appears inside a string? In that case it should not be a comment.
- What if someone ends a block and starts another one on the same line: --]] --[[
- You split on '\r\n'. If you run this on a Linux system, you won't have those line separators.
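The first and third points can be reproduced in a few lines. This is only a small sketch: it reuses the question's "oneLineS" pattern on its own, the sample strings are made up, and str.splitlines() is offered as one possible replacement for the split (it also splits on a few less common line boundaries).

```python
import re

# The question's "oneLineS" pattern, used on its own for illustration.
line_comment = re.compile(r"--[^\[\]]*?$")

# Made-up sample line: the "--" sits inside a string literal, not a comment.
string_line = 'print("a -- b")'
stripped = line_comment.sub("", string_line)

# Made-up sample text with Linux and Windows line endings.
linux_text = "x = 1\ny = 2\n"
windows_text = "x = 1\r\ny = 2\r\n"

# Splitting on "\r\n" leaves Linux text as one big "line".
crlf_split = linux_text.split("\r\n")

# str.splitlines() handles both conventions.
linux_lines = linux_text.splitlines()
windows_lines = windows_text.splitlines()

print(repr(stripped))
print(crlf_split)
print(linux_lines, windows_lines)
```

The regex cuts the string literal in half, and the '\r\n' split returns the whole Linux file as a single element, so the per-line logic never sees individual lines.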
Secondly, some of your variable names could use help.
- The Python style guide recommends underscore_style, not camelCase, for local variable names.
- You use some abbreviations in your names, which I don't think is a good idea, e.g. res or comRegexpPatt.
- You have an i, a name which gives very little hint about what it holds.
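As a rough illustration of the naming points, here is the pattern-compiling helper with underscore_style names. The new names (comment_patterns, compile_named_patterns, comment_regex, match, line) are only suggestions, and the single pattern is a simplified placeholder rather than the question's full list.

```python
import re

# Suggested renames (assumptions, not from the question):
#   comRegexpPatt -> comment_patterns, create_compile -> compile_named_patterns,
#   comRegexp -> comment_regex, res -> match, i -> line
comment_patterns = [
    ("line_comment", r"--.*$"),  # simplified placeholder pattern
]


def compile_named_patterns(patterns):
    """Join (name, pattern) pairs into one regex with a named group per pair."""
    combined = "|".join("(?P<%s>%s)" % pair for pair in patterns)
    return re.compile(combined)


comment_regex = compile_named_patterns(comment_patterns)

for line in ["x = 1 -- note", "y = 2"]:
    match = comment_regex.search(line)
    kind = match.lastgroup if match else None
    print(repr(line), "->", kind)
```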
Your regular expressions look convoluted. I think this is a symptom of the fact that the problem is not best solved by a regular expression. This will be even more so if you fix the string problem.
The way I'd solve this problem: I'd write a class CodeText which holds the actual code in question and then write code like this:
def handle_code(code_text):
    while code_text.text_remaining:
        if code_text.matches('--'):
            handle_comment(code_text)
        elif code_text.matches('"'):
            handle_string(code_text)
        else:
            code_text.accept()

def handle_string(code_text):
    # Keep everything up to the closing quote.
    while not code_text.matches('"'):
        code_text.accept()

def handle_comment(code_text):
    if code_text.matches('[['):
        handle_block_comment(code_text)
    else:
        # Drop the rest of the line.
        while not code_text.matches('\n'):
            code_text.reject()

def handle_block_comment(code_text):
    # A Lua block comment ends at the first ']]'; writing '--]]' is only a convention.
    while not code_text.matches(']]'):
        code_text.reject()
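The outline above leaves CodeText undefined, so here is one way it could be filled in. This is a minimal sketch under my own assumptions, not the answer's actual class: peek (a non-consuming check used in place of matches), the count argument to accept and reject, and remove_comments are all hypothetical names. It also ignores Lua long strings ([[...]]) and leveled brackets such as --[==[ ... ]==].

```python
class CodeText:
    """Hypothetical cursor over source text; kept characters go to `output`."""

    def __init__(self, text):
        self.text = text
        self.position = 0
        self.output = []

    @property
    def text_remaining(self):
        return self.position < len(self.text)

    def peek(self, token):
        """Return True if `token` starts at the cursor, without consuming it."""
        return self.text.startswith(token, self.position)

    def accept(self, count=1):
        """Copy `count` characters to the output and advance."""
        self.output.append(self.text[self.position:self.position + count])
        self.position += count

    def reject(self, count=1):
        """Skip `count` characters without copying them."""
        self.position += count


def handle_code(code_text):
    while code_text.text_remaining:
        if code_text.peek("--"):
            handle_comment(code_text)
        elif code_text.peek('"') or code_text.peek("'"):
            handle_string(code_text)
        else:
            code_text.accept()


def handle_string(code_text):
    quote = code_text.text[code_text.position]
    code_text.accept()  # opening quote
    while code_text.text_remaining and not code_text.peek(quote):
        # Keep a backslash together with the character it escapes.
        code_text.accept(2 if code_text.peek("\\") else 1)
    if code_text.text_remaining:
        code_text.accept()  # closing quote


def handle_comment(code_text):
    code_text.reject(2)  # the leading "--"
    if code_text.peek("[["):
        handle_block_comment(code_text)
    else:
        # Drop everything up to the newline; the newline itself is kept as code.
        while code_text.text_remaining and not code_text.peek("\n"):
            code_text.reject()


def handle_block_comment(code_text):
    code_text.reject(2)  # the opening "[["
    while code_text.text_remaining and not code_text.peek("]]"):
        code_text.reject()
    code_text.reject(2)  # the closing "]]" (harmless if the input ended early)


def remove_comments(source):
    code_text = CodeText(source)
    handle_code(code_text)
    return "".join(code_text.output)
```

Because the text is scanned character by character rather than line by line, a -- inside a string is left alone, and a block that closes and reopens on the same line needs no special case. Whitespace around removed comments is kept as-is; trimming it would be a separate step.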
Context
StackExchange Code Review Q#1601, answer score: 5