Lexer for C- in Python
Problem
I am currently taking a compilers course where we are designing a compiler for C- (a subset of C). Our first step was the lexer, which I have written, but I believe it is not very "pythonic". I was hoping someone could help me make it more "pythonic", as I believe doing so would make the future parts of this assignment far simpler and more manageable.
I will first discuss the rules of the language and then give my program.
- The accepted keywords are as follows:
  else if int return void while float
- The special symbols are:
  + - * / >= == != = ; , ( ) [ ] { } /* */ //
- Other tokens are ID, NUM (for ints) or FLOAT, defined by the following regular expressions:
  FLOAT = (\d+(\.\d+)?([E][+|-]?\d+)?)
  ID = letter letter*
  NUM = digit digit*
  letter = a|...|z|A|...|Z
  digit = 0|...|9
  Lowercase and uppercase are distinct.
- Whitespace consists of blanks, newlines, and tabs. Whitespace is ignored except that it must separate IDs, NUMs, FLOATs, and keywords.
- Comments are surrounded by the C notations /* ... */ and //, and CAN (don't know why) be nested.
The program will read in a C- file and output each line followed by every ID, keyword, NUM, and FLOAT in the order that they appear, as well as every special symbol. (Comments and whitespace are ignored. Anything that is invalid is displayed as an error and the program resumes as normal.) The program does not determine whether the input is a valid program; it simply breaks it up into tokens.
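As a rough illustration of the rules above, the token classes can be expressed as one combined regular expression with named groups. This is only a minimal sketch, not the program under review: the names `KEYWORDS`, `TOKEN_RE`, and `tokenize` are hypothetical, comment handling is left out, and the FLOAT pattern is narrowed (an assumption) to require a fraction or an exponent so that plain integers fall through to NUM.

```python
import re

# Keywords listed in the language rules.
KEYWORDS = {"else", "if", "int", "return", "void", "while", "float"}

# One alternative per token class; order matters (FLOAT before NUM).
# Assumption: a FLOAT needs a fraction or an exponent, unlike the looser
# regular expression quoted in the rules, which also matches plain ints.
TOKEN_RE = re.compile(r"""
    (?P<FLOAT>\d+\.\d+(?:E[+-]?\d+)?|\d+E[+-]?\d+)
  | (?P<NUM>\d+)
  | (?P<ID>[A-Za-z]+)
  | (?P<SYMBOL>>=|==|!=|[-+*/=;,()\[\]{}])
  | (?P<SKIP>\s+)
  | (?P<ERROR>.)
""", re.VERBOSE)


def tokenize(line):
    """Return a list of (kind, text) pairs for one comment-free line."""
    tokens = []
    for match in TOKEN_RE.finditer(line):
        kind, text = match.lastgroup, match.group()
        if kind == "SKIP":
            continue  # whitespace only separates tokens
        if kind == "ID" and text in KEYWORDS:
            kind = "KEYWORD"
        tokens.append((kind, text))
    return tokens


for kind, text in tokenize("iiii = 3@33;"):
    print("{}: {}".format(kind, text))
```

In this sketch an invalid character such as `@` becomes a single-character ERROR token and lexing resumes at the next character. Whether an error should also swallow the characters that follow it is a design choice the sketch leaves open.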
Sample input:
/**/ /*/* */ */
/*/*/****This**********/*/ */
/**************/
/*************************
i = 333; ******************/ */
iiii = 3@33;
int g 4 cd (int u, int v) {
Sample output:
INPUT: /**/ /*/* */ */
INPUT: /*/*/****This**********/*/ */
INPUT: /**************/
INPUT: /*************************
INPUT: i = 333; ******************/ */
*
/
INPUT: iiii = 3@33;
ID: iiii
=
NUM: 3
Error: @3
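The sample above mostly exercises nested comments. One possible way to handle them is to carry a nesting depth from one line to the next, as in the sketch below. The function name `strip_comments` and its return shape are assumptions made for illustration, and `//` is treated here as a comment that runs to the end of the line when it appears outside a block comment.

```python
def strip_comments(line, depth=0):
    """Remove comment text from one line.

    Returns (code_text, depth), where depth is the number of /* ... */
    blocks still open at the end of the line.
    """
    out = []
    i = 0
    while i < len(line):
        pair = line[i:i + 2]
        if pair == "/*":
            depth += 1          # open a (possibly nested) block comment
            i += 2
        elif pair == "*/" and depth > 0:
            depth -= 1          # close the innermost open block comment
            i += 2
        elif pair == "//" and depth == 0:
            break               # the rest of the line is a line comment
        else:
            if depth == 0:
                out.append(line[i])  # ordinary code character
            i += 1
    return "".join(out), depth


depth = 0
for source_line in ["/*************************",
                    "i = 333; ******************/ */"]:
    code, depth = strip_comments(source_line, depth)
    print(repr(code), depth)
```

With the two-line input shown, the first line leaves one comment open, and the second line closes it and leaves ` */` as code, which is consistent with the `*` and `/` symbols in the sample output.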
Solution
Proper string formatting
Since Python 2.6, the string method str.format has been available, and it is generally preferred over the string formatting operator % for new code (although % is not formally deprecated). Here's an example of its usage at the Python command line (the empty {} form needs Python 2.7 or later):
>>> print("hello {}".format("world"))
hello world
You can also specify positional or named parameters, like the below:
>>> print("{1} {0}".format("world", "hello"))
hello world
>>> print("{hello} {world}".format(hello="hello", world="world"))
hello world
Excepting properly
Never ever do something like this:
try:
    int(str)
    return True
except:
    return False
While doing something like this in a minuscule codebase probably won't affect much, doing this in general can result in some bad issues:
- You caught an error that wasn't supposed to be caught, like a SystemError, RuntimeError, or what-not.
- You're getting incorrect output because, again, an error that wasn't supposed to be caught was caught.
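Both problems can be seen in a small sketch. The helper `is_int_bare` below is hypothetical and contains a deliberate typo, so the error it raises has nothing to do with the input being a number or not:

```python
def is_int_bare(text):
    """Hypothetical helper with a deliberate typo in its body."""
    try:
        int(txet)  # typo for `text`: raises NameError, not a conversion error
        return True
    except:  # bare except: swallows the NameError as well
        return False


# The typo never surfaces; every input is reported as "not an int".
print(is_int_bare("42"))
```

The call prints False even though "42" is a valid integer, and nothing points at the real bug.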
In general, you should never do something like this. In the case of this example, you should be catching a ValueError, like this:
try:
    int(str)
    return True
except ValueError:
    return False
Properly opening files
Just using open and assigning its return value to a variable, like this, is not something you should get into the habit of doing:
f = open( ... )
If you open a file using the above method and your program raises an exception before the file is explicitly closed, the resources used by the file aren't freed promptly.
In order to make sure that the resources are properly freed, you should be using a context manager to open the file, like this:
with open( ... ) as f:
    ...
Once you're using the context manager, the file is guaranteed to be closed when the with block is left, even if an exception is raised inside it.
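To see that behaviour in action, here is a small self-contained sketch. The temporary file, its contents, and the simulated RuntimeError are all made up for the demonstration:

```python
import os
import tempfile

# Create a throwaway file so the example does not depend on any real input.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("int x;\n")
    path = tmp.name

try:
    with open(path) as f:
        first_line = f.readline()
        raise RuntimeError("simulated failure while lexing")
except RuntimeError:
    pass

# The with block closed the file on the way out, despite the exception.
print(f.closed)
os.remove(path)
```

The name `f` is still bound after the block, which is why its `closed` attribute can be inspected afterwards.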
Properly matching blank lines
In addition, you also have a bug, right here in the top-level for loop at the end of your code:
for line in open(filename):
    if line != "\n": # Bug here
        ...
While in theory this works if every blank line contains nothing but a newline, it fails as soon as a blank line accidentally includes extra spaces. Here's an example of valid input that wouldn't be properly matched, where each s is a space and each n is a newline character:
ssn
sn
ssssn
A good alternative might be to do something like this, although pattern-matching the line to make sure it doesn't contain illegal characters might be better:
for line in open(filename):
    if line.strip():  # true only if the line has non-whitespace content
        ...
Style/nitpicks
You don't have many style violations, but there are a few things worth mentioning:
- There should be two blank lines between top-level code/functions/classes.
- You should have a space after each comma in lists/dictionaries/tuples, like this:
  spam = [1, 2, 3, 4, 5]
  Not like this:
  spam = [1,2,3,4,5]
Context
StackExchange Code Review Q#105955, answer score: 6