
Lexer for C- in Python

Submitted by: @import:stackexchange-codereview

Problem

I am currently taking a compilers course in which we are designing a compiler for C- (a subset of C). Our first step was the lexer. I have written it, but I believe it is not very "pythonic", and I was hoping someone could help me make it more so, as I believe that would make the later parts of this assignment far simpler and more manageable.

I will first discuss the rules of the language and then give my program.

  • The accepted keywords are as follows:

else if int return void while float


  • The special symbols are:

+ - * / < <= > >= == != = ; , ( ) [ ] { } /* */ //


  • Other tokens are ID, NUM (for ints), or FLOAT, defined by the following regular expressions:

FLOAT = (\d+(\.\d+)?([E][+|-]?\d+)?)
ID = letter letter*
NUM = digit digit*
letter = a|...|z|A|...|Z
digit = 0|...|9


Lowercase and uppercase are distinct.

  • Whitespace consists of blanks, newlines, and tabs. Whitespace is ignored except that it must separate IDs, NUMs, FLOATs, and keywords.

  • Comments are surrounded by the C notations /* ... */ // and CAN (don't know why) be nested.
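
The token rules listed above can be sketched with Python's re module. This is only an illustrative sketch, not the asker's lexer: the names KEYWORDS, ID_RE, NUM_RE, FLOAT_RE, and classify are hypothetical, and the function assumes it is handed one whitespace-separated lexeme at a time. The FLOAT pattern is copied verbatim from the rules; note that, as written, its character class [+|-] also admits a literal "|".

```python
import re

# Keywords exactly as listed in the rules; case matters.
KEYWORDS = {"else", "if", "int", "return", "void", "while", "float"}

# re.ASCII restricts \d to 0-9, matching "digit = 0|...|9".
ID_RE = re.compile(r"[a-zA-Z]+")
NUM_RE = re.compile(r"\d+", re.ASCII)
FLOAT_RE = re.compile(r"(\d+(\.\d+)?([E][+|-]?\d+)?)", re.ASCII)


def classify(lexeme):
    """Return the token kind for a single lexeme (hypothetical helper)."""
    if lexeme in KEYWORDS:
        return "keyword"
    if ID_RE.fullmatch(lexeme):
        return "ID"
    # NUM is tested before FLOAT because the FLOAT pattern also
    # matches a plain run of digits.
    if NUM_RE.fullmatch(lexeme):
        return "NUM"
    if FLOAT_RE.fullmatch(lexeme):
        return "FLOAT"
    return "Error"
```

For example, classify("333") gives "NUM", classify("3.5E-2") gives "FLOAT", and classify("3@33") gives "Error".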



The program reads in a C- file and outputs each line followed by every ID, keyword, NUM, and FLOAT in the order they appear, as well as every special symbol. (Comments and whitespace are ignored. Anything invalid is displayed as an error, and the program then resumes as normal.) The program does not determine whether the input is valid; it simply breaks it up into tokens.

Sample input:

/**/          /*/* */   */
/*/*/****This**********/*/    */
/**************/
/*************************
i = 333;        ******************/       */

iiii = 3@33;

int g 4 cd (int u, int v)      {


Sample output:

INPUT: /**/          /*/* */   */
INPUT: /*/*/****This**********/*/    */
INPUT: /**************/
INPUT: /*************************
INPUT: i = 333;        ******************/       */
*
/
INPUT: iiii = 3@33;
ID: iiii
=
NUM: 3
Error: @3

Solution

Proper string formatting

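Since Python 2.6, strings have a format method, str.format, which is generally preferred over the % operator in new code (the % operator still works and has not been removed). Note that the empty {} fields used below require Python 2.7 or later; on 2.6 you must number them, as in {0}. Here's an example of its usage at the Python command line: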

>>> print("hello {}".format("world"))
hello world


You can also use positional or named fields:

>>> print("{1} {0}".format("world", "hello"))
hello world
>>> print("{hello} {world}".format(hello="hello", world="world"))
hello world
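
Applied to the lexer's output, str.format might look like the sketch below. The helper names format_input and format_token are hypothetical; only the "INPUT: ..." and "ID: ..." line shapes come from the sample output in the question.

```python
def format_input(line):
    """Return the echo line for one source line, without its newline."""
    return "INPUT: {}".format(line.rstrip("\n"))


def format_token(kind, lexeme):
    """Return one output line for a token, e.g. 'ID: iiii'."""
    # Named fields keep the template readable when it has several parts.
    return "{kind}: {lexeme}".format(kind=kind, lexeme=lexeme)


print(format_input("iiii = 3@33;\n"))
print(format_token("ID", "iiii"))
print(format_token("NUM", "3"))
```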


Excepting properly

Never ever do something like this:

try:
    int(str)
    return True
except:
    return False


While doing something like this in a minuscule codebase probably won't affect much, a bare except can cause real problems in general:

  • It catches exceptions that were never meant to be caught, including KeyboardInterrupt and SystemExit, along with unrelated errors such as RuntimeError.

  • It can produce incorrect output, because an unrelated error is silently turned into a return value of False instead of being reported.

In the case of this example, the only failure you expect from int is a ValueError, so catch exactly that:

try:
    int(str)
    return True
except ValueError:
    return False
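
As a complete function, the same pattern could be sketched as follows. The name is_int and the parameter name text are assumptions for illustration; text is used instead of str so that the built-in str type is not shadowed.

```python
def is_int(text):
    """Return True if text can be parsed by int(), False otherwise."""
    try:
        int(text)
        return True
    except ValueError:
        # Only the expected parsing failure is handled here; anything
        # else (for example KeyboardInterrupt) still propagates.
        return False
```

With this definition, is_int("333") returns True and is_int("3@33") returns False.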


Properly opening files

Calling open and assigning its return value to a variable, like this, is not a habit you should get into:

f = open( ... )


If you open a file this way and an exception is raised before you call f.close(), the file is not closed at a predictable point; it stays open until the file object is garbage-collected or the program exits.

To make sure the file is always closed, open it with a context manager, like this:

with open( ... ) as f:
    ...


With the context manager, the file is guaranteed to be closed as soon as the with block is left, even if an exception is raised inside it.
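
A minimal sketch of the context-manager form, assuming a hypothetical helper read_lines that takes the path of a text file:

```python
def read_lines(filename):
    """Return all lines of a text file as a list of strings."""
    # The file is closed when the with block is left, whether by the
    # return statement or by an exception raised while reading.
    with open(filename) as f:
        return f.readlines()
```

A lexer could then loop over read_lines(filename) without keeping the file open while it tokenizes.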

Properly matching blank lines

You also have a bug in the top-level for loop at the end of your code:

for line in open(filename):
    if line != "\n": # Bug here
        ...


In theory this works if every blank line is truly empty, but it fails as soon as a blank line contains stray spaces, because such a line is not equal to "\n". Here's an example of valid input that wouldn't be matched properly (each s stands for a space and each n for a newline character):

ssn
sn
ssssn


A good alternative might be to strip the whitespace from the line and test whether anything is left, although pattern-matching the line to make sure it doesn't contain illegal characters might be better:

with open(filename) as f:
    for line in f:
        if line.strip():  # false for empty and whitespace-only lines
            ...
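
The whitespace check can be sketched as a small generator. The name non_blank_lines is hypothetical, and it accepts any iterable of strings, so it works on an open file as well as on a list.

```python
def non_blank_lines(lines):
    """Yield only the lines that contain something other than whitespace."""
    for line in lines:
        # str.strip() removes spaces, tabs, and the trailing newline,
        # so an all-whitespace line becomes "" and is skipped.
        if line.strip():
            yield line


sample = ["  \n", " \n", "int x;\n", "    \n", "\n"]
print(list(non_blank_lines(sample)))
```

The sample list mirrors the whitespace-only lines shown above plus one real line, and only the real line is kept.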


Style/nitpicks

You don't have many style violations, but a few things are worth mentioning:

  • There should be two blank lines between top-level code/functions/classes.



  • You should have a space after each comma in lists/dictionaries/tuples, like this:

spam = [1, 2, 3, 4, 5]


Not like this:

spam = [1,2,3,4,5]

Context

StackExchange Code Review Q#105955, answer score: 6
