Processing C++ comments
Problem
Here's the first functional version of my Python 2 script for processing comments in C++ source files. It's a personal project, and I expect to expand it later with more advanced options (mainly replacing comments with whitespace or marking their original positions in the comment-only output).
It's also intended as a learning exercise. I am self-taught in Python; my primary language is C++. So the core of my question is whether the code is "Pythonic" and, if not, how to improve on that. I don't want to "write C++ with a different syntax"; I want to (learn to) write proper Python.
I will of course also welcome any other comments (general style, efficiency, safety).
```
#! /usr/bin/env python

# Copyright Petr Kmoch 2014

"""Script for processing comments and non-comment code in C++ files.

The goal of this script to extract comments from C++ files and output only the comments, only the
non-comment code, or both.

For a quick usage summary, pass '-h' or '--help' as a command-line argument.
"""

import argparse
import os
import re
import sys


class Progress(object):
  """This class stores intermediary data when processing a single line of input.

  It is used as a means of communicating data between the processFile() function and State_*
  objects.

  It contains the following members:
  line: The tail part of the input line which has not been processed yet.
  finished: Boolean flag indicating that the entire line has been processed.
  stateChange:
    If not None, this member holds a callable which takes the state stack as argument
    and will modify it according to the results of processing the line.
  noncomments: String of non-comments extracted from the line during processing.
  comments: String of comments extracted from the line during processing.
  """

  def __init__(self, line):
    object.__init__(self)
    self.line = line
    self.resetProcessing()

  def resetProcessing(self):
    """Clear the results of previously processing a piece
```
Solution
A few brief comments from an initial read through:
- A lot of your method names are mixed case, but the Python convention is lowercase with underscores. The Python style guide is PEP 8:

  > Function names should be lowercase, with words separated by underscores as necessary to improve readability.
  >
  > mixedCase is allowed only in contexts where that's already the prevailing style (e.g. threading.py), to retain backwards compatibility.

  PEP 8 also recommends 4 spaces for indentation, rather than the 2 spaces which you’ve used, but that’s not worth getting too worked up about.

- Why do the docstrings for `extractNonComment` and `extractComment` both say they append to `noncomments`, when this doesn’t seem to match what they’re actually doing?

- Within the `extract` method: if the initial bound in a string slice isn’t set, it defaults to 0, so you can replace `val = self.line[0 : length]` with `val = self.line[:length]`.

  Next, a slice bound of `None` is treated as if it had been omitted, so a slice that starts or ends in `None` runs to that end of the string. For example:

  ```
  >>> my_string = "12345\n"
  >>> my_string[:None]
  '12345\n'
  ```

  So you don’t need to check for `length is None` explicitly: just set `val = self.line[:length]`. Then you could just trim `self.line` by the length of `val`. Something like:

  ```
  def extract(self, length=None):
      val = self.line[:length]
      self.line = self.line[len(val):]
      return val
  ```

Code Snippets
```
>>> my_string = "12345\n"
>>> my_string[:None]
'12345\n'
```

```
def extract(self, length=None):
    val = self.line[:length]
    self.line = self.line[len(val):]
    return val
```

Context
StackExchange Code Review Q#45484, answer score: 4