patterncppMinor
Pattern tokenization program
Viewed 0 times
tokenizationprogrampattern
Problem
I am writing this program to try and get some practice at C++ and correct and proper styling.
This code is meant to take user input from a pip and then run it through the program. The user enters a pattern and that pattern is turned into regex which is compared against the input, such that it could look like this:
I was wondering if I could get some pointers as to what I am doing wrong here (what is not best practice and anything else you may see as being incorrect).
```
#include
#include
#include
#include
#include "build/pcrecpp.h"
void ReadFile(std::vector &lines_in_file);
bool ValidateArguments(int arguments);
std::string RegexConverter(std::string &user_specified_pattern);
void SplitInputLine(std::string &user_specified_pattern, std::vector &users_pattern_split);
void TokenValidator(std::vector& users_pattern_split, std::string& users_converted_pattern);
std::string TokenToRegex(std::string word_in_user_specified_pattern);
bool FindMatchingLines(std::vector lines_in_file, std::string user_specified_pattern);
int main(int argc, char *argv[]) {
if (!ValidateArguments(argc)) return 1;
std::string user_specified_pattern = argv[1];
std::vector lines_in_file; // 2 diminsional vector containing [lines][words]
ReadFile(lines_in_file);
std::string users_converted_pattern = RegexConverter(user_specified_pattern);
FindMatchingLines(lines_in_file, users_converted_pattern);
return 0;
}
void ReadFile(std::vector &lines_in_file) {
std::stringstream input_file;
input_file users_pattern_split;
std::string users_converted_pattern;
SplitInputLine(user_specified_pattern, users_pattern_split);
TokenValidator(users_pattern_split, users_converted_pattern);
return users_converted_pattern;
}
void SplitInputLine(std::string &user_specified_pattern, std::vector &users_pattern_split) {
char sep = ' ';
for (size_t p = 0, q = 0; p != user_specified_pattern.npos; p = q)
users_
This code is meant to take user input from a pip and then run it through the program. The user enters a pattern and that pattern is turned into regex which is compared against the input, such that it could look like this:
cat input.txt |./app "pattern %{0} end of pattern"I was wondering if I could get some pointers as to what I am doing wrong here (what is not best practice and anything else you may see as being incorrect).
```
#include
#include
#include
#include
#include "build/pcrecpp.h"
void ReadFile(std::vector &lines_in_file);
bool ValidateArguments(int arguments);
std::string RegexConverter(std::string &user_specified_pattern);
void SplitInputLine(std::string &user_specified_pattern, std::vector &users_pattern_split);
void TokenValidator(std::vector& users_pattern_split, std::string& users_converted_pattern);
std::string TokenToRegex(std::string word_in_user_specified_pattern);
bool FindMatchingLines(std::vector lines_in_file, std::string user_specified_pattern);
int main(int argc, char *argv[]) {
if (!ValidateArguments(argc)) return 1;
std::string user_specified_pattern = argv[1];
std::vector lines_in_file; // 2 diminsional vector containing [lines][words]
ReadFile(lines_in_file);
std::string users_converted_pattern = RegexConverter(user_specified_pattern);
FindMatchingLines(lines_in_file, users_converted_pattern);
return 0;
}
void ReadFile(std::vector &lines_in_file) {
std::stringstream input_file;
input_file users_pattern_split;
std::string users_converted_pattern;
SplitInputLine(user_specified_pattern, users_pattern_split);
TokenValidator(users_pattern_split, users_converted_pattern);
return users_converted_pattern;
}
void SplitInputLine(std::string &user_specified_pattern, std::vector &users_pattern_split) {
char sep = ' ';
for (size_t p = 0, q = 0; p != user_specified_pattern.npos; p = q)
users_
Solution
The big thing that jumps out from reading your code is the lack of
Should be:
Likewise for:
which should be:
(Also, watch out for using
Speaking of
This desperately needs to be split up into a few intermediate calls to actually show what is going on.This is "golfed" code - code written in the shortest possible way, readability be damned.
If you're using C++11, you can simplify a the "index" based looping and instead just use a range-based for:
becomes
Be careful of passing parameters by value; this is introducing unnecessary overhead.
Here, full copies of
can become:
Be careful on going overboard on variable names. You've generally done a good job; everything is named with descriptive names, but things like
As a final note, you might want to swap to the standard regex implementation (in ``) if it is available (again, this is a C++11 feature).
const. Just about every reference parameter that is passed should be passed by const &. This is very valuable - it means the compiler can enforce that only methods that will not change the object state will be called. Further, don't pass parameters by reference and write to them, returning void. Some people do this because they think it is more efficient (it almost always isn't), and it's not good practice. For example:void SplitInputLine(std::string &user_specified_pattern, std::vector &users_pattern_split)Should be:
std::vector SplitInputLine(const std::string& user_specified_pattern)Likewise for:
void TokenValidator(std::vector &users_pattern_split, std::string &users_converted_pattern)which should be:
std::string TokenValidator(const std::vector& users_pattern_split)(Also, watch out for using
string instead of std::string. If this is compiling for you, it suggests that a using namespace std is lurking around somewhere).Speaking of
SplitInputLine, how this function works is entirely non-obvious.for (size_t p = 0, q = 0; p != user_specified_pattern.npos; p = q)
users_pattern_split.push_back(user_specified_pattern.substr(p+(p != 0),
(q = user_specified_pattern.find(sep, p+1))-p-(p != 0)));This desperately needs to be split up into a few intermediate calls to actually show what is going on.This is "golfed" code - code written in the shortest possible way, readability be damned.
If you're using C++11, you can simplify a the "index" based looping and instead just use a range-based for:
for (int i = 0; i < users_pattern_split.size(); i++)becomes
for(char c : users_pattern_split)Be careful of passing parameters by value; this is introducing unnecessary overhead.
bool FindMatchingLines(std::vector lines_in_file, std::string user_specified_pattern)Here, full copies of
lines_in_file and user_specified_patterns are created and passed into the function. Both should be passed by const & instead. Inside this function, using a ranged-based for will again make the code much easier to read:for (int i = 0; i < lines_in_file.size(); ++i)can become:
for(const auto& line : lines_in_file)Be careful on going overboard on variable names. You've generally done a good job; everything is named with descriptive names, but things like
is_there_a_match are overkill; match (or even match_exists) would both be fine.As a final note, you might want to swap to the standard regex implementation (in ``) if it is available (again, this is a C++11 feature).
Code Snippets
void SplitInputLine(std::string &user_specified_pattern, std::vector<string> &users_pattern_split)std::vector<std::string> SplitInputLine(const std::string& user_specified_pattern)void TokenValidator(std::vector<string> &users_pattern_split, std::string &users_converted_pattern)std::string TokenValidator(const std::vector<std::string>& users_pattern_split)for (size_t p = 0, q = 0; p != user_specified_pattern.npos; p = q)
users_pattern_split.push_back(user_specified_pattern.substr(p+(p != 0),
(q = user_specified_pattern.find(sep, p+1))-p-(p != 0)));Context
StackExchange Code Review Q#57497, answer score: 4
Revisions (0)
No revisions yet.