String similarity algorithm (c++)
algorithm, similarity, string
Problem
Parameters:
first - the first string.
second - the second string.
similarCharacters - outputs the number of similar characters.
tolerance - the number of mistakes allowed to consider the strings "similar."
Function:
bool Str_Similar(std::string first, std::string second, unsigned int* similarCharacters = nullptr, int tolerance = INT_MIN)
{
    // Don't even check if either strings are empty.
    if(first.empty() || second.empty()) return false;
    // Determine if the first is greater than or equal to the second.
    const bool firstGreaterOrEqualToSecond = first.length() >= second.length();
    // By default, set the tolerance to half the length of the smaller string.
    if(tolerance == INT_MIN) tolerance = (firstGreaterOrEqualToSecond ? (second.length() / 2) : (first.length() / 2));
    if(tolerance < 0) tolerance = 0;
    // Start off with any length difference, which are considered mistakes.
    unsigned int mistakes = (unsigned int)abs(first.length() - second.length());
    // Search only the length of the smaller string.
    const size_t searchLength = (firstGreaterOrEqualToSecond ? second.length() : first.length());
    // Do the search.
    for(size_t i = 0, max = searchLength; i < max; i++)
    {
        if(first.at(i) != second.at(i)) mistakes++;
    }
    // Output the similar characters.
    if(similarCharacters != nullptr) *similarCharacters = (unsigned int)abs(searchLength - mistakes);
    // Compare the mistakes to the tolerance.
    return (mistakes <= tolerance);
}
Solution
- Prefer passing parameters by const reference
The std::string parameters should be passed by const reference rather than by value. Even if passing by value would work properly, a const reference makes the function signature semantically clearer for the caller and may be more efficient.
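As a minimal sketch of that signature style: the helper below is hypothetical (the name CountMismatches and its exact behavior are not from the original post) and only illustrates taking the strings by const reference, so that no copy is made and the caller can see the arguments are not modified.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper for illustration only (not part of the reviewed code).
// Counts the positions at which the two strings differ, looking only at the
// length of the shorter string. Both strings are taken by const reference,
// so calling this function does not copy either std::string.
std::size_t CountMismatches(const std::string& first, const std::string& second)
{
    const std::size_t searchLength = std::min(first.length(), second.length());
    std::size_t mismatches = 0;
    for (std::size_t i = 0; i < searchLength; ++i) {
        if (first[i] != second[i]) {
            ++mismatches;
        }
    }
    return mismatches;
}
```

String literals still work at the call site, for example CountMismatches("kitten", "sitten"), because a temporary std::string can bind to a const reference.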
- Fix all warnings
The line
    return (mistakes <= tolerance);
results in a compiler warning:
    warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
- Prefer to use numeric_limits over the C-style INT_MIN
For C++ code you should prefer to use std::numeric_limits<int>::min() instead of the INT_MIN macro (I couldn't even get that to compile, though stdint.h was included).
- Always use {} braces for conditional code sections
You should always use braces to enclose conditional code sections:
    if(similarCharacters != nullptr) {
        *similarCharacters = (unsigned int)abs(searchLength - mistakes);
    }
Not only does this improve the readability of the code, but omitting the braces can also make the code error-prone when it is changed later.
My compiling version can be found here.
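For readers without access to that link, the points above might be combined roughly as follows. This is a sketch under stated assumptions, not the linked version: the constant name kDefaultTolerance is invented for illustration, the absolute differences are computed by hand on std::size_t instead of calling abs on unsigned values, and the tolerance is cast to an unsigned type only after it has been clamped to zero, which is one way to avoid the -Wsign-compare warning. std::numeric_limits comes from the <limits> header (the INT_MIN macro, by contrast, is declared in <climits>).

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <limits>
#include <string>

// Assumed name for the "use the default tolerance" sentinel (illustration only).
const int kDefaultTolerance = std::numeric_limits<int>::min();

bool Str_Similar(const std::string& first, const std::string& second,
                 unsigned int* similarCharacters = nullptr,
                 int tolerance = kDefaultTolerance)
{
    // Don't even check if either string is empty.
    if (first.empty() || second.empty()) {
        return false;
    }

    // Search only the length of the shorter string.
    const std::size_t searchLength = std::min(first.length(), second.length());
    const std::size_t longerLength = std::max(first.length(), second.length());

    // By default, set the tolerance to half the length of the shorter string.
    if (tolerance == kDefaultTolerance) {
        tolerance = static_cast<int>(searchLength / 2);
    }
    if (tolerance < 0) {
        tolerance = 0;
    }

    // Start off with the length difference, which counts as mistakes.
    std::size_t mistakes = longerLength - searchLength;
    for (std::size_t i = 0; i < searchLength; ++i) {
        if (first[i] != second[i]) {
            ++mistakes;
        }
    }

    // Output |searchLength - mistakes|, computed without unsigned wrap-around.
    if (similarCharacters != nullptr) {
        const std::size_t similar = (searchLength >= mistakes)
                                        ? searchLength - mistakes
                                        : mistakes - searchLength;
        *similarCharacters = static_cast<unsigned int>(similar);
    }

    // tolerance is non-negative here, so the cast cannot change its value.
    return mistakes <= static_cast<std::size_t>(tolerance);
}
```

With the default tolerance, "kitten" and "sitten" differ in one position out of six, which is within the tolerance of three, so the call returns true and reports five similar characters.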
Context
StackExchange Code Review Q#153120, answer score: 5