HiveBrain v1.2.0
Get Started
← Back to all entries
patterncppMinor

Token-finding function

Submitted by: @import:stackexchange-codereview··
0
Viewed 0 times
tokenfunctionfinding

Problem

I want to make sure find_value_from_key() is safe and efficient. My constraints are language C++, use of standard library ok, but cannot use Boost or any new C++11 features.

Can anyone offer any comments/criticisms/feedback on my implementation of the function here? Any flaws in my test cases would also be interesting to hear.

#include 
#include 

using std::cout;
using std::endl;
using std::string;

bool find_value_from_key(const string& source, const string& key, string& value, const string& EndOfValueMarker) {
   bool ret = false;
   size_t pos_start = source.find(key);
   if(pos_start != string::npos) {
      size_t pos_end = source.find_first_of(EndOfValueMarker, pos_start+key.length());
      if(pos_end == string::npos && source.length() > pos_start+key.length())
         pos_end = source.length();

      if(pos_end != string::npos) {
         value = source.substr(pos_start+key.length(), pos_end-pos_start-key.length());
         ret = true;
      }
   }
   return ret;
}

int main(int argc, char* argv[])
{
   //test strings to exercise find_value_from_key function 
   const char* test_strings[] = {
      "key1 = value1", 
      "   key1 = value1   ", 
      "key1 = value1,key2 = value2 ,key3 = value3",
      "_key1 = value1 ,key2 = value2 ,Gkey3 = value3,", 
      "key1 = value1,key2 =  value2 ,key3 = value3 x", 
      "key1 = value1,key2 = value2key3 = value3\n", 
      "key1 = value1_,key2 = value2 ,key3 = value3                    ", 
      "key1 = value1$,key2 = value2 ,key3 =   value3", 
      "key1 = value1\t,key2 = value2  ,key3 = value3\t" };

   const char* keys[] = {"key1 = ", "key2 = ", "key3 = "};

   int elements = sizeof(test_strings) / sizeof(test_strings[0]);

   for(int i = 0; i < elements; ++i) {
     cout << "processing: " << test_strings[i] << endl;
     string value;
     if(find_value_from_key(test_strings[0], keys[0], value, ", \t\n"))
        cout << value << endl;
   }

   return 0;
}

Solution

find_value_from_key

The core method find_value_from_key looks functional for the limited specification you have given. I am not a great fan of the specification though, I think the function should try to be a bit smarter about identifying values in the input strings. The two items I find most concerning are:

  • It will successfully find key1 = and value1 in "ThisIsSubkey1 = value1"



  • it does not successfully find value1 in "key1 = value1" (note two spaces after =)



These are both items related to the context-naive usage of the find(...) method.

Still, as far as the specification goes, your code looks like it does the job.

There is one improvement that can/should be done to make it more readable. You reuse the value pos_start+key.length() four times. Performance-wise this may be sorted out by a compiler, but it does not help the readability at all...

I would prefer the function written as:

bool find_value_from_key(const string& source, const string& key, string& value, const string& endmarkers) {
   bool ret = false;
   size_t key_start = source.find(key);
   size_t val_start = key_start + key.length();
   if(key_start != string::npos) {
      size_t val_end = source.find_first_of(endmarkers, val_start);
      if(val_end == string::npos && source.length() > val_start)
         val_end = source.length();

      if(val_end != string::npos) {
         value = source.substr(val_start, val_end - val_start);
         ret = true;
      }
   }
   return ret;
}


Note how there is no pos_start variable... it was being reused for different purposes in different places. It was confusing. Renaming the variables to self-document makes the code easier to understand. The compiler will sort out any optimizations in the same ways as before.

There is one additional item in that method... why the CamelCase parameter EndOfValueMarker? I have renamed it endmarkers.
main

The main method has a couple of issues I think must just be an artifact from testing.

You set up the loop to check each input string, but, you do not use i (the loop variable) as an index to the actual data you check... you have the code:

for(int i = 0; i < elements; ++i) {
     cout << "processing: " << test_strings[i] << endl;
     string value;
     if(find_value_from_key(test_strings[0], keys[0], value, ", \t\n"))
        cout << value << endl;
   }


This code prints each test_strings[i] just fine, but the input to the function is always index 0... so you only actually test the first input.

The second item is that you only test for key1. The other keys are not tested.

The loop code should look like:

for(int i = 0; i  " << value << endl;
    }
}


I know, it has the magic number 3 for the k loop, but at least it checks each key....
Ideone

I have put my recommendations in to this ideone... take it for a spin

Code Snippets

bool find_value_from_key(const string& source, const string& key, string& value, const string& endmarkers) {
   bool ret = false;
   size_t key_start = source.find(key);
   size_t val_start = key_start + key.length();
   if(key_start != string::npos) {
      size_t val_end = source.find_first_of(endmarkers, val_start);
      if(val_end == string::npos && source.length() > val_start)
         val_end = source.length();

      if(val_end != string::npos) {
         value = source.substr(val_start, val_end - val_start);
         ret = true;
      }
   }
   return ret;
}
for(int i = 0; i < elements; ++i) {
     cout << "processing: " << test_strings[i] << endl;
     string value;
     if(find_value_from_key(test_strings[0], keys[0], value, ", \t\n"))
        cout << value << endl;
   }
for(int i = 0; i < elements; ++i) {
    cout << "processing: " << test_strings[i] << endl;
    string value;
    for (int k = 0; k < 3; k++) {
        if(find_value_from_key(test_strings[i], keys[k], value, ", \t\n"))
            cout << keys[k] << " -> " << value << endl;
    }
}

Context

StackExchange Code Review Q#18079, answer score: 5

Revisions (0)

No revisions yet.