HiveBrain v1.2.0
Get Started
← Back to all entries
patterncMinor

Improving and optimizing squeeze() function

Submitted by: @import:stackexchange-codereview··
0
Viewed 0 times
squeezefunctionoptimizingandimproving

Problem

I am currently reading The C Programming Language by Dennis Richie and Brian Kernighan, and I came to implement the function squeeze (s1, s2) as an exercise. squeeze() deletes each character in s1 that matches any character in the string s2. While coding squeeze(), I thought that I can extend its capability to act like trim() method in Java, and then the my code was already done, and tested.

However, even though it is implemented in C, I believe that the code below can be improved and optimized. Please help me do so.

char *squeeze (char *str1, char *str2) {

     int i, j;
     int str1Len = strlen (str1);
     int toBeSubtractLen = 0; 

     char chr1 = '\0';
     char chr2 = '\0';

     for (i = 0; i < str1Len; i++) {
          chr1 = str1 [i];
          for (j = 0; j < strlen (str2); j++) {
               chr2 = str2 [j];
               if (chr1 == chr2) {
                    toBeSubtractLen++;     
               }               
          }
     }

     char *finalStr;
     finalStr = malloc ((str1Len - toBeSubtractLen) + 1);
     if (finalStr == NULL) {
          printf ("Unable to allocate memory.\n");
          exit (EXIT_FAILURE);
     }

     int indx = 0;
     for (i = 0; i < str1Len; i++) {
          chr1 = str1 [i];
          for (j = 0; j < strlen (str2); j++) {
               chr2 = str2 [j];
               if (chr1 == chr2) {
                    break;
               }
          }
          if (chr1 != chr2) {
               finalStr[indx] = chr1;
               indx++;
          }
     }
     return finalStr;

} /* end of squeeze() */


Samples:

  • str1 = ",AaBbCcDdEeFfGgHhIiJjKkLlMmAaAaAaNnOoPpQqRrSsTtUuVvWwXxYyZz"



  • str2 = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"



  • str3 = "abcdefghijklmnopqrstuvwxyz"



  • name = "ChristopherM.Sawi"



  • birthday = "August14,1988"



Outputs:

`1. squeeze-ing str1 by str2 yields: ",abcdefghijklmaaanopqrstuvwxyz"
  1. squeeze-ing str1 by str3 yields: "mABCDEFGHIJKLMAAANOPQRSTUVWXYZ"
  2. squeeze-ing str2 by str3 y

Solution

You want to avoid repeat iteration over the 2nd string, which is O(N). Do this N times and you have an O(N2) algorithm.

One trick you could use is to build a lookup table of the chars in str2. This would cost O(N) and some extra memory upfront, but then subsequent lookups in the table could be brought down to O(1).

Here's a simple example. The trade-off is this takes a bit more memory and makes some assumptions about the range of char. It also clobbers the buffer of str1:

#include    // for UCHAR_MAX

char *squeeze (char *str1, char *str2)
{
   char present[UCHAR_MAX + 1] = {0};
   char *src, *dst;

   // Build lookup table of chars in str2
   //
   while (*str2)
      present[(unsigned char)*str2++] = 1;

   // Iterate through str1, removing chars along the way.
   //
   src = dst = str1;
   while (*src)
   {
      // Is *src in str2?
      //
      if (present[(unsigned char)*src])
      {
         // Yes, remove this char.. (Advance src but not dst.)
         //
         ++src;
      }
      else
         *dst++ = *src++;
   }

   // NUL terminator
   //
   *dst = 0;

   return str1;
}

Code Snippets

#include <limits.h>   // for UCHAR_MAX

char *squeeze (char *str1, char *str2)
{
   char present[UCHAR_MAX + 1] = {0};
   char *src, *dst;

   // Build lookup table of chars in str2
   //
   while (*str2)
      present[(unsigned char)*str2++] = 1;

   // Iterate through str1, removing chars along the way.
   //
   src = dst = str1;
   while (*src)
   {
      // Is *src in str2?
      //
      if (present[(unsigned char)*src])
      {
         // Yes, remove this char.. (Advance src but not dst.)
         //
         ++src;
      }
      else
         *dst++ = *src++;
   }

   // NUL terminator
   //
   *dst = 0;

   return str1;
}

Context

StackExchange Code Review Q#16301, answer score: 6

Revisions (0)

No revisions yet.