patterncMinor
C Split function review
Viewed 0 times
reviewsplitfunction
Problem
I started learning
Is there anything I could do better (memory, performance and error handling)? Maybe in the logic of the function?
Also, feel free to criticize my writing style, as I want it to be more readable and "correct".
C a week ago, and here is my implementation for the split function:char** my_split(const char* str, char delim, int* size)
{
int index = 0, start;
char** results = NULL;
*size = 0;
do
{
if ((*size) % 10 == 0) //top floor reached
{
results = realloc(results, ((*size) + 10) * sizeof(char*)); //extend the space for another 10 pointers
if (results == NULL) //if failed, return NULL
return NULL;
}
start = index;
while (str[index] != delim && str[index]) //find the delimeter or end of string
{
index++;
}
int length = index - start; //length of matching string
results[*size] = malloc((length + 1) * sizeof(char)); //allocate memory in the size of the string, and another char for '\0'
if (results[*size] == NULL) //if failed, return NULL
return NULL;
strncpy(results[*size], &str[start], length); //copy the matching string to the array
results[*size][length] = 0; //end the string
(*size)++; //add 1 to the total size
} while (str[index++]);
results = realloc(results, (*size) * sizeof(char*)); //trim unused bytes
return results;
}Is there anything I could do better (memory, performance and error handling)? Maybe in the logic of the function?
Also, feel free to criticize my writing style, as I want it to be more readable and "correct".
Solution
Note: I would take everything I say with a grain of salt. It's been a few years since I used C on a regular basis, and even at my peak with C, I was never very familiar with the standard or certain best practices.
Your function is responsible for cleaning up memory, yet when you bail in your NULL returns, you don't do that.
In particular, if a realloc fails, the memory already allocated stays, and if allocating the second dimension-ed array fails, you still need to free what you have already allocated before returning.
A review of a few code points:
I'm not familiar enough with standard practices of C to know what the expected way would be, but something about the the memory handling feels odd to me:
I might consider abstracting the memory away to a generalized vector or a specialized string vector.
That could bring your API to something like:
The downside to this would be that you'd be tying your other code to a specific API (though bastardizing the vector into a
Your function is responsible for cleaning up memory, yet when you bail in your NULL returns, you don't do that.
In particular, if a realloc fails, the memory already allocated stays, and if allocating the second dimension-ed array fails, you still need to free what you have already allocated before returning.
A review of a few code points:
- I would highly consider abstracting away the memory stuff, but if you don't, I would consider doubling the array size instead of adding ten.
- Consider a fairly long string being split on space. Assume it ends up being 1000 split strings. This means 100 allocations, and allocating is fairly slow. If you were to double, it would be 10 allocations.
- Doubling wastes, at most,
sizeof(type) * n/2bytes, but has at most,lg nallocations (note: very rough math)
- Doubling using more memory but is typically faster
sizeshould be an unsigned type (I would use size_t)
- As previously mentioned, you have potential memory leaks
- As your code exists, there's really no advantage over strtok
- strtok doesn't very well allow storing into an array, but your function requires fairly manual memory management anyway
- The malloc/strncpy combination could be changed to use strndup and be a lot simpler (I think strndup is standard?)
- You could actually use strtok inside of your function to avoid doing the index calculations
- Part of your code is recreating strtok
- The other part of your code is taking that strtok functionality and using it to push things into an array
- What you have now will (probably) perform better than strtok
I'm not familiar enough with standard practices of C to know what the expected way would be, but something about the the memory handling feels odd to me:
const char[] str = "Hello World";
int size = 0;
char** str_parts = my_split(str, ' ', &size);
if (str_parts != NULL) {
//Let's do something with the parts
for (int i = 0; i < size; ++i) {
printf("Part %d: '%s'\n", size, str_parts[i]);
}
//Now we have to clean up (could technically be in the printing loop)
for (int i = 0; i < size; ++i) {
free(str_parts[i]);
}
free(str_parts);
}I might consider abstracting the memory away to a generalized vector or a specialized string vector.
That could bring your API to something like:
const char[] str = "Hello World";
cvector parts;
//A vector of char*'s that will call free() on each element before destructing
cvector_init(&parts, sizeof(char*), free);
if (my_split(&parts, ' ')) {
for (size_t i = 0; i < parts.size; ++i) {
//I'm a bit rusty on C, so I'm not sure if the char* cast is needed or not
printf("Part %d: '%s'\n", size, (char*) cvector_get(&parts, i));
}
}
cvector_destory(&parts);The downside to this would be that you'd be tying your other code to a specific API (though bastardizing the vector into a
char would actually be fairly easy if you wrote the cvector API in a particular way -- though then you'd be depending on implementation details, which is usually a bad idea), and this vector would perform worse than a plain 2d array. (Though if you wrote a string specialized vector, the performance could be very similar and, depending on design, you could even use it as a char without worrying much about the implications.)Code Snippets
const char[] str = "Hello World";
int size = 0;
char** str_parts = my_split(str, ' ', &size);
if (str_parts != NULL) {
//Let's do something with the parts
for (int i = 0; i < size; ++i) {
printf("Part %d: '%s'\n", size, str_parts[i]);
}
//Now we have to clean up (could technically be in the printing loop)
for (int i = 0; i < size; ++i) {
free(str_parts[i]);
}
free(str_parts);
}const char[] str = "Hello World";
cvector parts;
//A vector of char*'s that will call free() on each element before destructing
cvector_init(&parts, sizeof(char*), free);
if (my_split(&parts, ' ')) {
for (size_t i = 0; i < parts.size; ++i) {
//I'm a bit rusty on C, so I'm not sure if the char* cast is needed or not
printf("Part %d: '%s'\n", size, (char*) cvector_get(&parts, i));
}
}
cvector_destory(&parts);Context
StackExchange Code Review Q#15055, answer score: 5
Revisions (0)
No revisions yet.