patterncMinor
Duplicate counter in C
Viewed 0 times
duplicatecounterstackoverflow
Problem
My goal was to create a function that could check for duplicates in an array no matter what the data type was, given that the size of the data type was provided. This was done by using the \$\mathcal{O}(n^2)\$ algorithm and casting to
The code has been confirmed to work with many data types.
As I'm quite new to the C world, I'd like for someone to point out anything wrong or dangerous in this function.
char because of its 1 byte property./**
* Returns count of duplicates in an array
* @param a array to check in
* @param items how many entries in array
* @param size size of each entry
* @return returns amount of duplicates/conflicts
*/
int dubCount(void *a, size_t items, size_t size){
int i,j;
int conflicts = 0;
char *x = (char *)a;
for (i = 0; i < items - 1; i++){
for (j = i + 1; j < items; j++){
/* Check if chunks are equal, if so count */
if(memcmp(&(x[i*size]), &(x[j*size]), size) == 0){
conflicts++;
}
}
}
return conflicts;
}The code has been confirmed to work with many data types.
As I'm quite new to the C world, I'd like for someone to point out anything wrong or dangerous in this function.
Solution
This is a very straightforward implementation which is great for maintainability and readability. Here are some things you could improve:
Not All Comparisons Work In Obvious Ways
As mentioned in the comments by Jerry Coffin and Loki Astari, you may run into issues with floating point values or values which may not be padded how you think, or for which the padding contains random values. You could solve this by also passing in a pointer to a comparison function and calling that function to do the comparison.
Naming
Your naming could be improved a little. The function is named
Use
In this function, none of the parameters to the function are modified. When your function doesn't modify a parameter, it should be marked as
Performance
You noted that the algorithm you're using is \$\mathcal{O}(n^2)\$. If you sorted the items in the array first, you could use an O(n) search on the result as you'd be able to simply run through the array until you hit a different value. (Note that this may depend on the statistical nature of your input data and the sort algorithm you use. But in general, I'd expect to see a speedup in such a case.) Most fast sort algorithms are \$\mathcal{O}(n log(n))\$ so the result would be \$\mathcal{O}(n + n log(n))\$ (I think? Check my math). Note that you would use the same comparison function for the sort as you would for checking for duplicates.
Not All Comparisons Work In Obvious Ways
As mentioned in the comments by Jerry Coffin and Loki Astari, you may run into issues with floating point values or values which may not be padded how you think, or for which the padding contains random values. You could solve this by also passing in a pointer to a comparison function and calling that function to do the comparison.
Naming
Your naming could be improved a little. The function is named
dubCount(), but from the comments and the name of the return value it should probably be conflictCount() or at least dupeCount(). It's not clear what dub refers to in dubCount() (at least to me). Also the parameter name a tells me nothing, and since it's a pointer to void, I have no contextual clue, either. I'd rename it to something like items and rename the items parameter to numberOfItems or numItems, and rename size to itemSize or sizePerItem or bytesPerItem so that it's more clear just from the function prototype what the parameters mean. Use
const where appropriateIn this function, none of the parameters to the function are modified. When your function doesn't modify a parameter, it should be marked as
const. This lets both the compiler and a reader know the function won't change those values. The compiler may be able to make some optimizations with that info, and a reader can do more reasoning about the function without reading its actual code, which is helpful.Performance
You noted that the algorithm you're using is \$\mathcal{O}(n^2)\$. If you sorted the items in the array first, you could use an O(n) search on the result as you'd be able to simply run through the array until you hit a different value. (Note that this may depend on the statistical nature of your input data and the sort algorithm you use. But in general, I'd expect to see a speedup in such a case.) Most fast sort algorithms are \$\mathcal{O}(n log(n))\$ so the result would be \$\mathcal{O}(n + n log(n))\$ (I think? Check my math). Note that you would use the same comparison function for the sort as you would for checking for duplicates.
Context
StackExchange Code Review Q#149602, answer score: 2
Revisions (0)
No revisions yet.