HiveBrain v1.2.0
Get Started
← Back to all entries
patterncppMinor

Keep elements present an odd number of times in an unsorted vector

Submitted by: @import:stackexchange-codereview··
0
Viewed 0 times
numberpresentelementsunsortedkeeptimesvectorodd

Problem

From an unsorted vector of integers, I would like to keep only elements that are present an odd number of times and have them sorted. Below is an example of a working code.

#include
#include 

using namespace std;

int main()
{
        vector input = { 11, 3, 7, 3, 9, 3, 8, 11, 9, 3, 2, 9 };

        // print  input
        cout  output;
        int nbRepeats = 1;
        for (int i = 1 ; i < input.size() ; i++)
        {
                if (input[i-1] == input[i])  // if the ith element is a repeat
                {
                        nbRepeats++;
                } else
                {
                        if (nbRepeats % 2)  // Given that we have a new element, we will add the previous element to 'output' if the number of repeats of this previous element was odd
                        {
                                output.push_back(input[i-1]);
                        }
                        nbRepeats = 1;      // set nbRepeats back to 1 for the next series of elements.
                }
        }
        if (nbRepeats % 2)  // This is for the last series of elements
        {
                output.push_back(input.back());
        }

        // print the output
        cout << "Output:" << endl;
        for (int i = 0 ; i < output.size() ; i++)
        {
          cout << output[i] << " ";
        }
        cout << endl;
}


it rightly outputs

Input:
11 3 7 3 9 3 8 11 9 3 2 9 
Sorted Input:
2 3 3 3 3 7 8 9 9 9 11 11 
Output:
2 7 8 9


Question

The process is being a bit too slow for my needs. Is there alternative faster solution?

Typical input

Please note that a typical input is only between 1 and 10 elements long but cases of repeats will be quite common.

Potential ways to improve the code

I don't need to use the input, so I would be happy to directly modify the input rather than having to allocate memory an output vector if you think that would be faster.

It might eventually be more strategic to first remove elements

Solution

Since you asked for performance relevant code review I will answer these questions first and than will make some comments on the coding style.

The comments allready mentioned Profiling, so I will start with your original code and run it with CodeXL. Afterwards I will modify the code based on the profiling and will comment on possible performance improvements.

Profiling the Program

Since CodeXL works with samples wich is a statistical method (see https://en.wikipedia.org/wiki/Profiling_(computer_programming) ). For compilation I used MSVC2013 with the "-O2" flag for code optimization. I run your code in 10^5 times in for-loop. The result was this:

The Column with % of deep samples is an indicator of how much time the program spends in a function and its child function. Thus the main function is typically on the top.
Interesting are the third and fourth row which show that the "::_Reallocate 58.13% of deep samples

  • std::vector::_Insert 11.96% of deep samples



  • std::_Sort 10.01% of deep samples



Based on this I will give some comments about how to speed up the Code.

General possibilities for performance improvements

  • As shown above the vector::_Reallocate does take a lot of time in the code. This might be cause by the line output.push_back(input[i-1]); in your code. It will reallocate if the allocated memory of the vector needs to increase to store all elements. It will then allocate new memory and copy all its elements. This can be avoided by calling output.reserve(x); aft the construction of output. This will preallocate the memory for x elements. Since you optimize for speed and not for memory input.size() would be a good guess for the "x", because than you can be sure that output will not reallocate within the loop.



  • For your question of using the input vector for output and directly operating on it: I would guess that such an approach would include a lot of "erase" calls wich will also lead to a lot of copying, thus I do not think it would be faster.



  • Since you mentioned that your input will only have around 10 elements, I would not take to much time considering O-Notations because they are mainly interesting for large amounts of elements. In small dimensons the so oft forgotten constant in front of an O(...) will matter.



  • On the original code: The use of std::endl may have a big performace impact since it does not only produce a newline but it also flushes the putput. The use of "\n" or '\n' might be better.



  • Some times the compiler flags may have a big impact on performance. I does not have the knowledge for a detailed review, but I know of many cases were people look for code improvements because the program is slow but they compiled their program with debug configuration, so I thought it shopuld be mentioned.



Some small comments on coding style

  • Avoid "using namespace std" it is considered bad practice because it can lead to suprising collisions and ambiguity.



  • Prefer the prefix "++" to the postfix "++" unless you really want to increase a value and get the original value returned.



  • Avoid the using of std::endl, for the reasons stated above.



I hope this give a good overview on how to improve code for performance.

Context

StackExchange Code Review Q#146872, answer score: 3

Revisions (0)

No revisions yet.