HiveBrain v1.2.0
Get Started
← Back to all entries
snippetcppMinor

Sort Characters By Frequency

Submitted by: @import:stackexchange-codereview··
0
Viewed 0 times
characterssortfrequency

Problem

I solved this problem.


Problem Statement


Given a string, sort it in decreasing order based on the frequency of
characters.


Example 1


Input: "tree"


Output: "eert"


Explanation: 'e' appears twice while 'r' and 't' both appear once. So 'e' must appear before both 'r' and 't'. Therefore "eetr" is also a
valid answer.


Example 2:


Input: "cccaaa"


Output: "cccaaa"


Explanation: Both 'c' and 'a' appear three times, so "aaaccc" is also
a valid answer. Note that "cacaca" is incorrect, as the same
characters must be together. Example 3:


Input: "Aabb"


Output: "bbAa"


Explanation: "bbaA" is also a valid answer, but "Aabb" is incorrect.
Note that 'A' and 'a' are treated as two different characters.

I solved this problem using heaps(since it was tagged under this category). I feel my solution is a bit verbose which includes building a vector from a map and so on.

Can you think of any redundancies that can be removed from this solution or any better way of doing it.

class CompareHeap
    {
        public:
            bool operator()(const std::pair &A, const std::pair &B )
            {
                return (A.second  freq;
        for(auto it = input.begin(); it!= input.end(); ++it)
        {
            if(freq.find(*it) != freq.end())
            {
                freq[*it] += 1;
            }
            else
            {
                freq[*it] = 1;
            }
        }
        std::vector> v;
        std::transform(freq.begin(), freq.end(), std::back_inserter(v), [](std::pair kv) {return kv;});
        std::make_heap(v.begin(), v.end(), CompareHeap());
        while(v.size() > 0)
        {
            std::pop_heap(v.begin(), v.end(), CompareHeap());
            answer += std::string(v.back().second,v.back().first);
            v.pop_back();
        }
        return answer;
        }
    };

Solution

Shorter:

There are possibilities to make the code shorted without making it less correct/clear.

for(auto it = input.begin(); it!= input.end(); ++it)


Range loop could be used for this purpose.

if(freq.find(*it) != freq.end())
        {
            freq[*it] += 1;
        }
        else
        {
            freq[*it] = 1;
        }


map zero initializes built in types when it creates the key-value pair. Source. So the check can be removed, and ++ used instead of +1. The change may also improve performance, but I don't think it will be much of a deal.

std::vector> v;
    std::transform(freq.begin(), freq.end(), std::back_inserter(v), [](std::pair kv) {return kv;});


std::vector has a constructor which takes iterator pair and constructs the container from that.

std::make_heap(v.begin(), v.end(), CompareHeap());
    while(v.size() > 0)
    {
        std::pop_heap(v.begin(), v.end(), CompareHeap());
        answer += std::string(v.back().second,v.back().first);
        v.pop_back();
    }


In my opinion sort would be a better fit (making a heap is sorting too, but I think std::sort have potential to perform better), then do range for loop with the last 2 lines of the code inside of it. Currently intentions are less clear, though still good.

Style:

I think that having operator() in the outside class is confusing. If the clarity is really wanted, named lambda could be made, then passed to std::sort() or make_heap().

Theoretically input.length() can be bigger than maximum value of int, so std::size_t should be used, just to be nerd that cares about standard compliance.

Usage of map:

I think that map is the most clear solution for this problem. It lifts a lot of burden from the programmer. There are also possibilities that compiler vendor did some small map optimization (so everything will be in the automatic storage).

Copying the map into vector will probably cost nothing, since everything is POD, so it will decay into std::memset() or something similar (in case the map is small map, it will be even faster).

Sorting a vector should be fast as well, since char on the platform has theoretical maximum of 256 values.

The only thing that could be slightly sped up is putting the data into the result. reserve(input.length()) can be called on the result string so that it won't perform any additional memory allocations.

Performance:

The algorithm should be fast enough for most of the systems, unless insane blazing performance is needed (e.g. Facebook or similar).

Code Snippets

for(auto it = input.begin(); it!= input.end(); ++it)
if(freq.find(*it) != freq.end())
        {
            freq[*it] += 1;
        }
        else
        {
            freq[*it] = 1;
        }
std::vector<std::pair<char, int >> v;
    std::transform(freq.begin(), freq.end(), std::back_inserter(v), [](std::pair<char , int > kv) {return kv;});
std::make_heap(v.begin(), v.end(), CompareHeap());
    while(v.size() > 0)
    {
        std::pop_heap(v.begin(), v.end(), CompareHeap());
        answer += std::string(v.back().second,v.back().first);
        v.pop_back();
    }

Context

StackExchange Code Review Q#151972, answer score: 4

Revisions (0)

No revisions yet.