HiveBrain v1.2.0
Get Started
← Back to all entries
patterncsharpCritical

Calculating entropy of a string

Submitted by: @import:stackexchange-codereview··
0
Viewed 0 times
calculatingentropystring

Problem

We're calculating entropy of a string a few places in Stack Overflow as a signifier of low quality.

I whipped up this simple method which counts unique characters in a string, but it is quite literally the first thing that popped into my head. It's the "dumbest thing that works".

/// 
/// returns the # of unique characters in a string as a rough 
/// measurement of entropy
/// 
public static int Entropy(this string s)
{
  var d = new Dictionary();
  foreach (char c in s)
      if (!d.ContainsKey(c)) d.Add(c, true);
  return d.Count();
}


Is there a better / more elegant / more accurate way to calculate the entropy of a string?

Efficiency is also good, though we never call this on large strings so it is not a huge concern.

Solution

I also came up with this, based on Shannon entropy.


In information theory, entropy is a measure of the uncertainty associated with a random variable. In this context, the term usually refers to the Shannon entropy, which quantifies the expected value of the information contained in a message, usually in units such as bits.

It is a more "formal" calculation of entropy than simply counting letters:

/// 
/// returns bits of entropy represented in a given string, per 
/// http://en.wikipedia.org/wiki/Entropy_(information_theory) 
/// 
public static double ShannonEntropy(string s)
{
    var map = new Dictionary();
    foreach (char c in s)
    {
        if (!map.ContainsKey(c))
            map.Add(c, 1);
        else
            map[c] += 1;
    }

    double result = 0.0;
    int len = s.Length;
    foreach (var item in map)
    {
        var frequency = (double)item.Value / len;
        result -= frequency * (Math.Log(frequency) / Math.Log(2));
    }

    return result;
}


Results are:

"abcdefghijklmnop" = 4.00
"Hello, World!" = 3.18
"hello world" = 2.85
"123123123123" = 1.58
"aaaa" = 0

Code Snippets

/// <summary>
/// returns bits of entropy represented in a given string, per 
/// http://en.wikipedia.org/wiki/Entropy_(information_theory) 
/// </summary>
public static double ShannonEntropy(string s)
{
    var map = new Dictionary<char, int>();
    foreach (char c in s)
    {
        if (!map.ContainsKey(c))
            map.Add(c, 1);
        else
            map[c] += 1;
    }

    double result = 0.0;
    int len = s.Length;
    foreach (var item in map)
    {
        var frequency = (double)item.Value / len;
        result -= frequency * (Math.Log(frequency) / Math.Log(2));
    }

    return result;
}

Context

StackExchange Code Review Q#868, answer score: 97

Revisions (0)

No revisions yet.