Functions and a GUI for entropy-related calculations
Problem
I wrote a script with utilities for calculating the entropy of iterables and included a Tk GUI that shows a quick overview of a text's properties in real time. (on GitHub)
I tried to follow PEP 8 as well as possible, but I'm not sure about other things, specifically:
- I think my docstrings are sometimes overly redundant, see the GUI for example.
- In `gui.py`, I'm not sure if I should move the `calculate` method out of the `GUI` class.
- Is the overall design good? I know it's a rather small project, but I want to do this correctly.

If you have any other concerns besides these questions, I'm open to criticism!

The code is split into two modules:

calc.py - Includes the calculation functions
```
"""Utilities for entropy-related calculations."""

from math import ceil as _ceil, log2 as _log2


def prob_to_info(probability):
    """Converts probability in the range from 0 to 1 into information measured
    in bits, therefore using the dual logarithm. Returns None if the probability
    is equal to zero."""
    if probability == 0:
        return None
    elif probability == 1:
        return 0
    else:
        return -_log2(probability)


def info_to_prob(information):
    """Converts information measured in bits to probability."""
    return 2**-information


def entropy(iterable):
    """Calculates the Shannon entropy of the given iterable."""
    return sum(prob[1]*prob_to_info(prob[1]) for prob in char_mapping(iterable))


def optimal_bits(iterable):
    """Calculates the optimal usage of bits for decoding the iterable."""
    return _ceil(entropy(iterable)) * len(iterable)


def metric_entropy(iterable):
    """Calculates the metric entropy of the iterable."""
    return entropy(iterable) / len(iterable)


def char_mapping(iterable):
    """Creates a dictionary of the unique characters and their probability
    in the given iterable."""
    char_map = dict.fromkeys(set(iterable))
    for char in set(iterable):
        probability = iterable.count(char) / len
        # [listing truncated here]
```
Solution
You ask about docstrings, so you should be aware that there is a PEP for those, too. In particular, note that:
"Multi-line docstrings consist of a summary line just like a one-line docstring, followed by a blank line, followed by a more elaborate description."

The style guide specifies that docstring lines should be a maximum of 72 characters; a few of yours exceed this. There are various formats that you can adopt to include information in the docstrings in a structured way for use by documentation generators and other tools; I like the Google style.

For example,
```
"""Converts probability in the range from 0 to 1 into information measured
in bits, therefore using the dual logarithm. Returns None if the probability
is equal to zero."""
```
could be more like:
```
"""Converts probability into information, measured in bits.

Notes:
  Uses the dual logarithm.

Args:
  probability (float): In the range from 0 to 1.

Returns:
  float [or None if the probability is equal to zero].

"""
```

I assume that you've aliased `log2` and `ceil` to `_log2` and `_ceil` respectively to avoid them being imported into `gui`. Instead, you can use `__all__` to specify what should be available to modules that import from `calc` (it controls what `from calc import *` exports; see the tutorial):
```
__all__ = [
    'entropy',
    'info_to_prob',
    'metric_entropy',
    'optimal_bits',
    'prob_to_info',
]
```

It seems a bit odd to have the class that occupies pretty much the whole of `gui.py` be explicitly ignored after instantiation! Rather than having:
```
root = tk.Tk()
_ = GUI(root)
root.mainloop()
```
you could make the `GUI` class inherit from `tk.Tk`:
```
class GUI(tk.Tk):

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.state("zoomed")
        self.frame = tk.Frame(self)
        ...
```
and run it directly:
```
root = GUI()
root.mainloop()
```

This is trivial enough to include under `if __name__ == '__main__':` directly, rather than via `main`. There's also no need for the `, *_` in `GUI.calculate`.

Rather than the string concatenation with `+`, I would use `str.format`, for example:
```
table_head = " Char | Probability |    Bits     | Occurrences "
table_body = "\n".join(
    [
        # One placeholder per argument: char, probability, bits, count.
        " {:<4} | {:>11.7f} | {:>11.7f} | {:>11}".format(
            char,
            prob,
            calc.prob_to_info(prob),
            text.count(char)
        )
        for char, prob in char_map
    ]
)
```

Given what this method does, I don't think that `calculate` is an appropriate name for it. You could split the calculations and the formatting into two methods, with more appropriate names.

As currently implemented, the code breaks (due to `ZeroDivisionError` in `metric_entropy`) if you toggle Ignore Case before entering any text, or if you delete all of the input text. You should handle this error, and display something sensible in these cases.
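
One possible way to act on the last two suggestions (separating the calculation from the formatting, and coping with empty input instead of hitting the `ZeroDivisionError`) is sketched below. The names `compute_stats` and `format_stats` are hypothetical and do not appear in the original code, and the entropy is recomputed locally with `collections.Counter` rather than imported from `calc`, so treat this as an illustration under those assumptions rather than a drop-in replacement for `GUI.calculate`.

```python
from collections import Counter
from math import log2


def compute_stats(text):
    """Return entropy statistics for text, or None if text is empty.

    Hypothetical helper standing in for the calculation half of
    GUI.calculate; it does not use the functions from calc.py.
    """
    if not text:
        # Empty input: there is nothing to measure, and dividing by
        # len(text) would raise ZeroDivisionError.
        return None
    total = len(text)
    counts = Counter(text)
    # Shannon entropy in bits: sum of p * log2(1 / p) over the symbols.
    entropy = sum((n / total) * log2(total / n) for n in counts.values())
    return {
        "entropy": entropy,
        "metric_entropy": entropy / total,
        "counts": counts,
    }


def format_stats(stats):
    """Turn the result of compute_stats into a display string."""
    if stats is None:
        return "No input text."
    return "Entropy: {:.4f} bits, metric entropy: {:.4f}".format(
        stats["entropy"], stats["metric_entropy"]
    )
```

With this split, the GUI callback would only need to pass the current text through `compute_stats` and show whatever `format_stats` returns, so clearing the input box displays a placeholder message instead of raising.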
Context
StackExchange Code Review Q#85879, answer score: 6